[PATCH] Bus error on ARMv7 when using OSPF

Matthew Reeve webmail at mreeve.com
Tue Aug 3 10:34:36 CEST 2021


On 28/06/2021 09:46, Matthew Reeve wrote:
>
> On 24/06/2021 13:08, Ondrej Zajicek wrote:
>> On Fri, Jun 18, 2021 at 05:06:27PM +0100, Matthew Reeve wrote:
>>> Hi, yes sure, here it is. Please let me know if this does not give you what
>>> you need.
>>>
>>> Thanks!
>>
>> Thanks, that looks like an issue with slists. We had similar issue with
>> lists code in the past and reworked them to be more conservative. Will
>> check that.
> Great, thanks. If you want to make any changes on a branch or something, I
> can build it and test it on my hardware if it would help.
>>
>>> root at OpenWrt:/tmp# gdb debug/bird bird.1623776146.6869.7.core
>>> GNU gdb (GDB) 10.1
>>> Copyright (C) 2020 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.
>>> Type "show copying" and "show warranty" for details.
>>> This GDB was configured as "arm-openwrt-linux".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <https://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>>      <http://www.gnu.org/software/gdb/documentation/>.
>>>
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from debug/bird...
>>> [New LWP 6869]
>>> Core was generated by `./bird'.
>>> Program terminated with signal SIGBUS, Bus error.
>>> #0  ospf_rt_reset (p=0x1d610a0) at proto/ospf/rt.c:1646
>>> 1646    proto/ospf/rt.c: No such file or directory.
>>> (gdb) bt
>>> #0  ospf_rt_reset (p=0x1d610a0) at proto/ospf/rt.c:1646
>>> #1  ospf_rt_spf (p=0x1d610a0) at proto/ospf/rt.c:1698
>>> #2  ospf_rt_spf (p=0x1d610a0) at proto/ospf/rt.c:1688
>>> #3  ospf_disp (timer=<optimized out>) at proto/ospf/ospf.c:468
>>> #4  0x00061574 in timers_fire (loop=0xc4878 <main_timeloop>) at lib/timer.c:235
>>> #5  0x00012ca8 in io_loop () at sysdep/unix/io.c:2195
>>> #6  main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:939
>>> (gdb)
>>>
>>> On 18/06/2021 16:16, Ondrej Zajicek wrote:
>>>> On Mon, Jun 14, 2021 at 04:25:04PM +0100, Matthew Reeve wrote:
>>>>> Hi,
>>>>>
>>>>> when using bird 2.0.8 on openwrt 21.02 (and other versions) on a Netgear
>>>>> R7800 router, if the OSPF protocol is used, either v2 or v3, bird
>>>>> immediately crashes on startup with:
>>>>>
>>>>> Fri Jun 11 14:41:11 2021 daemon.info bird: Started
>>>>> Fri Jun 11 14:41:11 2021 kern.err kernel: [ 3500.853248] Alignment trap: not handling instruction f44c0a1f at [<00035848>]
>>>>> Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.853283] 8<--- cut here ---
>>>>> Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.859363] Unhandled fault: alignment exception (0x801) at 0x007e0624
>>>>> Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.862443] pgd = 0bbef4fd
>>>>> Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.868821] [007e0624] *pgd=5d6ca835, *pte=5c40b75f, *ppte=5c40bc7f
>>>>>
>>>>>
>>>>> This router uses an ARMv7 processor and the problem seems to be related
>>>>> to memory alignment. I've debugged it and traced it to an access to the
>>>>> top_hash_entry struct. I've found that if I add the PACKED macro to the
>>>>> struct definition then it fixes the problem, as per this patch:
>>>> Hi
>>>>
>>>> Thanks, could you try to get backtrace from the coredump using gdb to see
>>>> where is the invalid access?
>>>>
>>>>
Hi Ondrej,

Just wondering whether you've had a chance to look at this any further yet?
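
In case it helps with reproducing this, here is a generic sketch of the kind of alignment behaviour described in the quoted report. It is not BIRD code: `demo_entry`, `demo_entry_packed`, `is_aligned` and `load_u64_unaligned` are hypothetical names made up for illustration, and it assumes a GCC or Clang toolchain for `__attribute__((packed))`. It only shows that packing drops a struct's alignment requirement to 1, so the compiler stops assuming aligned accesses, and that `memcpy` is a portable way to read a value from an address that may be misaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a struct with a 64-bit member.
 * This is NOT the layout of BIRD's top_hash_entry. */
struct demo_entry {
  uint8_t  tag;
  uint64_t value;   /* compiler inserts padding so this is naturally aligned */
};

/* Same members, but packed (GCC/Clang extension): no padding, and the
 * compiler no longer assumes the members are naturally aligned. */
struct demo_entry_packed {
  uint8_t  tag;
  uint64_t value;
} __attribute__((packed));

/* Returns nonzero if ptr is a multiple of align (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
  return ((uintptr_t) ptr & (align - 1)) == 0;
}

/* Reads a 64-bit value from an address that may be misaligned.
 * memcpy makes no alignment assumption about src. */
static uint64_t load_u64_unaligned(const void *src)
{
  uint64_t v;
  memcpy(&v, src, sizeof(v));
  return v;
}

/* Sanity checks on the layouts above; aborts via assert() on mismatch. */
static void demo_check_layout(void)
{
  assert(_Alignof(struct demo_entry_packed) == 1);
  assert(sizeof(struct demo_entry_packed) == sizeof(uint8_t) + sizeof(uint64_t));
  assert(sizeof(struct demo_entry) >= sizeof(struct demo_entry_packed));
}
```

Calling `demo_check_layout()` and then `load_u64_unaligned(buf + 1)` on a byte buffer is enough to exercise it. Whether this matches what actually happens at rt.c:1646 is something only the slist code can confirm.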

Many thanks,

Matt.


