ospf: bird stuck in R state

Thu Dec 18 15:53:50 CET 2008

Hi :)

The following is a bird bug with some investigation but no good
solution yet.  Could bird developers please take a look and advise?

Thanks,

-- sizif

We are running ospf with bird 1.0.12 and from time to time discover
bird stuck in R state:

# ps auxw | grep bird
root     14269 56.9  0.0  3088 2064 ?        Rs   Dec16 690:56 /usr/sbin/bird

I attached gdb to the process and found the following:

0x0807b742 in s_merge (from=0x80e0e54, to=0x80d118c) at slists.c:35
(gdb) bt
#0  0x0807b742 in s_merge (from=0x80e0e54, to=0x80d118c) at slists.c:35
#1  0x0807b6e3 in s_rem_node (n=0x80e0e54) at slists.c:125
#2  0x08066fdb in lsa_install_new (lsa=0xbf983c80, body=0x80c8fc0,
    oa=0x8096340) at /home/sizif/bird/./proto/ospf/lsalib.c:459
#3  0x0806568a in ospf_lsupd_receive (ps=0x80b56e4, ifa=0x8099570,
    n=0x80b3bf8) at /home/sizif/bird/./proto/ospf/lsupd.c:487
#4  0x08060553 in ospf_rx_hook (sk=0x80b4310, size=64)
    at /home/sizif/bird/./proto/ospf/packet.c:346
#5  0x08074c27 in sk_read (s=0x80b4310) at io.c:1212
#6  0x080751af in io_loop () at io.c:1378
#7  0x080779eb in main (argc=3, argv=0xbf983fd4) at main.c:458

Here is the spot in s_merge where we loop forever:

   /* Really merging */
   while (g->next)
>    g = g->next;

(gdb) p /x *g
$1 = {prev = 0x80e0e54, null = 0x0, next = 0x80ab1d0, node = 0x0}
(gdb) display /x g
$2 = 0x80ab1d0

That is, the list of slist readers contains a trivial cycle: the
iterator's next pointer holds the iterator's own address.  0x80ab1d0
is the address of the n->dbsi field of one of the neighbors:

(gdb) p /x *(struct ospf_neighbor *)0x80ab190
$31 = {n = {next = 0x80abc38, prev = 0x80aa6e8}, pool = 0x80ab160,
  ifa = 0x8099570, state = 0x4, inactim = 0x80ab848, imms = {byte = 0x2,
    bit = {ms = 0x0, m = 0x1, i = 0x0, padding = 0x0}}, dds = 0xcf4d0c79,
  ddr = 0xb846f866, myimms = {byte = 0x7, bit = {ms = 0x1, m = 0x1, i = 0x1,
      padding = 0x0}}, rid = 0xc0a84909, ip = 0xc0a84909, priority = 0xa,
  options = 0x2, dr = 0xc0a8491e, bdr = 0xc0a84930, adj = 0x0, dbsi = {
    prev = 0x80e0e54, null = 0x0, next = 0x80ab1d0, node = 0x0}, lsrql = {
    head = 0x80ab1e4, null = 0x0, tail = 0x80ab1e0,
    tail_readers = 0x80ab1f4}, lsrqh = 0x80ab8b8, lsrqi = {prev = 0x80ab1e4,
    null = 0x0, next = 0x0, node = 0x80ab1e4}, lsrtl = {head = 0x80e9f8c,
    null = 0x0, tail = 0x80e9e4c, tail_readers = 0x80ab214}, lsrti = {
    prev = 0x80ab208, null = 0x0, next = 0x0, node = 0x80ab208},
  lsrth = 0x80aba50, ldbdes = 0x80ab268, rxmt_timer = 0x80ab878, ackl = {{
      head = 0x80ab234, null = 0x0, tail = 0x80ab230}, {head = 0x80ab240,
      null = 0x0, tail = 0x80ab23c}}, ackd_timer = 0x80abbd8, csn = 0x0}
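
To spell out why this spins: the merge loop follows next pointers
until it sees NULL, and an entry whose next is its own address never
yields one.  Below is a minimal sketch of that, not BIRD code.  The
struct reader type and both functions are made up for illustration and
keep only the next link; the step cap exists only so the demo ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reader-list entry; only the link matters. */
struct reader {
  struct reader *next;
};

/*
 * Walk to the tail the way the merge loop does, but give up after
 * max_steps.  Returns the number of steps taken, or -1 if the cap was
 * hit (the list is cyclic or longer than max_steps).
 */
static int walk_to_tail(struct reader *g, int max_steps)
{
  int steps = 0;

  while (g->next) {
    if (steps == max_steps)
      return -1;
    g = g->next;
    steps++;
  }
  return steps;
}

/* Returns 1 if a healthy list ends and a self-linked entry does not. */
static int demo_self_loop(void)
{
  struct reader b = { NULL };
  struct reader a = { &b };
  struct reader stuck = { NULL };

  stuck.next = &stuck;  /* same shape as dbsi.next == &dbsi in the dump */

  return walk_to_tail(&a, 1000) == 1 &&
         walk_to_tail(&stuck, 1000) == -1;
}
```

Without the cap, the second walk is exactly the busy loop seen in
frame #0.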

Another interesting observation is that both nodes given as arguments
to s_merge have an identical readers field:

#1  0x0807b6e3 in s_rem_node (n=0x80e0e54) at slists.c:125
(gdb) p /x *n
$38 = {next = 0x80d118c, prev = 0x809a7b4, readers = 0x80ab1d0}
(gdb) p /x *(snode *)0x80d118c
$40 = {next = 0x80e0f44, prev = 0x809a7b4, readers = 0x80ab1d0}
(gdb) p /x *(snode *)0x80e0f44
$41 = {next = 0x80bcdec, prev = 0x80d118c, readers = 0x0}
(gdb) p /x *(snode *)0x80bcdec
$42 = {next = 0x80b6db4, prev = 0x80e0f44, readers = 0x0}
(gdb) p /x *(snode *)0x80b6db4
$43 = {next = 0x80e1174, prev = 0x80bcdec, readers = 0x0}
(gdb) p /x *n
$44 = {next = 0x80d118c, prev = 0x809a7b4, readers = 0x80ab1d0}
(gdb) p /x *(snode *)0x809a7b4
$45 = {next = 0x80d118c, prev = 0x809a764, readers = 0x0}
(gdb) p /x *(snode *)0x809a764
$46 = {next = 0x809a7b4, prev = 0x809a714, readers = 0x0}

How can an iterator be referenced by more than one slist node?  It
looks like someone did s_put(&(n->dbsi), ...) before the previous
iteration had completed, that is, while the iterator was still linked
to a node.

The s_get...s_put pair in ospf_dbdes_send seems sound and safe.
I think the culprit is ospf_neigh_sm, which does s_init(&(n->dbsi), ...)
when entering NEIGHBOR_EXCHANGE:

  case INM_NEGDONE:
    if (n->state == NEIGHBOR_EXSTART)
    {
      neigh_chstate(n, NEIGHBOR_EXCHANGE);
      s_init(&(n->dbsi), &po->lsal);

The s_init works fine when the neighbor enters NEIGHBOR_EXCHANGE for
the first time.  But if the neighbor enters NEIGHBOR_EXCHANGE again
without having finished the n->dbsi run through po->lsal, we get
exactly the effect we observe: n->dbsi is relinked to the head node of
po->lsal without being unlinked from the node it currently points to.
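
The effect can be reproduced on a stripped-down model.  This is only a
sketch under my own assumptions: m_node, m_iter and m_put are
hypothetical names, the node keeps nothing but the head of its reader
list, and m_put mimics just the "push onto the reader list" part of
s_put.  It is not the slists.c code.

```c
#include <assert.h>
#include <stddef.h>

struct m_iter;

/* Simplified list node: only the head of its reader list is modeled. */
struct m_node {
  struct m_iter *readers;
};

/* Simplified iterator: the node it is parked on and the next reader. */
struct m_iter {
  struct m_node *node;
  struct m_iter *next;
};

/*
 * Park iterator i on node n by pushing it onto n's reader list.
 * This assumes i is not linked anywhere; it does not check.
 */
static void m_put(struct m_iter *i, struct m_node *n)
{
  i->next = n->readers;
  n->readers = i;
  i->node = n;
}

/* Iterator parked on a, then re-put on head with no unlink in between.
 * Returns 1 if both nodes now list the same iterator. */
static int demo_two_owners(void)
{
  struct m_node head = { NULL }, a = { NULL };
  struct m_iter it = { NULL, NULL };

  m_put(&it, &a);
  m_put(&it, &head);  /* models the second s_init */
  return a.readers == &it && head.readers == &it;
}

/* Iterator still parked on head when it is re-put there.
 * Returns 1 if the iterator's next now points at itself. */
static int demo_self_link(void)
{
  struct m_node head = { NULL };
  struct m_iter it = { NULL, NULL };

  m_put(&it, &head);
  m_put(&it, &head);  /* it.next = head.readers, which is &it */
  return it.next == &it;
}
```

The first case matches the identical readers fields above, the second
the self-referencing next pointer.  I have not reconstructed the exact
sequence that led to the state in the dump.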

And a neighbor can indeed leave NEIGHBOR_EXCHANGE with an unfinished
run through po->lsal for a number of reasons, for example on
INM_SEQMIS or INM_BADLSREQ.

One solution I can think of is to keep n->dbsi in the s_put (linked)
state *always*, except for a short s_get...s_put fragment in
ospf_dbdes_send where there are no state transitions.  Do the initial
s_init(&(n->dbsi)) in ospf_neighbor_new.  Do s_get before s_init in
the ospf_neigh_sm fragment cited above.  Do s_put unconditionally in
ospf_dbdes_send.
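
Here is a rough sketch of that discipline on the same kind of
simplified model.  All names (f_node, f_iter, f_put, f_get, f_restart)
are hypothetical and the reader list is singly linked for brevity, so
this shows the invariant rather than a patch for slists.c or the ospf
code.

```c
#include <assert.h>
#include <stddef.h>

struct f_iter;

/* Simplified list node: only the head of its reader list is modeled. */
struct f_node {
  struct f_iter *readers;
};

/* Simplified iterator: the node it is parked on and the next reader. */
struct f_iter {
  struct f_node *node;
  struct f_iter *next;
};

/* Park an unlinked iterator on node n. */
static void f_put(struct f_iter *i, struct f_node *n)
{
  i->next = n->readers;
  n->readers = i;
  i->node = n;
}

/*
 * Unlink a parked iterator and return the node it was on.
 * Precondition: i is parked, i.e. reachable from i->node->readers.
 */
static struct f_node *f_get(struct f_iter *i)
{
  struct f_node *n = i->node;
  struct f_iter **pp = &n->readers;

  while (*pp != i)
    pp = &(*pp)->next;
  *pp = i->next;
  i->next = NULL;
  i->node = NULL;
  return n;
}

/* Restart the run: since the iterator is always parked, get, then put. */
static void f_restart(struct f_iter *i, struct f_node *head)
{
  f_get(i);
  f_put(i, head);
}

/* Returns 1 if repeated restarts leave exactly one clean link. */
static int demo_restart(void)
{
  struct f_node head = { NULL }, a = { NULL };
  struct f_iter it = { NULL, NULL };

  f_put(&it, &head);      /* initial park, as at neighbor creation */
  f_get(&it);
  f_put(&it, &a);         /* one send step: get ... put, no transitions */
  f_restart(&it, &head);  /* EXCHANGE entered again in mid-run */
  f_restart(&it, &head);  /* and again while still on the head */

  return a.readers == NULL && head.readers == &it && it.next == NULL;
}
```

The point is only that a restart preceded by a get can never leave a
stale reference behind, however many times it happens.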

Your thoughts?