Will bird block on syslog() call?

Pavlos Parissis pavlos.parissis at gmail.com
Fri Feb 24 20:48:48 CET 2017


On 24/02/2017 04:12 PM, Ondrej Zajicek wrote:
> On Fri, Feb 24, 2017 at 01:13:55PM +0100, Pavlos Parissis wrote:
>> Hi,
>>
>> We have observed some instability in the BFD protocol, where the upstream
>> router and/or the server (Linux RedHat 7.3) declares the BFD session dead and,
>> as a consequence, the upstream router stops forwarding traffic to the server
>> (we utilize ECMP).
>>
>> Our current hypothesis is that Bird logs messages (only BGP KEEPALIVE messages
>> when there isn't any route change) via the glibc syslog() function, which
>> connects to a UNIX socket (/dev/log), and that the sender (Bird daemon) may
>> block when the receiver (rsyslogd) doesn't respond fast enough or the buffer
>> is full.
>>
>> On RedHat 7 servers there is a chain of daemons, which receive log messages via
>> UNIX socket.
>>
>> systemd-journald.service listens on the /dev/log UNIX socket and forwards
>> messages to the /run/systemd/journal/syslog UNIX socket, on which rsyslogd
>> listens.
>>
>> As far as I can see in the code and in the output of ps -eLl, the Bird daemon
>> is a single-threaded process (please correct me if I am wrong), so it could be
>> that a call to syslog() blocks for X seconds, where X is higher than the
>> failure detection time.
> 
> Hi
> 
> BIRD is single-threaded with the exception of BFD, which runs in a
> separate thread. Generally, interaction of BFD thread with the rest of
> BIRD is designed in a way that BFD thread should not wait on the main
> thread. So generally, the main thread blocked on syslog() should not
> cause problems to the BFD thread. There are some exceptions, like when
> the BFD thread wants to log itself (there is a shared mutex around the logging
> subsystem), but that is usually not a problem, as BFD does not log anything
> during regular operation (unless packet logging is enabled).
> 
> I would suggest decreasing the min rx/tx interval to 100 ms (to see if that
> helps).

If the hypothesis holds true, that is, if Bird blocks for 1.2 seconds, then
sending BFD messages at a higher rate won't help. Do you agree?
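
A minimal sketch of the suspected mechanism, not Bird or glibc code: a sender
writes datagrams to a UNIX socket whose receiver never reads. The function name,
payload size and message limit are made up for illustration, and the behaviour
assumes Linux. The sender is put in non-blocking mode so that the point where a
blocking sender would stall shows up as an exception instead of a hang.

```python
import socket


def messages_until_full(payload=b"x" * 200, limit=100_000):
    """Send datagrams to a peer that never reads; return how many were queued."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender.setblocking(False)  # report "would block" instead of stalling
    sent = 0
    try:
        while sent < limit:
            try:
                sender.send(payload)
            except BlockingIOError:
                break  # a blocking sender would be stuck at this point
            sent += 1
    finally:
        sender.close()
        receiver.close()
    return sent


if __name__ == "__main__":
    print("datagrams queued before the sender would block:", messages_until_full())
```

If a logging call uses such a socket in blocking mode, it would wait at that
point until the receiver drains its queue.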

I could try the opposite: configure the upstream router to declare the BFD
session down only after it hasn't seen a BFD message for 5 seconds.
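
The arithmetic behind both options can be sketched roughly as follows. In
asynchronous mode (RFC 5880) the detection time is the detect multiplier times
the agreed interval. The multipliers below (3 and 5) and the 1000 ms interval
are assumptions chosen for illustration, not values from our configuration, and
the model only applies if BFD transmission really stalls for the whole time.

```python
def detection_time_ms(detect_mult, local_min_rx_ms, remote_min_tx_ms):
    """Asynchronous-mode BFD detection time: multiplier times agreed interval."""
    return detect_mult * max(local_min_rx_ms, remote_min_tx_ms)


def survives_stall(stall_ms, detect_mult, interval_ms):
    """True if the peer keeps the session up while the sender stalls.

    Worst case, the stall begins right after a packet was due, so the gap
    seen by the peer is the stall plus one transmit interval.
    """
    worst_gap_ms = stall_ms + interval_ms
    return worst_gap_ms < detection_time_ms(detect_mult, interval_ms, interval_ms)


if __name__ == "__main__":
    # 100 ms interval with an assumed multiplier of 3: 300 ms detection time.
    print(survives_stall(1200, detect_mult=3, interval_ms=100))
    # 5 s detection time, assumed here as multiplier 5 with a 1000 ms interval.
    print(survives_stall(1200, detect_mult=5, interval_ms=1000))
```

Under these assumptions a 1.2 second stall exceeds a 300 ms detection time but
fits comfortably within a 5 second one.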

> And you could try 'watchdog warning' / 'debug latency' options
> (with appropriate values, like 500 ms) to track latency in the main
> thread to see if BFD problems are related to possible latency problems in
> the main thread.
> 

Unfortunately, I still run version 1.4.5, which doesn't have those options, so I
can't experiment with them. I guess this is yet another reason to upgrade to 1.6.3.
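
Until then, a rough substitute could be to time syslog() calls from a separate
probe process on the same host. This is only a sketch: timed_call is a
hypothetical helper, the 500 ms threshold simply mirrors the value suggested
above, and it measures the logging chain (journald and rsyslogd), not Bird's
main loop.

```python
import time


def timed_call(fn, *args, threshold_s=0.5):
    """Call fn(*args) and return (elapsed_seconds, exceeded_threshold)."""
    start = time.monotonic()
    fn(*args)
    elapsed = time.monotonic() - start
    return elapsed, elapsed > threshold_s


if __name__ == "__main__":
    import syslog

    # Send one probe message through the same /dev/log path Bird would use.
    elapsed, slow = timed_call(syslog.syslog, syslog.LOG_INFO, "latency probe")
    print(f"syslog() took {elapsed * 1000:.1f} ms, slow={slow}")
```

Running this periodically and correlating slow calls with the BFD flaps would
support or weaken the hypothesis.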

Thanks a lot for your reply, it is very much appreciated.

Cheers,
Pavlos
