Identifying BGP convergence bottleneck

Patrick.deNiet at os3.nl
Thu Jun 22 10:26:03 CEST 2017


Hi everyone!

I am currently looking into the performance of BIRD (BGP) as a
route server with ~700 peers announcing 10k prefixes each. As expected,
convergence time increases as I increase these numbers.

Currently, I trigger convergence by flapping the link, which causes all
peers to start sending updates at the same time. My particular interest
is to find out which specific action accounts for the convergence time.
By convergence time I mean the time it takes BIRD to process all updates
and drop from 100% CPU back to a more "idle" level.
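
To make that definition concrete, here is a minimal sketch in Python of
how the convergence time could be derived from CPU usage samples of the
bird process. Everything in it is an assumption for illustration: the
samples are taken to be (timestamp in seconds, CPU percent) pairs
collected by some external tool, and the function name, the busy/idle
thresholds and the "settle" count are hypothetical choices, not anything
BIRD provides.

```python
from typing import Optional, Sequence, Tuple


def convergence_time(
    samples: Sequence[Tuple[float, float]],
    busy: float = 90.0,   # assumed threshold: at or above this, bird is "busy"
    idle: float = 10.0,   # assumed threshold: at or below this, bird is "idle"
    settle: int = 3,      # consecutive idle samples required to call it converged
) -> Optional[float]:
    """Seconds from the first busy sample to the start of the first run
    of `settle` consecutive idle samples, or None if that never happens."""
    start = None        # timestamp of the first busy sample
    idle_start = None   # timestamp where the current idle run began
    idle_run = 0        # length of the current idle run

    for ts, cpu in samples:
        if start is None:
            # Still waiting for the flap to push the CPU up.
            if cpu >= busy:
                start = ts
            continue
        if cpu <= idle:
            if idle_run == 0:
                idle_start = ts
            idle_run += 1
            if idle_run >= settle:
                return idle_start - start
        else:
            # A spike interrupts the idle run, so start counting again.
            idle_run = 0
    return None
```

Requiring several consecutive idle samples avoids declaring convergence
during a short dip between two bursts of updates.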

In order to identify what is causing this, I'm looking to mark the start
and end of each phase as described in RFC 4271, section 9.1 (Decision
Process). This way I may be able to get an idea of where the CPU time is
spent during all this (best path calculation, sending out updates, etc.).
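
The kind of start/end marking described above can be sketched as a small
accumulator that sums the time spent between a start and an end marker
per phase. This is only an illustration of the idea in Python; BIRD
itself is written in C, and the class name, method names and phase
labels here are hypothetical, not part of BIRD. The phase labels follow
the three phases of RFC 4271 section 9.1 (degree of preference
calculation, route selection, route dissemination).

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates elapsed time and call counts per named phase."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock              # injectable clock, handy for testing
        self.total = defaultdict(float)  # phase name -> summed seconds
        self.count = defaultdict(int)    # phase name -> number of passes

    @contextmanager
    def phase(self, name):
        begin = self._clock()            # start marker
        try:
            yield
        finally:
            # End marker: add the elapsed time even if the body raised.
            self.total[name] += self._clock() - begin
            self.count[name] += 1

    def report(self):
        """Phases sorted by total time spent, most expensive first."""
        return sorted(self.total.items(), key=lambda kv: kv[1], reverse=True)
```

Wrapping a section of work in "with timer.phase('phase2_route_selection'):"
adds its duration to that phase, and report() then shows which phase
dominates. Summing over many short passes matters here because each
phase runs many times during a burst of updates.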

Unfortunately, I do not have a deep enough understanding of the code and
have not managed to identify these points. Is anyone here able to give
some pointers as to where in the code we could place these markers to
measure this?

Other comments and insights are also welcome.

Kind regards,
Patrick
