bird under heavy cpu load

Ondrej Zajicek santiago at crfreenet.org
Tue Mar 27 18:44:28 CEST 2012


On Tue, Mar 27, 2012 at 07:13:11PM +0400, Alexander V. Chernikov wrote:
> On 26.03.2012 03:25, Ondrej Zajicek wrote:
>> On Mon, Mar 12, 2012 at 11:22:10PM +0400, Oleg wrote:
>>> On Mon, Mar 12, 2012 at 02:27:23PM +0400, Alexander V. Chernikov wrote:
>>>> On 12.03.2012 13:25, Oleg wrote:
>>>>> Hi, all.
>>>>>
>>>>> I have some experience with bird under heavy cpu load. I had a
>>>>> situation where bird did frequent updates of the kernel table, because
>>>>> of frequent BGP session down/up (because of cpu and/or net load).
>>
>> Hello
>>
>> Answering collectively for the whole thread:
>>
>> I did some preliminary testing and on my test machine exporting a full
>> BGP feed (approx. 400k routes) to a kernel table took 1-2 sec on Linux and
>> 5-6 sec on BSD, with a similar time for flushing the kernel table. Therefore,
>> if we devote half a CPU to kernel sync, we have about 200 kr/s (kiloroutes
>> per second) for Linux and 40 kr/s for BSD, which still seems more than
>> enough for an edge router. Are there any estimates (using protocol statistics)
>> of the number of updates to the kernel proto in this case? How many protocols,
>> tables and pipes do you have in your case?
>>
>> The key to responsiveness (and the ability to send keepalives on time)
>> during heavy CPU load is granularity. The main problem in BIRD is
>> that the whole route propagation is done synchronously - when a route is
>> received, it is propagated through all pipes and all routing tables to
>> all final receivers in one step, which is problematic if you have
>> several hundred BGP sessions (but probably not too problematic with
> I've been playing with the BGP/core code in preparation for the peer-groups  
> implementation.
>
> Setup: 1 peer with a full view (1), 1 peer as a full-view receiver (10).
> Both are disabled by default. We start bird and enable peer 1.
> After the full view is received we enable the second peer.
>
> Some bgp bucket statistics:
> max_feed: 256 iterations: 1551 buckets: 362184 routes: 397056 effect: 8%
> max_feed: 512 iterations: 775 buckets: 351902 routes: 396800 effect: 11%
> max_feed: 1024 iterations: 387 buckets: 335773 routes: 396288 effect: 15%
> max_feed: 2048 iterations: 193 buckets: 300434 routes: 395264 effect: 23%
> max_feed: 4096 iterations: 96 buckets: 255752 routes: 393216 effect: 34%
> max_feed: 8192 iterations: 48 buckets: 216780 routes: 393216 effect: 44%
>
> 'Effect' means (routes - buckets) * 100 / routes, i.e. how many prefixes  
> were stored in already existing buckets.
>
> Maybe we could consider making the max_feed value auto-tuned?
> E.g. 8k or 16k for a small total number of protocols.
> If we assume max_feed * proto_count to be constant (which keeps granularity  
> at the same level), and say that we use the default feed (256) for 256  
> protocols, we can automatically recalculate max_feed on every {  
> configure, protocol enabled/disabled state change, whatever }.
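A minimal sketch of that auto-tuning idea (not existing BIRD code; the
budget of 256 * 256, the clamping bounds and the helper name are all
illustrative assumptions):

  /* Keep max_feed * proto_count roughly constant, so the amount of work
   * done per main-loop pass stays about the same regardless of how many
   * protocols are configured. */

  #define FEED_BUDGET   (256 * 256)   /* default feed (256) at 256 protocols */
  #define MAX_FEED_MIN  256
  #define MAX_FEED_MAX  16384         /* allow 16k when there are few protocols */

  static unsigned
  auto_max_feed(unsigned proto_count)
  {
    unsigned feed;

    if (!proto_count)
      return MAX_FEED_MAX;

    feed = FEED_BUDGET / proto_count;

    if (feed < MAX_FEED_MIN)
      feed = MAX_FEED_MIN;
    if (feed > MAX_FEED_MAX)
      feed = MAX_FEED_MAX;

    return feed;
  }

  /* Would be recomputed on configure and on protocol enable/disable
   * state changes. */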

Is there any point in trying to achieve efficient route packing into buckets?
Most RX processing is done per route, so buckets just save some TCP data
transfers.

BTW, these results depend on many things, like how big the kernel's TCP
buffers are and how fast the other side is able to acknowledge the received
data. I guess that at first BIRD flushes buckets from BGP to TCP as fast as
they are generated, with minimal packing (depending mostly on granularity,
i.e. max_feed); later the TCP buffers become full, sending updates is
postponed and the BGP bucket cache starts to fill (you can see that in
'show memory').

If you want to get efficient packing, probably the most elegant solution
would be to add some delay (like 2 s) before scheduling the sending of BGP
update packets. Or some smarter approach: if the BGP bucket cache contains
at least X buckets, schedule updates immediately, otherwise schedule them
after 2 s.
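A rough sketch of that threshold-or-delay scheme (names, the threshold and
the placeholder hooks are assumptions for illustration, not the actual BIRD
TX machinery):

  #define BUCKET_FLUSH_THRESHOLD  64   /* the "X" above, chosen arbitrarily */
  #define BUCKET_FLUSH_DELAY      2    /* seconds */

  struct bgp_tx_state {
    unsigned pending_buckets;          /* buckets waiting in the cache */
    int flush_scheduled;               /* non-zero if a flush is already planned */
  };

  /* Placeholder hooks -- these would map onto the event/timer code. */
  static void schedule_flush_now(struct bgp_tx_state *s) { (void) s; }
  static void schedule_flush_after(struct bgp_tx_state *s, int sec) { (void) s; (void) sec; }

  /* Called whenever a route update lands in a bucket. */
  static void
  bgp_bucket_added(struct bgp_tx_state *s)
  {
    s->pending_buckets++;

    if (s->flush_scheduled)
      return;

    if (s->pending_buckets >= BUCKET_FLUSH_THRESHOLD)
      schedule_flush_now(s);                        /* packing is already good */
    else
      schedule_flush_after(s, BUCKET_FLUSH_DELAY);  /* wait a bit for more routes */

    s->flush_scheduled = 1;
  }

Once the flush actually runs, it would send the accumulated buckets and
clear pending_buckets and flush_scheduled again.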

>> Another possible problem is the kernel scan, which is also done
>> in one step, but at least on Linux it could probably be split into
>> smaller steps, and it does not take too much time if the kernel table is
>> in the expected state.
> ...
> The CLI interface can easily be another abuser:
> bird> show route count
> 2723321 of 2723321 routes for 407158 networks
>
> If I do 'show route' for such a table, it can block bird for ~10 seconds.

Really? The 'show route' processing is split into chunks of 64 routes, so I
suppose that only that CLI session is blocked.
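Roughly like this (an illustrative sketch of chunked processing, not the
actual CLI code; the structure and helper names are made up):

  #define SHOW_ROUTE_CHUNK 64

  struct route {
    struct route *next;
  };

  struct show_route_walk {
    struct route *pos;                 /* where to resume on the next pass */
  };

  static void print_route(struct route *r) { (void) r; /* format, write to CLI socket */ }
  static void reschedule(struct show_route_walk *w) { (void) w; /* queue continuation */ }

  static void
  show_route_step(struct show_route_walk *w)
  {
    unsigned n = 0;

    /* Handle at most 64 routes, then return to the main loop so timers,
     * keepalives and other sessions keep running between chunks. */
    while (w->pos && (n < SHOW_ROUTE_CHUNK))
    {
      print_route(w->pos);
      w->pos = w->pos->next;
      n++;
    }

    if (w->pos)
      reschedule(w);                   /* more routes left for a later pass */
  }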

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."