100% CPU load with device scanning enabled
Saso Tavcar
fast at ais42.net
Mon May 6 20:40:45 CEST 2019
Hi,
this is an OVS issue, already discussed:
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043007.html
...
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043063.html
Official OVS quote:
> We'd accept patches to improve OVS's routing table code. It's not
> designed to scale to 1,800,000 routes. We'd also take code to suppress
> the routing table code in cases where it isn't actually needed, since
> it's not always needed. But we can't take a patch to just delete it;
> I'm sure you understand.
I tried to apply this patch at the time, but it was already useless for newer versions:
https://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20161123/5379b333/attachment.bin
Our workaround was to scale the VM to 3 vCPUs, since our average system load is 1.5 for BGP.
You can see what is happening:
[root@bgp1 ~]# top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
654 root 10 -10 1284492 1.0g 20276 R 98.0 27.0 2513:01 ovs-vswitchd
16 root 20 0 0 0 0 S 2.0 0.0 24:45.60 ksoftirqd/1
[root@bgp1 ~]# ip route show
...
1.0.0.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
1.0.4.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
1.0.4.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
1.0.5.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
Routes are constantly being added and deleted:
[root@bgp1 ~]# ip monitor
...
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 2.16.70.0/23 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 88.221.28.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 23.50.188.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 92.122.68.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 88.221.100.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 92.123.208.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
.....
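A rough way to put numbers on the churn shown above is to count add and delete events per prefix. This is only a sketch: the function name `count_route_churn` is made up for illustration, and it assumes the input is `ip monitor` output in the format shown above, where a deletion starts with "Deleted" and an added route starts with the prefix itself. Lines without an explicit prefix length (default routes, host routes, link or neighbour events) are ignored.

```python
from collections import Counter


def count_route_churn(lines):
    """Count route add/delete events per prefix in `ip monitor` style output.

    Hypothetical helper: only lines whose prefix contains a "/" are counted.
    """
    added, deleted = Counter(), Counter()
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "Deleted":
            # "Deleted <prefix> via <gw> dev <if> proto bird"
            if len(fields) > 1 and "/" in fields[1]:
                deleted[fields[1]] += 1
        elif "/" in fields[0]:
            # "<prefix> via <gw> dev <if> proto bird" is an added/replaced route
            added[fields[0]] += 1
    return added, deleted


# Sample lines copied from the output above.
sample = [
    "Deleted 68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird",
    "68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird",
    "Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird",
    "103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird",
    "Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird",
    "103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird",
    "Deleted 2.16.70.0/23 via 89.212.47.185 dev t2-v24-ha proto bird",
]

added, deleted = count_route_churn(sample)
churn = added + deleted  # total events per prefix
for prefix, events in churn.most_common():
    print(prefix, "added:", added[prefix], "deleted:", deleted[prefix], "total:", events)
```

Run against a longer capture, the prefixes with the highest totals are the ones flapping the most, and every one of those events is also seen by ovs-vswitchd.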
Regards,
saso
> On 6 May 2019, at 19:30, Kees Meijs <kees at nefos.nl> wrote:
>
> Hi list,
>
> We're in the process of replacing Quagga with BIRD but stumble upon a
> little problem.
>
> When device scanning is on (obviously default) our testing machine
> completely fills up a CPU core. The culprit isn't BIRD itself but an
> Open vSwitch daemon.
>
> After disabling the device protocol and restarting BIRD, everything goes
> back to its quiet state.
>
> BIRD (1.6.3-2) and Open vSwitch (2.6.2~pre+git20161223-3) both were
> installed as Debian stable packages.
>
> The configuration is as simple as:
>
>> # This is a minimal configuration file, which allows the bird daemon to start
>> # but will not cause anything else to happen.
>> #
>> # Please refer to the documentation in the bird-doc package or BIRD User's
>> # Guide on http://bird.network.cz/ for more information on configuring BIRD and
>> # adding routing protocols.
>>
>> # Change this into your BIRD router ID. It's a world-wide unique identification
>> # of your router, usually one of router's IPv4 addresses.
>> router id 1.2.3.4;
>>
>> # The Device protocol is not a real routing protocol. It doesn't generate any
>> # routes and it only serves as a module for getting information about network
>> # interfaces from the kernel.
>> protocol device {
>> }
>>
>> # The Kernel protocol is not a real routing protocol. Instead of communicating
>> # with other routers in the network, it performs synchronization of BIRD's
>> # routing tables with the OS kernel.
>> protocol kernel {
>>     metric 64;      # Use explicit kernel route metric to avoid collisions
>>                     # with non-BIRD routes in the kernel routing table
>>     import none;
>>     export all;     # Actually insert routes into the kernel routing table
>> }
>>
>> protocol bgp test {
>>     description "BGP test";
>>     local as REDACTED;
>>     neighbor 1.2.3.4 as REDACTED;
>>     direct;
>>     next hop self;
>>     deterministic med on;
>>     export none;
>>     import all;
>> }
>
> Meanwhile log messages such as below arise:
>
>> bird: Kernel dropped some netlink messages, will resync on next scan.
>
> For a test I deleted all existing Open vSwitch bridges and the load
> dropped again. After adding an empty new bridge, the load spikes again
> in an instant.
>
> This is unexpected behaviour. Maybe it's an implementation problem in
> Open vSwitch or maybe in BIRD. Anyway, it shouldn't happen, I guess.
>
> Any clues?
>
> Thanks in advance!
>
> Regards,
> Kees
>
>