[BMP] BIRD socket best practice

Maslanka, Pawel pmaslank at akamai.com
Mon Jun 14 15:48:12 CEST 2021


Hi Ondrej,

I would like to ask you to review changes designed to improve the handling of errors caused by an interrupted connection to the BMP collector. Additionally, I cleaned up the code by removing unnecessary conditions, which I replaced with 'proto_state' checks. The patch is attached and is fully compatible with the 'bmp' branch in the BIRD GitLab repo.
Of course, I would be happy if these changes could be merged into the 'bmp' branch if they pass review successfully :)

Thanks,
----
 
Pawel Maslanka
Senior Software Engineer

Office: +1.617.444.1234
Cell: +1.617.444.1234
Akamai Technologies
150 Broadway
Cambridge, MA 02142
Connect with Us:
 <https://community.akamai.com/>  <http://blogs.akamai.com/>  <https://twitter.com/akamai>  <http://www.facebook.com/AkamaiTechnologies>  <http://www.linkedin.com/company/akamai-technologies>  <http://www.youtube.com/user/akamaitechnologies?feature=results_main>
 

On 6/7/21, 11:26 PM, "Maslanka, Pawel" <pmaslank at akamai.com> wrote:

    Hi Ondrej,

    Thank you for your input! This is valuable info for me. I will keep you updated if I work on that.

    Thanks,
    ----

    Pawel Maslanka


    On 6/7/21, 2:33 AM, "Ondrej Zajicek" <santiago at crfreenet.org> wrote:

        On Thu, Jun 03, 2021 at 11:19:32PM +0000, Maslanka, Pawel wrote:
        > Hi BIRD team!
        > 
        > We found a case where the BMP code tries to connect to the BMP collector service with sk_open(), and this causes increased CPU utilization. To reproduce this case:
        > 
        >   1.  The server machine to which BMP PDUs will be sent should be reachable (so it can be pinged).
        >   2.  The BMP collector service itself should not be running on this server.
        >   3.  Run BIRD with the BMP protocol enabled.
        > 
        > After that you should observe that the BIRD process has significantly increased CPU utilization. This is somehow related to the “BIRD socket”, because when I capture network traffic on the host machine (where BIRD is running), I can see a massive number of TCP packets exchanged between the BIRD host machine and the BMP collector machine. At the moment, the socket type used for the BMP connection is SK_TCP_ACTIVE.
        > Do you have any idea what is going wrong, or how the BIRD socket should be properly used?

        Hi

        After a failed attempt to connect(), the socket err_hook is called. In
        such a case you are supposed to close the socket and either disable
        the protocol, or set up some timeout to restart connect attempts. See
        rpki_err_hook() or bgp_sock_err(). Otherwise, the BIRD socket layer
        would try to connect() immediately again.

        This part is missing from bmp_sock_err() in our bmp branch, I should
        fix that. It is still WiP.
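
A minimal sketch of the "close the socket, then retry after a timeout" pattern described above. It is a self-contained model, not BIRD code: `collector_conn`, `conn_start()`, `conn_err_hook()` and `conn_timer_tick()` are hypothetical names introduced only for illustration, and time is passed in as a plain integer instead of using BIRD's timers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are BIRD's real types or functions. */
enum conn_state { CONN_IDLE, CONN_CONNECTING, CONN_WAIT_RETRY };

struct collector_conn {
  enum conn_state state;
  bool sock_open;          /* models "a socket object currently exists" */
  int last_err;            /* error code seen by the last error hook call */
  unsigned retry_delay_s;  /* how long to wait before the next attempt */
  unsigned retry_at_s;     /* absolute time of the next attempt */
  unsigned attempts;       /* number of connect attempts started so far */
};

/* Start one connect attempt (in real code: create and open the socket). */
static void conn_start(struct collector_conn *c)
{
  c->sock_open = true;
  c->state = CONN_CONNECTING;
  c->attempts++;
}

/* Error hook: drop the socket and arm the retry timer instead of
 * letting the connect attempt be repeated immediately. */
static void conn_err_hook(struct collector_conn *c, int err, unsigned now_s)
{
  c->last_err = err;
  c->sock_open = false;                      /* close the failed socket */
  c->state = CONN_WAIT_RETRY;
  c->retry_at_s = now_s + c->retry_delay_s;  /* schedule the next attempt */
}

/* Timer callback: reconnect only once the retry delay has elapsed. */
static void conn_timer_tick(struct collector_conn *c, unsigned now_s)
{
  if (c->state == CONN_WAIT_RETRY && now_s >= c->retry_at_s)
    conn_start(c);
}
```

With a retry delay of 5 seconds, an error reported at t=100 leads to exactly one new attempt at t=105 rather than a tight reconnect loop, which is the behaviour the CPU-usage report above was missing.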

        > I also need a tip: is there a way to get a notification from the BIRD
        > socket if we lose the connection to the BMP collector service? One option is to

        If a connection is closed regularly, then socket err_hook is called,
        but with err=0. In most cases the handling would
        be similar to an actual error (try to re-establish connection after
        some timeout).
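
As a rough illustration of treating err=0 as a regular close while still taking the same recovery path, here is a small self-contained sketch. `close_info` and `describe_close()` are hypothetical helpers, not part of BIRD; only the convention "err == 0 means the connection was closed regularly" comes from the explanation above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum close_reason { CLOSE_REGULAR, CLOSE_ERROR };

/* Hypothetical summary of one error hook invocation. */
struct close_info {
  enum close_reason reason;
  const char *text;   /* message suitable for a log line */
  bool reconnect;     /* whether to schedule a delayed reconnect */
};

static struct close_info describe_close(int err)
{
  struct close_info info;

  if (err == 0) {
    /* err == 0: the connection was closed regularly, not by a failure */
    info.reason = CLOSE_REGULAR;
    info.text = "connection closed";
  } else {
    info.reason = CLOSE_ERROR;
    info.text = strerror(err);
  }

  /* Both cases are handled alike: try again after some timeout. */
  info.reconnect = true;
  return info;
}
```

Only the log text differs between the two cases in this sketch; the recovery decision is the same.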

        > Currently we have switched to BMP code provided on bmp branch from gitlab BIRD repo.
        > 
        > Additionally, I have a question about the code below. Can I free the list node and the node data itself when sk_send() returns a value greater than or equal to 0 (>= 0), as in the code below?
        > 
        >   WALK_LIST_DELSAFE(tx_data, tx_data_next, p->tx_queue)
        >   {
        >     ...
        >     rv = sk_send(p->sk, data_size);
        >     if (rv < 0) {
        >       return;
        >     }
        > 
        >     mb_free(tx_data->data);
        >     rem_node((node *) tx_data);
        >     mb_free(tx_data);
        >     if (rv == 0) {
        >       return;
        >     }
        >     ...
        > 
        > Or should I do that only if sk_send() returns a value greater than 0 (> 0)? My goal is to send all data from the list if there was only a "temporary" problem with sk_send().

        This looks OK. If sk_send() returns > 0, data were sent, you can free the
        data and continue the loop. If sk_send() returns 0, data were not sent,
        but they stay in sk->tbuf, so you can free the data from your tx_queue,
        and break the loop and wait for tx_hook to happen again.
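
The three return cases can be modelled with a small self-contained sketch. Everything here is hypothetical scaffolding rather than BIRD code: `tx_queue`, `tx_push()`, `tx_drain()` and the scripted sender stand in for the protocol's tx_queue and for sk_send(), and the sender takes the data directly, whereas the quoted code elides how the data reach the socket buffer. Only the return-value convention (> 0 sent, 0 kept in the socket buffer, < 0 error) is taken from the explanation above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue types; not BIRD's list implementation. */
struct tx_item { struct tx_item *next; size_t len; char *data; };
struct tx_queue { struct tx_item *head, *tail; };

/* Stand-in for the send call: > 0 sent, 0 buffered for later, < 0 error. */
typedef int (*send_fn)(const char *data, size_t len, void *ctx);

/* Test helper: returns pre-scripted results, then -1 when they run out. */
struct send_script { const int *results; size_t count, pos; };

static int scripted_send(const char *data, size_t len, void *ctx)
{
  struct send_script *s = ctx;
  (void) data; (void) len;
  return s->pos < s->count ? s->results[s->pos++] : -1;
}

/* Append a copy of the data to the queue; returns 0 on success. */
static int tx_push(struct tx_queue *q, const char *data, size_t len)
{
  struct tx_item *it = malloc(sizeof *it);
  if (!it) return -1;
  it->data = malloc(len ? len : 1);
  if (!it->data) { free(it); return -1; }
  memcpy(it->data, data, len);
  it->len = len;
  it->next = NULL;
  if (q->tail) q->tail->next = it; else q->head = it;
  q->tail = it;
  return 0;
}

/* Send queued items in order; returns how many items left the queue. */
static size_t tx_drain(struct tx_queue *q, send_fn send, void *ctx)
{
  size_t released = 0;

  while (q->head) {
    struct tx_item *it = q->head;
    int rv = send(it->data, it->len, ctx);

    if (rv < 0)
      break;              /* error: the item stays queued */

    /* rv >= 0: the socket layer now owns a copy, so release ours */
    q->head = it->next;
    if (!q->head) q->tail = NULL;
    free(it->data);
    free(it);
    released++;

    if (rv == 0)
      break;              /* buffered: wait for the next tx notification */
  }
  return released;
}
```

With three queued items and send results 1, 0, 1, the first drain releases two items and stops; the second drain, which would run from the next tx notification, releases the last one.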

        -- 
        Elen sila lumenn' omentielvo

        Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
        OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
        "To err is human -- to blame it on a computer is even more so."



-------------- next part --------------
A non-text attachment was scrubbed...
Name: bird_bmp_connection_error_handling.patch
Type: application/octet-stream
Size: 16256 bytes
Desc: bird_bmp_connection_error_handling.patch
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20210614/0a9e6f01/attachment.obj>

