[PATCH] BGP: Do not start connect retry timer if connect immediately succeeds

Alexander Zubkov green at qrator.net
Sat Oct 18 11:44:40 CEST 2025


Hi all,

My guess is that it could depend on the socket type. I.e. a nonblocking TCP
socket on Linux might always return EINPROGRESS, but for a Unix socket other
outcomes might be possible.
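
A minimal sketch of how one could check this, assuming Linux and a Unix
stream socket in the abstract namespace. The names classify_connect and
unix_nonblocking_connect are made up for illustration and are not BIRD code:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result type for one nonblocking connect() attempt. */
enum connect_result { CONNECT_DONE, CONNECT_PENDING, CONNECT_FAILED };

/* Classify what connect() returned on a nonblocking socket. */
enum connect_result classify_connect(int rv, int err)
{
    if (rv == 0)
        return CONNECT_DONE;    /* completed synchronously */
    if (err == EINPROGRESS)
        return CONNECT_PENDING; /* completion is signalled later via POLLOUT */
    return CONNECT_FAILED;
}

/* Connect a nonblocking AF_UNIX stream socket to a local listener and
 * report how connect() returned. */
enum connect_result unix_nonblocking_connect(void)
{
    struct sockaddr_un sa;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    /* A leading NUL byte selects the abstract namespace (Linux-specific). */
    snprintf(sa.sun_path + 1, sizeof(sa.sun_path) - 1,
             "connect-demo-%ld", (long)getpid());
    socklen_t len = (socklen_t)(offsetof(struct sockaddr_un, sun_path)
                                + 1 + strlen(sa.sun_path + 1));

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    int cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    enum connect_result res = CONNECT_FAILED;

    if (lfd >= 0 && cfd >= 0 &&
        bind(lfd, (struct sockaddr *)&sa, len) == 0 &&
        listen(lfd, 1) == 0 &&
        fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL) | O_NONBLOCK) == 0)
    {
        int rv = connect(cfd, (struct sockaddr *)&sa, len);
        res = classify_connect(rv, errno);
    }

    if (lfd >= 0)
        close(lfd);
    if (cfd >= 0)
        close(cfd);
    return res;
}
```

Calling unix_nonblocking_connect() from a small main() and printing the
result should show whether connect() finished synchronously on a given
system. I would expect CONNECT_DONE on Linux as long as the listener backlog
is not full, but I have not verified this across kernels.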

Regards,
Alexander

On Fri, Oct 17, 2025, 22:02 Ze Xia <billxia135 at gmail.com> wrote:

> On Fri, Oct 17, 2025 at 10:57 PM Maria Matejka <maria.matejka at nic.cz>
> wrote:
> >
> > Hello Ze Xia,
> >
> > this looks like a real bug, yet I'm not sure whether we happen to
> > observe it in the real world often. Please, do you have any
> > instructions on how to trigger it reliably so that we can add it to
> > our CI?
> >
> > Thanks,
> > Maria
> >
>
> I tried to trigger it "naturally" by creating 2 bird daemons connected
> through a veth pair, but this failed to reproduce the bug. According to
> strace, connect() always returns -1 with errno = EINPROGRESS.
>
> However, I figured out that I can wait a little while for connect() to
> succeed by preloading a custom dynamically linked library. My current
> implementation:
>
> // glibc declares RTLD_NEXT only when _GNU_SOURCE is defined
> #define _GNU_SOURCE
> #include <dlfcn.h>
> #include <errno.h>
> #include <poll.h>
> #include <sys/socket.h>
>
> // milliseconds
> #define MAX_CONNECT_BLOCKTIME 10
>
> typedef int (*connect_t)(int, const struct sockaddr *, socklen_t);
>
> __attribute__((visibility("default")))
> int connect(int sock, const struct sockaddr *addr, socklen_t len)
> {
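>     int orig_errno = errno;
>
>     // Look up the real connect() that this wrapper shadows.
>     connect_t true_connect = dlsym(RTLD_NEXT, "connect");
>     int r = true_connect(sock, addr, len);
>     // Only intercept IPv4 connects that are still in progress.
>     if (!(addr->sa_family == AF_INET && r == -1 && errno == EINPROGRESS))
>         return r;
>
>     // Wait briefly for the connection attempt to finish.
>     struct pollfd fds[1] = {{.fd = sock,
>                              .events = POLLOUT | POLLERR | POLLHUP}};
>     int poll_res = poll(fds, 1, MAX_CONNECT_BLOCKTIME);
>     if (poll_res <= 0)
>     {
>         // Timed out or poll() failed: report the original result.
>         errno = EINPROGRESS;
>         return -1;
>     }
>     // The socket is ready: SO_ERROR tells whether the connect succeeded.
>     int err = 0;
>     socklen_t errlen = sizeof(err);
>     if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
>         return -1;
>     if (err == 0)
>     {
>         errno = orig_errno;
>         return 0;
>     }
>     else
>     {
>         errno = err;
>         return -1;
>     }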
> }
>
> Compile it with:
>
> gcc bird-preload.c -fPIC -fvisibility=hidden -shared -o libpreload.so
>
> Then write the absolute path of libpreload.so to /etc/ld.so.preload
> (see man ld.so for more information about LD_PRELOAD). I started 2 bird
> daemons inside a docker container with the config file attached to
> this mail, and connected them with a veth pair. When libpreload.so
> is preloaded, the connect retry timer (2s) should fire every time and
> tear down the connection, causing a reconnection, which can be
> checked in the debug log.
>
> With libpreload.so, bird should behave just as if the thread did
> not get scheduled for a while (<10ms) when calling connect(); it seems
> to have no other side effects to me. I'm not sure whether this fits into
> your CI workflow though. Hope this helps!
>
> Regards,
> Ze Xia
>