[PATCH] BGP: Do not start connect retry timer if connect immediately succeeds
Ze Xia
billxia135 at gmail.com
Fri Oct 17 21:48:03 CEST 2025
On Fri, Oct 17, 2025 at 10:57 PM Maria Matejka <maria.matejka at nic.cz> wrote:
>
> Hello Ze Xia,
>
> this looks like a real bug, yet I'm not sure whether we happen to observe it in real world often. Please, do you have any instructions how to trigger it reliably so that we can add it to our CI?
>
> Thanks,
> Maria
>
I tried to trigger it "naturally" by running two BIRD daemons connected
through a veth pair, but this failed to reproduce the bug. According to
strace, connect() always returned -1 with errno = EINPROGRESS.
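The same observation can be checked outside BIRD. The following is only
a rough probe, not part of the patch: it assumes a TCP listener on
127.0.0.1 (loopback rather than a veth pair), and probe_connect() is a
made-up helper name. It reports whether a non-blocking connect()
completed immediately or returned EINPROGRESS.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if a non-blocking connect() succeeded
 * immediately, 1 if it returned -1 with errno == EINPROGRESS, and -1 on
 * any other error.
 */
static int probe_connect(void)
{
    int result = -1;
    int cli = -1;
    int flags;
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);

    /* Listener on 127.0.0.1 with a kernel-chosen port. */
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;

    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 1) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &alen) < 0)
        goto out;
    assert(alen == sizeof(addr));

    /* Non-blocking client socket, as an event-driven daemon would use. */
    cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli < 0)
        goto out;
    flags = fcntl(cli, F_GETFL, 0);
    if (flags < 0 || fcntl(cli, F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;

    if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        result = 0;
    else if (errno == EINPROGRESS)
        result = 1;

out:
    if (cli >= 0)
        close(cli);
    close(srv);
    return result;
}
```

Calling probe_connect() from a small main() and printing the result
shows which of the two cases the system produces.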
However, I figured out that I can make connect() wait a little while
for the connection to succeed by preloading a custom dynamically linked
library. My current implementation:
#define _GNU_SOURCE  // needed for RTLD_NEXT on glibc
#include <dlfcn.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

// milliseconds
#define MAX_CONNECT_BLOCKTIME 10

typedef int (*connect_t)(int, const struct sockaddr *, socklen_t);

__attribute__((visibility("default")))
int connect(int sock, const struct sockaddr *addr, socklen_t len)
{
    int orig_errno = errno;
    connect_t true_connect = (connect_t)dlsym(RTLD_NEXT, "connect");
    int r = true_connect(sock, addr, len);

    // Only intervene for IPv4 connects that are still in progress.
    if (!(addr->sa_family == AF_INET && r == -1 && errno == EINPROGRESS))
        return r;

    // Wait briefly for the connection attempt to finish.
    struct pollfd fds[1] = {{.fd = sock,
                             .events = POLLOUT | POLLERR | POLLHUP}};
    int poll_res = poll(fds, 1, MAX_CONNECT_BLOCKTIME);
    if (poll_res <= 0)
    {
        // Timed out (or poll failed): still in progress for the caller.
        errno = EINPROGRESS;
        return -1;
    }

    int err = 0;
    socklen_t errlen = sizeof(err);
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
        return -1;  // errno was set by getsockopt()
    if (err == 0)
    {
        // Connected within the wait window: report immediate success.
        errno = orig_errno;
        return 0;
    }
    else
    {
        errno = err;
        return -1;
    }
}
Compile it with:
gcc bird-preload.c -fPIC -fvisibility=hidden -shared -o libpreload.so
Then write the absolute path of libpreload.so to /etc/ld.so.preload
(see man ld.so for more information about LD_PRELOAD). I started two
BIRD daemons inside a Docker container, using the config files attached
to this mail, and connected them with a veth pair. When libpreload.so
is preloaded, the connect retry timer (2 s) should fire every time and
tear down the connection, causing a reconnection, which can be
checked in the debug log.
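The behaviour described above comes down to whether the connect retry
timer is armed after connect() has already returned success. A minimal
sketch of the intended decision follows. All names in it (conn_state,
struct conn, conn_handle_connect_result, retry_timer_armed) are
hypothetical and not taken from BIRD's source, and the boolean is only
a stand-in for a real timer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states; not BIRD's actual types. */
enum conn_state { CONN_ESTABLISHED, CONN_IN_PROGRESS, CONN_FAILED };

struct conn {
    enum conn_state state;
    bool retry_timer_armed;   /* stand-in for a real timer object */
};

/*
 * Interpret the result of a non-blocking connect().
 * r is connect()'s return value, err is errno captured right after it.
 */
static void conn_handle_connect_result(struct conn *c, int r, int err)
{
    assert(c != NULL);
    c->retry_timer_armed = false;

    if (r == 0) {
        /* Immediate success: no connect retry timer is needed. */
        c->state = CONN_ESTABLISHED;
    } else if (err == EINPROGRESS) {
        /* Usual case: the handshake is pending, so guard it with the timer. */
        c->state = CONN_IN_PROGRESS;
        c->retry_timer_armed = true;
    } else {
        /* Hard failure; what happens next is left to the caller. */
        c->state = CONN_FAILED;
    }
}
```

With this split, an immediately successful connect() never leaves a
running timer behind that could later tear down a working session.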
With libpreload.so, BIRD should behave as if the thread were simply not
scheduled for a short while (<10 ms) during the connect() call; as far
as I can tell, it has no other side effects. I'm not sure whether this
fits into your CI workflow, though. Hope this helps!
Regards,
Ze Xia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node_1.conf
Type: application/octet-stream
Size: 343 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20251018/89795ab5/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node_2.conf
Type: application/octet-stream
Size: 342 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20251018/89795ab5/attachment-0001.obj>