[PATCH 0/5] IP checksum improvements
Joakim Tjernlund
joakim.tjernlund at transmode.se
Mon Apr 26 01:57:29 CEST 2010
>
> > Ondrej Zajicek <santiago at crfreenet.org> wrote on 2010/04/25 23:20:52:
> > >
> > > On Sun, Apr 25, 2010 at 11:41:17AM +0200, Joakim Tjernlund wrote:
> > > > Here are a series of performance improvements on the
> > > > Internet checksum. With these changes applied I get about
> > > > 20-30% better performance on x86 and PowerPC.
> > >
> > > Although i agree with Martin Mares that such kind of optimizations
> > > should be done mainly if we know (from profiling) that BIRD spends
> > > a significant share of time (during update processing) in that function,
> > > i did some changes to the checksum function and merged some of these
> > > patches.
> > >
> > > I did some more optimizations (changing the loop condition, removing len
> > > decrement) and together with your change to add32 i got two times faster
> > > checksum function (on x86) than the old code. Changing postincrement to
> > > preincrement leads to worse results (only 1.4 times faster than the old
> > > code) so i kept postincrement.
> >
> > On x86? That is strange. On x86 that should only lead to one
> > extra add outside the loop, or so I think.
>
> Ah, now I think I know. The while(buf < end) is optimized for
> post inc so that is why.
>
> I do think performance is worse on every other arch as the above is probably
> very x86 tuned.
I tested a little and was surprised: the while loop is only 3-5% slower
than my for loop, and it is mainly the post-increment that does that.
On x86 I can hardly see any difference between post- and pre-increment.
However, gcc won't inline add32 on ppc because it considers it too big,
and that is a disaster for performance. Could you add inline to add32?
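
To make the request concrete, here is a minimal sketch of the kind of
loop under discussion. It is not the actual BIRD code: the body of
add32 is only my assumption of a 32-bit ones-complement add with
end-around carry, and sketch_ipsum32 is a hypothetical name. It
assumes the input is 4-byte aligned and a whole number of 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of add32: add two 32-bit values and fold the carry
 * back in (ones-complement addition). Marked "static inline" so the
 * compiler is encouraged to expand it inside the loop. */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;
    return z + (z < x);   /* z < x means the addition wrapped around */
}

/* Hypothetical checksum over nwords 32-bit words (native byte order).
 * Uses the while (buf < end) condition and post-increment. */
static uint16_t sketch_ipsum32(const uint32_t *buf, size_t nwords)
{
    const uint32_t *end = buf + nwords;
    uint32_t sum = 0;

    while (buf < end)
        sum = add32(sum, *buf++);

    /* Fold the 32-bit sum to 16 bits; twice, to absorb the carry. */
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Note that "inline" is only a hint. If gcc still declines on ppc,
__attribute__((always_inline)) is a gcc-specific way to force it.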
Jocke