[PATCH 0/5] IP checksum improvements
Joakim Tjernlund
joakim.tjernlund at transmode.se
Mon Apr 26 10:46:53 CEST 2010
Martin Mares <mj at ucw.cz> wrote on 2010/04/26 10:31:31:
>
> Hello!
>
> > while(buf != end) got worse on ppc. gcc 4.3.4 got even worse
> > than gcc 3.4.6. I think it is safe to say that gcc 4.3.4 is busted when
> > it comes to optimization, even on x86. I have seen -O1 do better than -O2 on
> > x86 with gcc 3.4.3.
>
> BTW have you tried unrolling the loops or using __attribute__((hot))?
Not ATM; it probably won't do much on ppc. Its branch prediction
makes the loop "free" when you do a for(;len;--len) loop.
>
> > Since gcc in general isn't very good at optimization I think the best bet
> > is to have different loops for different archs. I have seen people do that based on
> > endianness:
> > #ifdef CPU_BIG_ENDIAN
> > for(buf--; len; --len)
> > sum = add32(sum, *++buf);
> > #else
> > while(buf != end)
> > sum = add32(sum, *buf++);
> > #endif
>
> Huh, what does endianness have to do with the choice of pre-/postincrement?
Because most archs that can deal with preincrement are big endian. The
for loop is important too: decrement and test for zero is basically free.
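
To make the comparison concrete, here is a rough, self-contained sketch of
the two loop shapes. The add32() body is my assumption of an end-around-carry
add and not necessarily what the patch uses; sum_postinc() and sum_preinc()
are names made up for this illustration. The preincrement variant loads the
first word before the loop instead of doing buf--, so it never forms a
pointer in front of the buffer. On ppc the hope is that *++buf becomes a
load-with-update (lwzu) and the count-down becomes bdnz, but whether gcc
actually emits that has to be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: 32-bit add that folds the carry back in (ones' complement). */
static inline uint32_t add32(uint32_t sum, uint32_t x)
{
    uint32_t z = sum + x;
    return z + (z < x);          /* z < x means the addition wrapped */
}

/* Postincrement form, loop terminated by pointer comparison. */
static uint32_t sum_postinc(const uint32_t *buf, size_t len)
{
    const uint32_t *end = buf + len;
    uint32_t sum = 0;

    while (buf != end)
        sum = add32(sum, *buf++);
    return sum;
}

/* Preincrement form, loop terminated by counting len down to zero. */
static uint32_t sum_preinc(const uint32_t *buf, size_t len)
{
    uint32_t sum;

    if (!len)
        return 0;
    sum = *buf;                  /* first word, so no buf-- is needed */
    for (--len; len; --len)
        sum = add32(sum, *++buf);
    return sum;
}
```

Both functions should return the same value for any buffer, so they can be
swapped behind an #ifdef and timed per arch.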