[PATCH] ipsum_calc_block: Optimize size and speed
Joakim Tjernlund
joakim.tjernlund at transmode.se
Fri Apr 23 22:19:23 CEST 2010
Ondrej Zajicek <santiago at crfreenet.org> wrote on 2010/04/23 21:39:06:
>
> On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote:
> > Martin Mares <mj at ucw.cz> wrote on 2010/04/23 19:23:18:
> > >
> > > Hello!
> > >
> > > > > > So there isn't really difference in performance of both
> > > > > > implementations. Even on slow embedded AMD Geode CPU, it gives
> > > > > > ~ 180 MB/s.
> > > >
> > > > No difference? what does 1.2 mean? to me this means 20% which is a lot
> > >
> > > Yes, but according to Santiago's benchmarks, your code is sometimes 20%
> > > faster, sometimes 20% slower. It does not seem like a reason for change.
> >
> > uhh, 20% slower? Ahh now I see, the MIPS. That is really strange. Santiago, are
> > you sure that is not a typo?
>
> FYI, code z = sum + x, z + (z < sum) was compiled to:
>
> addu $2,$3,$2
> sltu $3,$2,$3
> addu $3,$2,$3
OK, MIPS has always been a strange platform to me.
So I had to test myself again:
x86 Core 2 Duo, 3.1 GHz:
New code:
64 byte buffer: 5899 +/-2.3%
128 byte buffer: 5570 +/-3.1%
256 byte buffer: 5797 +/-0.3%
512 byte buffer: 5501 +/-1.1%
1024 byte buffer: 5357 +/-1.5%
2048 byte buffer: 5277 +/-0.6%
4096 byte buffer: 5249 +/-1.2%
8192 byte buffer: 5245 +/-2.1%
16384 byte buffer: 5221 +/-1.6%
Old code:
64 byte buffer: 7237 +/-0.4%
128 byte buffer: 6505 +/-1.7%
256 byte buffer: 6075 +/-1.6%
512 byte buffer: 6120 +/-1.6%
1024 byte buffer: 5773 +/-8.2%
2048 byte buffer: 5790 +/-2.0%
4096 byte buffer: 5474 +/-0.7%
8192 byte buffer: 5679 +/-47.1%
16384 byte buffer: 5339 +/-1.3%
PowerPC MPC 8321, 266 MHz:
New Code:
64 byte buffer: 68349 +/-8.0%
128 byte buffer: 58271 +/-8.7%
256 byte buffer: 52945 +/-8.4%
512 byte buffer: 50535 +/-8.6%
1024 byte buffer: 49288 +/-9.6%
2048 byte buffer: 48984 +/-10.3%
4096 byte buffer: 48345 +/-8.6%
8192 byte buffer: 48127 +/-8.4%
Old Code:
64 byte buffer: 68349 +/-8.0%
128 byte buffer: 58271 +/-8.7%
256 byte buffer: 52945 +/-8.4%
512 byte buffer: 50535 +/-8.6%
1024 byte buffer: 49288 +/-9.6%
2048 byte buffer: 48984 +/-10.3%
4096 byte buffer: 48345 +/-8.6%
8192 byte buffer: 48127 +/-8.4%
Just for fun, replace add32 with:
static inline unsigned long
add32(unsigned long sum, unsigned long x)
{
	/* "+r": sum is read and written. addc sets XER[CA] but does not add it back in. */
	asm ("addc %0, %0, %1" : "+r" (sum) : "r" (x) : "xer");
	return sum;
}
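For comparison with the asm variant, here is a minimal sketch of the portable C form quoted above (z = sum + x, z + (z < sum)). It assumes 32-bit arithmetic via uint32_t, whereas the add32 above takes unsigned long. The name add32_c is made up for illustration and is not from the patch.

```c
#include <stdint.h>

/* 32-bit add with end-around carry.
 * Hypothetical name, for illustration only. */
static inline uint32_t
add32_c(uint32_t sum, uint32_t x)
{
	uint32_t z = sum + x;	/* wraps modulo 2^32 on overflow */
	return z + (z < sum);	/* z < sum exactly when the add carried out; fold the carry back in */
}
```

For example, add32_c(0xFFFFFFFF, 1) wraps to 0 and then folds the carry back, giving 1. This compare-and-add is the step that the three MIPS instructions quoted above implement (addu, sltu, addu).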
MPC 8321 with asm addc:
64 byte buffer: 52007 +/-8.7%
128 byte buffer: 41986 +/-9.9%
256 byte buffer: 37160 +/-11.4%
512 byte buffer: 34593 +/-10.3%
1024 byte buffer: 33265 +/-10.4%
2048 byte buffer: 32648 +/-11.4%
4096 byte buffer: 32843 +/-14.1%
8192 byte buffer: 32223 +/-12.5%
So the new code is better on both platforms and the asm addc on ppc is
very fast.
Test prog attached.
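What follows is not the attached crc32test.c. It is a minimal sketch of how per-buffer-size timings like the ones above could be collected. The names sum_block and time_block, the use of clock(), and the choice to sum 32-bit words are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* 32-bit add with end-around carry (the portable form quoted earlier). */
static inline uint32_t
add32(uint32_t sum, uint32_t x)
{
	uint32_t z = sum + x;
	return z + (z < sum);
}

/* Hypothetical stand-in for the routine under test:
 * sum a buffer of 32-bit words with end-around carry. */
static uint32_t
sum_block(const uint32_t *buf, size_t words)
{
	uint32_t sum = 0;
	for (size_t i = 0; i < words; i++)
		sum = add32(sum, buf[i]);
	return sum;
}

/* Run `rounds` passes over `bytes` bytes of buf and return the CPU time
 * used, in seconds. The first word is overwritten with the round number
 * so that the passes are not identical, and the combined result is handed
 * back through `out` so that the work has an observable effect. */
static double
time_block(uint32_t *buf, size_t bytes, unsigned rounds, uint32_t *out)
{
	uint32_t acc = 0;
	clock_t start = clock();

	for (unsigned r = 0; r < rounds; r++) {
		buf[0] = r;
		acc ^= sum_block(buf, bytes / sizeof(uint32_t));
	}
	*out = acc;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_block once per buffer size (64 up to 16384 bytes) with a large rounds value, and repeating each measurement a few times, would give a mean and a spread per size. The unit of the figures in the tables above is not stated, so this sketch makes no attempt to match it.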
Jocke
(See attached file: crc32test.c)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crc32test.c
Type: application/octet-stream
Size: 2526 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20100423/78a8eea7/attachment-0001.obj>