[PATCH] ipsum_calc_block: Optimize size and speed

Fri Apr 23 22:19:23 CEST 2010

Ondrej Zajicek <santiago at crfreenet.org> wrote on 2010/04/23 21:39:06:
>
> On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote:
> > Martin Mares <mj at ucw.cz> wrote on 2010/04/23 19:23:18:
> > >
> > > Hello!
> > >
> > > > > > So there isn't really difference in performance of both
> > > > > > implementations. Even on slow embedded AMD Geode CPU, it gives
> > > > > > ~ 180 MB/s.
> > > >
> > > > No difference? what does 1.2 mean? to me this means 20% which is a lot
> > >
> > > Yes, but according to Santiago's benchmarks, your code is sometimes 20%
> > > faster, sometimes 20% slower. It does not seem like a reason for change.
> >
> > uhh, 20% slower? Ahh now I see, the MIPS. That is really strange. Santiago, are
> > you sure that is not a typo?
>
> FYI, code z = sum + x, z + (z < sum) was compiled to:
>
> addu    $2,$3,$2
> sltu    $3,$2,$3
> addu    $3,$2,$3

OK, MIPS has always been a strange platform to me.
So I had to test it again myself:
x86 Core 2 Duo, 3.1 GHz (lower numbers are better):
New code:
   64 byte buffer:   5899 +/-2.3%
  128 byte buffer:   5570 +/-3.1%
  256 byte buffer:   5797 +/-0.3%
  512 byte buffer:   5501 +/-1.1%
 1024 byte buffer:   5357 +/-1.5%
 2048 byte buffer:   5277 +/-0.6%
 4096 byte buffer:   5249 +/-1.2%
 8192 byte buffer:   5245 +/-2.1%
16384 byte buffer:   5221 +/-1.6%

Old code:
   64 byte buffer:   7237 +/-0.4%
  128 byte buffer:   6505 +/-1.7%
  256 byte buffer:   6075 +/-1.6%
  512 byte buffer:   6120 +/-1.6%
 1024 byte buffer:   5773 +/-8.2%
 2048 byte buffer:   5790 +/-2.0%
 4096 byte buffer:   5474 +/-0.7%
 8192 byte buffer:   5679 +/-47.1%
16384 byte buffer:   5339 +/-1.3%

PowerPC MPC 8321, 266 MHz:

New Code:
   64 byte buffer:  68349 +/-8.0%
  128 byte buffer:  58271 +/-8.7%
  256 byte buffer:  52945 +/-8.4%
  512 byte buffer:  50535 +/-8.6%
 1024 byte buffer:  49288 +/-9.6%
 2048 byte buffer:  48984 +/-10.3%
 4096 byte buffer:  48345 +/-8.6%
 8192 byte buffer:  48127 +/-8.4%

Old Code:
   64 byte buffer:  68349 +/-8.0%
  128 byte buffer:  58271 +/-8.7%
  256 byte buffer:  52945 +/-8.4%
  512 byte buffer:  50535 +/-8.6%
 1024 byte buffer:  49288 +/-9.6%
 2048 byte buffer:  48984 +/-10.3%
 4096 byte buffer:  48345 +/-8.6%
 8192 byte buffer:  48127 +/-8.4%

Just for fun, I replaced add32 with:
static inline
unsigned long
add32(unsigned long sum, unsigned long x)
{
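	/* sum is both read and written, so it needs "+r", not "=r".
	 * addc leaves the carry-out in XER[CA]; it is not added back here. */
	asm ("addc %0, %0, %1" : "+r" (sum) : "r" (x));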
	return sum;
}
MPC 8321 with asm addc:
   64 byte buffer:  52007 +/-8.7%
  128 byte buffer:  41986 +/-9.9%
  256 byte buffer:  37160 +/-11.4%
  512 byte buffer:  34593 +/-10.3%
 1024 byte buffer:  33265 +/-10.4%
 2048 byte buffer:  32648 +/-11.4%
 4096 byte buffer:  32843 +/-14.1%
 8192 byte buffer:  32223 +/-12.5%

So the new code is better on both platforms, and the asm addc on PPC is
very fast.

Test prog attached.

 Jocke

(See attached file: crc32test.c)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crc32test.c
Type: application/octet-stream
Size: 2526 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20100423/78a8eea7/attachment-0001.obj>