Re: Cost of XLogInsert CRC calculations - Mailing list pgsql-hackers

From Mark Cave-Ayland
Subject Re: Cost of XLogInsert CRC calculations
Date
Msg-id 9EB50F1A91413F4FA63019487FCD251D11332C@WEBBASEDDC.webbasedltd.local
Whole thread Raw
In response to Re: Cost of XLogInsert CRC calculations  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Cost of XLogInsert CRC calculations  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
List pgsql-hackers
> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndquadrant.com]
> Sent: 12 May 2005 16:52
> To: Mark Cave-Ayland (External)
> Cc: 'Christopher Kings-Lynne'; 'Tom Lane'; 'Bruce Momjian';
> pgsql-hackers@postgresql.org
> Subject: RE: [HACKERS] Cost of XLogInsert CRC calculations

(cut)
> It might be possible to speed things up by a factor of two using a
> two- byte lookup table rather than a one byte lookup. This would take
> up 65536 table entries, which is only 256KB. If we held that in shared
> memory we could calculate the entries at startup, or read them in from
> a file. It would only use the same space as if we had 255 connections,
> so not a great increase in memory usage overall. Need to look at the
> effect of retrieving everything from memory rather than from
> cache, so it probably wouldn't be twice as fast.
>
> Best Regards, Simon Riggs


Hi everyone,

As per this thread, I decided to spend a little time over the weekend
looking at the performance of the existing CRC64 code in order to determine
whether we could code the algorithm in a more efficient manner. In order to
do this, I devised a couple of test programs using code cut/pasted from
pg_crc.h (and pg_resetxlog) in order perform some timings.

These tests were done on two systems: a P4 1.8GHz laptop with MingW (Win32)
and gcc 3.2.3, and a Xeon 2.8GHz server running FC1 and gcc 3.3.2.

The program used to time the CRC64 calculation simply did the following:
1) Create an 8k buffer of data
2) Fill the buffer with some repeatable values; in this case thealgorithm used was the following:
    for (i = 0 ; i < 8192 ; i++ )    {        *data = (char )(i % 256);    }
3) Calculate the CRC64 of the buffer 10,000 times and display the
result,making a note of the start and end times. Using this information an
estimate of the cost of a single CRC calculation can esaily be found.

I noticed looking at the code that there were two versions of the CRC code,
one using 32-bit arguments wrapped within a check for INT64_IS_BUSTED and
another using 64-bit unsigned integers. Since I wasn't sure initially which
one was being used, I decided to test both of them :) Both programs were
compiled using gcc's -O2 flag for optimisation.

The first thing to notice about the results was that I obtained different
CRC values using both algorithms on Win32, which were both different to the
results obtained under Linux:
Win32 MingW:1) INT64_IS_BUSTED code (32-bit):    0x541d4a27 (crc1) 0x8dfa57ae
(crc0)2) 64-bit code:                0x817f722b1f17b253 (crc0)
Linux (FC1):1) INT64_IS_BUSTED code (32-bit):    0x21d064a2 (crc1) 0xe7c74332
(crc0)2) 64-bit code:                0x21d064a2e7c74332 (crc0)

The second interesting thing to notice was the comparative speeds:
Win32 MingW:1) INT64_IS_BUSTED code (32-bit):    ~57us per calculation2) 64-bit code:                ~130us per
calculation
Linux (FC1):1) INT64_IS_BUSTED code (32-bit):    ~29us per calculation2) 64-bit code:                ~76us per
calculation


Looking at the flags in pg_config.h from both source builds, it would appear
that the 64-bit code is being used since the compiler defines
HAVE_LONG_LONG_INT_64. So that means there could be a potential 100%
performance increase by just using the existing INT64_IS_BUSTED code on x86
32 bit processors. I don't know given the rate of xlog records being written
whether or not this translates to more than a couple of minutes in an insert
of a million records or so.

I did have a look at the assembly code produced by the INT64_IS_BUSTED code
and it appears fairly tight; I'm no x86 expert but I think it would be hard
to optimise what was produced by the compiler. If anyone else is interested
in looking at these results then I will happily send the test programs
off-list - however my main concern is that the Win32 CRC64 results just
don't seem to be right :(


Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk




pgsql-hackers by date:

Previous
From: Jeffrey Baker
Date:
Subject: Re: bitmap scans, btree scans, and tid order
Next
From: Дмитрий Летучий
Date:
Subject: postgreSQL as deductive DBMS