Re: New CRC algorithm: Slicing by 8 - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: New CRC algorithm: Slicing by 8
Date
Msg-id 65937bea0610190131m51addbf5obc8b4a8c7b4cbdf@mail.gmail.com
Responses Fwd: New CRC algorithm: Slicing by 8  ("Gurjeet Singh" <singh.gurjeet@gmail.com>)
Re: New CRC algorithm: Slicing by 8  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: New CRC algorithm: Slicing by 8  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Resending... The previous three attempts haven't succeeded, probably because of size restrictions, so I am sending it in two parts.

On 10/18/06, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
Hi all,

    Please refer to the following link for a new algorithm to calculate CRC, developed by Intel.

    http://www.intel.com/technology/magazine/communications/slicing-by-8-0306.htm

    Please find attached the patch (pg_crc_sb8_4.diff.gz) that implements this algorithm (originally obtained from http://sourceforge.net/projects/slicing-by-8 ).
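
    For those who don't want to open the patch right away, the heart of the technique is to consume eight input bytes per iteration through eight parallel lookup tables, instead of one byte through a single table. Here is a simplified sketch (assuming a little-endian CPU and 4-byte-aligned input; the names are illustrative, not the ones used in the patch):

<snip>
#include <stddef.h>
#include <stdint.h>

/* Eight 256-entry lookup tables: tab[0] is the conventional CRC-32
 * table, and tab[1..7] are derived from it (initialization omitted). */
static uint32_t tab[8][256];

static uint32_t
crc32_sb8(uint32_t crc, const unsigned char *p, size_t len)
{
    /* Main loop: consume eight input bytes per iteration. */
    while (len >= 8)
    {
        uint32_t one = *(const uint32_t *) p ^ crc;
        uint32_t two = *(const uint32_t *) (p + 4);

        crc = tab[7][one & 0xFF] ^
              tab[6][(one >> 8) & 0xFF] ^
              tab[5][(one >> 16) & 0xFF] ^
              tab[4][one >> 24] ^
              tab[3][two & 0xFF] ^
              tab[2][(two >> 8) & 0xFF] ^
              tab[1][(two >> 16) & 0xFF] ^
              tab[0][two >> 24];
        p += 8;
        len -= 8;
    }

    /* Mop up the remaining 0..7 bytes the conventional way. */
    while (len-- > 0)
        crc = tab[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);

    return crc;
}
</snip>

    The eight table lookups in each iteration are independent of one another, so the CPU can overlap them; the conventional one-byte-at-a-time loop, in contrast, has a strict serial dependency from each byte to the next.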

    I tested it using pgbench, and found an average improvement of about 15 tps (291 vs. 307 tps) when running with a scale factor of 10, 10 clients, and 1000 transactions per client (I don't remember these pgbench parameters exactly; it's been quite a while now).
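
    For reference, that run corresponds roughly to the following invocations (parameters from memory, so treat them as approximate; testdb is a placeholder database name):

    pgbench -i -s 10 testdb
    pgbench -c 10 -t 1000 testdb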

    The author of the algorithm says that it works three times faster (theoretically) than conventional CRC calculation methods, but the tps improvement didn't reflect that; so, to measure the raw speed, I extracted the source files (both PG's and SB8's) into a separate project (test.tar.gz).

    The resulting binary's command format is:

<snip>
  <prog_name> <algo_name> [ <scaling_factor> [ <block_size> [ <list_size> ] ] ]


  algo_name     : PG or SB8 or BOTH (just the first char will also do!)
  scaling_factor: the multiple to use when creating the list of buffer blocks.
  block_size    : buffer size (in KBs); default is 4.
  list_size     : number of data blocks to create for the test; default is 100.

  So the command: a.out p 3 1024 300
   will test PG's algorithm over 900 (3 * 300) blocks of 1 MB each.
</snip>
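
    Internally, the binary just runs the chosen CRC implementation over every block in the list and reports the elapsed time. The measurement loop is essentially the following (a hypothetical sketch; the identifiers are made up for illustration and are not the ones in test.tar.gz):

<snip>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef uint32_t (*crc_fn) (const unsigned char *buf, size_t len);

/* Run one CRC implementation over every block in the list and
 * report the CPU time consumed. */
static void
time_crc(const char *name, crc_fn fn,
         unsigned char **blocks, int nblocks, size_t block_size)
{
    clock_t     start = clock();
    int         i;

    for (i = 0; i < nblocks; i++)
        (void) fn(blocks[i], block_size);

    printf("%s: %d blocks of %lu bytes in %.3f s\n",
           name, nblocks, (unsigned long) block_size,
           (double) (clock() - start) / CLOCKS_PER_SEC);
}
</snip>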

    The test.sh script is a wrapper around this binary that tests PG, SB8, and both algorithms together, three times each. I performed two tests using this script, to gauge SB8's effectiveness on both large and small data blocks:

    sh test.sh 1 1024 800 -- 800 blocks of 1 MB each; effective data size 800 MB
    sh test.sh 800 1 1024 -- 800 * 1024 blocks of 1 KB each; effective data size 800 MB

    The new algorithm shone in both test configurations, performing 3 to 4 times better than the regular CRC algorithm in both types of tests. The results are in the files results_1_1024_800.out and results_800_1_1024.out, respectively.

    To test it yourself, extract all the files from test.tar.gz, and issue the following commands (assuming bash):

sh make.sh
sh test.sh 1 1024 800
sh test.sh 800 1 1024

(Note: you may want to reduce the effective data size on a Windows box; I used 500 MB on Windows+MinGW. I chose these parameters because I have only 1 GB of RAM; increasing the data size beyond these numbers caused the memory contents to spill into swap space/pagefile, and that disk I/O severely distorts the results.)

   The tests were performed on Fedora Core 5, and on WinXP Professional under MinGW.

    If possible, people should test it on different platforms, to ensure that it doesn't perform any worse than the older implementation on any supported platform (please post the results if you do test it). Also, recommendations for a better benchmark than pgbench are welcome, so that we can show the improvement when it is used as a part of PG.

Best regards,

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

