Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
Date
Msg-id 201005300233.o4U2XGp14143@momjian.us
Whole thread Raw
In response to [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up  (Andres Freund <andres@anarazel.de>)
Responses Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
List pgsql-hackers
Added to TODO:
Consider a faster CRC32 algorithm    * http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php 

---------------------------------------------------------------------------

Andres Freund wrote:
> Hi,
> 
> I started to analyze XLogInsert because it was the major bottleneck when 
> creating some materialized view/cached tables/whatever.
> Analyzing it I could see that content of the COMP_CRC32 macro was taking most 
> of the time which isn't immediately obvious when you profile because it 
> obviously doesn't show up as a separate function.
> I first put it into functions to make it easier to profile. I couldn't measure 
> any difference for COPY, CTAS and a simple pgbench run on 3 kinds of hardware 
> (Core2, older Xeon, older Sparc systems).
> 
> I looked a bit around for faster implementations of CRC32 and found one in 
> zlib. After adapting it (pg uses slightly different computation (non-
> inverted)) I found that it increases the speed of the CRC32 calculation itself 
> 3 fold.
> It does that by not only using one lookup table but four (one for each byte of 
> a word). Those four calculations are independent and thus are considerably 
> faster on somewhat recent hardware.
> Also it does memory lookups in 4 byte steps instead of 1 byte as the pg 
> version (thats only about ~8% benefit in itself).
> 
> I wrote a preliminary patch which includes both, the original implementation 
> and the new one switchable via an #define.
> 
> 
> I tested performance differences in a small number of scenarios:
> - CTAS/INSERT ... SELECT (8-30%)
> - COPY (3-20%)
> - pgbench (no real difference unless directly after a checkpoint)
> 
> Setup:
> 
> CREATE TABLE blub (ai int, bi int, aibi int);
> CREATE TABLE speedtest (ai int, bi int, aibi int);
> 
> 
> INSERT ... SELECT:
> 
> Statement:
> INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000) 
> a(i), generate_series(1, 1000) b(i);
> 
> legacy crc:
> 
> 11526.588
> 11406.518
> 11412.182
> 11430.245
> 
> zlib:
> 9977.394
> 9945.408
> 9840.907
> 9842.875
> 
> 
> COPY:
> Statement:
> ('blub' enlarged here 4 times, as otherwise the variances were to large)
> 
> COPY blub TO '/tmp/b' BINARY;
> ...
> CHECKPOINT;TRUNCATE speedtest; COPY speedtest FROM '/tmp/b' BINARY;
> 
> legacy:
> 44835.840
> 44832.876
> 
> zlib:
> 39530.549
> 39365.109
> 39295.167
> 
> The performance differences are bigger if the table rows are significantly 
> bigger. 
> 
> Do you think something like that is sensible? If yes, I will make it into a 
> proper patch and such.
> 
> Thanks,
> 
> Andres
> 
> INSERT ... SELECT profile before patch:
> 
>     20.22%         postgres  postgres   [.] comp_crc32
>      5.77%         postgres  postgres   [.] XLogInsert
>      5.55%         postgres  postgres   [.] LWLockAcquire
>      5.21%         postgres  [kernel.   [k] copy_user_generic_string
>      4.64%         postgres  postgres   [.] LWLockRelease
>      4.39%         postgres  postgres   [.] ReadBuffer_common
>      2.75%         postgres  postgres   [.] heap_insert
>      2.22%         postgres  libc-2.1   [.] memcpy
>      2.09%         postgres  postgres   [.] UnlockReleaseBuffer
>      1.85%         postgres  postgres   [.] hash_any
>      1.77%         postgres  [kernel.   [k] clear_page_c
>      1.69%         postgres  postgres   [.] hash_search_with_hash_value
>      1.61%         postgres  postgres   [.] heapgettup_pagemode
>      1.50%         postgres  postgres   [.] PageAddItem
>      1.42%         postgres  postgres   [.] MarkBufferDirty
>      1.28%         postgres  postgres   [.] RelationGetBufferForTuple
>      1.15%         postgres  postgres   [.] ExecModifyTable
>      1.06%         postgres  postgres   [.] RelationPutHeapTuple
> 
> 
> After:
> 
>      9.97%         postgres  postgres   [.] comp_crc32
>      5.95%         postgres  [kernel.   [k] copy_user_generic_string
>      5.94%         postgres  postgres   [.] LWLockAcquire
>      5.64%         postgres  postgres   [.] XLogInsert
>      5.11%         postgres  postgres   [.] LWLockRelease
>      4.63%         postgres  postgres   [.] ReadBuffer_common
>      3.45%         postgres  postgres   [.] heap_insert
>      2.54%         postgres  libc-2.1   [.] memcpy
>      2.03%         postgres  postgres   [.] UnlockReleaseBuffer
>      1.94%         postgres  postgres   [.] hash_search_with_hash_value
>      1.84%         postgres  postgres   [.] hash_any
>      1.73%         postgres  [kernel.   [k] clear_page_c
>      1.68%         postgres  postgres   [.] PageAddItem
>      1.62%         postgres  postgres   [.] heapgettup_pagemode
>      1.52%         postgres  postgres   [.] RelationGetBufferForTuple
>      1.47%         postgres  postgres   [.] MarkBufferDirty
>      1.30%         postgres  postgres   [.] ExecModifyTable
>      1.23%         postgres  postgres   [.] RelationPutHeapTuple

[ Attachment, skipping... ]

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Specification for Trusted PLs?
Next
From: Robert Haas
Date:
Subject: Re: small exclusion constraints patch