[RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up |
Date | |
Msg-id | 201005202227.49990.andres@anarazel.de Whole thread Raw |
Responses |
Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT
... SELECT + speeding it up
Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up |
List | pgsql-hackers |
Hi, I started to analyze XLogInsert because it was the major bottleneck when creating some materialized view/cached tables/whatever. Analyzing it I could see that content of the COMP_CRC32 macro was taking most of the time which isn't immediately obvious when you profile because it obviously doesn't show up as a separate function. I first put it into functions to make it easier to profile. I couldn't measure any difference for COPY, CTAS and a simple pgbench run on 3 kinds of hardware (Core2, older Xeon, older Sparc systems). I looked a bit around for faster implementations of CRC32 and found one in zlib. After adapting it (pg uses slightly different computation (non- inverted)) I found that it increases the speed of the CRC32 calculation itself 3 fold. It does that by not only using one lookup table but four (one for each byte of a word). Those four calculations are independent and thus are considerably faster on somewhat recent hardware. Also it does memory lookups in 4 byte steps instead of 1 byte as the pg version (thats only about ~8% benefit in itself). I wrote a preliminary patch which includes both, the original implementation and the new one switchable via an #define. I tested performance differences in a small number of scenarios: - CTAS/INSERT ... SELECT (8-30%) - COPY (3-20%) - pgbench (no real difference unless directly after a checkpoint) Setup: CREATE TABLE blub (ai int, bi int, aibi int); CREATE TABLE speedtest (ai int, bi int, aibi int); INSERT ... SELECT: Statement: INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000) a(i), generate_series(1, 1000) b(i); legacy crc: 11526.588 11406.518 11412.182 11430.245 zlib: 9977.394 9945.408 9840.907 9842.875 COPY: Statement: ('blub' enlarged here 4 times, as otherwise the variances were to large) COPY blub TO '/tmp/b' BINARY; ... CHECKPOINT;TRUNCATE speedtest; COPY speedtest FROM '/tmp/b' BINARY; legacy: 44835.840 44832.876 zlib: 39530.549 39365.109 39295.167 The performance differences are bigger if the table rows are significantly bigger. Do you think something like that is sensible? If yes, I will make it into a proper patch and such. Thanks, Andres INSERT ... SELECT profile before patch: 20.22% postgres postgres [.] comp_crc32 5.77% postgres postgres [.] XLogInsert 5.55% postgres postgres [.] LWLockAcquire 5.21% postgres [kernel. [k] copy_user_generic_string 4.64% postgres postgres [.] LWLockRelease 4.39% postgres postgres [.] ReadBuffer_common 2.75% postgres postgres [.] heap_insert 2.22% postgres libc-2.1 [.] memcpy 2.09% postgres postgres [.] UnlockReleaseBuffer 1.85% postgres postgres [.] hash_any 1.77% postgres [kernel. [k] clear_page_c 1.69% postgres postgres [.] hash_search_with_hash_value 1.61% postgres postgres [.] heapgettup_pagemode 1.50% postgres postgres [.] PageAddItem 1.42% postgres postgres [.] MarkBufferDirty 1.28% postgres postgres [.] RelationGetBufferForTuple 1.15% postgres postgres [.] ExecModifyTable 1.06% postgres postgres [.] RelationPutHeapTuple After: 9.97% postgres postgres [.] comp_crc32 5.95% postgres [kernel. [k] copy_user_generic_string 5.94% postgres postgres [.] LWLockAcquire 5.64% postgres postgres [.]XLogInsert 5.11% postgres postgres [.] LWLockRelease 4.63% postgres postgres [.] ReadBuffer_common 3.45% postgres postgres [.] heap_insert 2.54% postgres libc-2.1 [.] memcpy 2.03% postgres postgres [.] UnlockReleaseBuffer 1.94% postgres postgres [.] hash_search_with_hash_value 1.84% postgres postgres [.] hash_any 1.73% postgres [kernel. [k] clear_page_c 1.68% postgres postgres [.] PageAddItem 1.62% postgres postgres [.] heapgettup_pagemode 1.52% postgres postgres [.] RelationGetBufferForTuple 1.47% postgres postgres [.] MarkBufferDirty 1.30% postgres postgres [.] ExecModifyTable 1.23% postgres postgres [.] RelationPutHeapTuple
pgsql-hackers by date: