Thread: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

Hi,

I started to analyze XLogInsert because it was the major bottleneck when 
creating some materialized views/cached tables/whatever.
Analyzing it I could see that the contents of the COMP_CRC32 macro were taking 
most of the time, which isn't immediately obvious when you profile because it 
doesn't show up as a separate function.
I first put it into functions to make it easier to profile. I couldn't measure 
any difference from that change for COPY, CTAS or a simple pgbench run on three 
kinds of hardware (Core2, an older Xeon, an older Sparc system).

I looked around a bit for faster implementations of CRC32 and found one in 
zlib. After adapting it (pg uses a slightly different computation (non-
inverted)) I found that it speeds up the CRC32 calculation itself threefold.
It does that by using not just one lookup table but four (one for each byte of 
a word). Those four lookups are independent of each other and thus considerably 
faster on somewhat recent hardware.
It also reads memory in 4-byte steps instead of the 1-byte steps of the pg 
version (that's only about an 8% benefit in itself).

I wrote a preliminary patch which includes both the original implementation 
and the new one, switchable via a #define.
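
To give an idea of the difference, here is a minimal sketch of the two 
approaches (reflected CRC-32 with the usual 0xEDB88320 polynomial; the table 
layout, names and little-endian handling are only illustrative and not taken 
verbatim from the patch):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* four 256-entry tables for slicing-by-4 */
static uint32_t crc_tab[4][256];

static void
crc32_init_tables(void)
{
    int     i, k, t;

    for (i = 0; i < 256; i++)
    {
        uint32_t    c = (uint32_t) i;

        for (k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320U ^ (c >> 1) : c >> 1;
        crc_tab[0][i] = c;
    }
    for (i = 0; i < 256; i++)
    {
        uint32_t    c = crc_tab[0][i];

        for (t = 1; t < 4; t++)
        {
            c = crc_tab[0][c & 0xff] ^ (c >> 8);
            crc_tab[t][i] = c;
        }
    }
}

/* old style: one byte per iteration, each step depends on the previous CRC */
static uint32_t
crc32_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}

/*
 * zlib style ("slicing-by-4"): one 32-bit word per iteration; the four table
 * lookups are independent of each other, so a superscalar CPU can overlap
 * them.  Assumes little-endian words; a big-endian build needs mirrored
 * tables.
 */
static uint32_t
crc32_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 4)
    {
        uint32_t    word;

        memcpy(&word, p, 4);
        crc ^= word;
        crc = crc_tab[3][crc & 0xff] ^
              crc_tab[2][(crc >> 8) & 0xff] ^
              crc_tab[1][(crc >> 16) & 0xff] ^
              crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    while (len--)               /* tail bytes fall back to the old loop */
        crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}

Both variants compute the same CRC once the tables are initialized; the only 
difference is how much instruction-level parallelism is available per byte. 
The real patch additionally has to match PG's (non-inverted) convention 
mentioned above.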


I tested performance differences in a small number of scenarios:
- CTAS/INSERT ... SELECT (8-30%)
- COPY (3-20%)
- pgbench (no real difference unless directly after a checkpoint)

Setup:

CREATE TABLE blub (ai int, bi int, aibi int);
CREATE TABLE speedtest (ai int, bi int, aibi int);


INSERT ... SELECT:

Statement:
INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000) 
a(i), generate_series(1, 1000) b(i);

legacy crc:

11526.588
11406.518
11412.182
11430.245

zlib:
9977.394
9945.408
9840.907
9842.875


COPY:
Statement:
('blub' enlarged here 4 times, as otherwise the variances were too large)

COPY blub TO '/tmp/b' BINARY;
...
CHECKPOINT;TRUNCATE speedtest; COPY speedtest FROM '/tmp/b' BINARY;

legacy:
44835.840
44832.876

zlib:
39530.549
39365.109
39295.167

The performance differences are bigger if the table rows are significantly 
bigger. 

Do you think something like that is sensible? If yes, I will make it into a 
proper patch and such.

Thanks,

Andres

INSERT ... SELECT profile before patch:
    20.22%         postgres  postgres   [.] comp_crc32
     5.77%         postgres  postgres   [.] XLogInsert
     5.55%         postgres  postgres   [.] LWLockAcquire
     5.21%         postgres  [kernel.   [k] copy_user_generic_string
     4.64%         postgres  postgres   [.] LWLockRelease
     4.39%         postgres  postgres   [.] ReadBuffer_common
     2.75%         postgres  postgres   [.] heap_insert
     2.22%         postgres  libc-2.1   [.] memcpy
     2.09%         postgres  postgres   [.] UnlockReleaseBuffer
     1.85%         postgres  postgres   [.] hash_any
     1.77%         postgres  [kernel.   [k] clear_page_c
     1.69%         postgres  postgres   [.] hash_search_with_hash_value
     1.61%         postgres  postgres   [.] heapgettup_pagemode
     1.50%         postgres  postgres   [.] PageAddItem
     1.42%         postgres  postgres   [.] MarkBufferDirty
     1.28%         postgres  postgres   [.] RelationGetBufferForTuple
     1.15%         postgres  postgres   [.] ExecModifyTable
     1.06%         postgres  postgres   [.] RelationPutHeapTuple
 


After:
     9.97%         postgres  postgres   [.] comp_crc32
     5.95%         postgres  [kernel.   [k] copy_user_generic_string
     5.94%         postgres  postgres   [.] LWLockAcquire
     5.64%         postgres  postgres   [.] XLogInsert
     5.11%         postgres  postgres   [.] LWLockRelease
     4.63%         postgres  postgres   [.] ReadBuffer_common
     3.45%         postgres  postgres   [.] heap_insert
     2.54%         postgres  libc-2.1   [.] memcpy
     2.03%         postgres  postgres   [.] UnlockReleaseBuffer
     1.94%         postgres  postgres   [.] hash_search_with_hash_value
     1.84%         postgres  postgres   [.] hash_any
     1.73%         postgres  [kernel.   [k] clear_page_c
     1.68%         postgres  postgres   [.] PageAddItem
     1.62%         postgres  postgres   [.] heapgettup_pagemode
     1.52%         postgres  postgres   [.] RelationGetBufferForTuple
     1.47%         postgres  postgres   [.] MarkBufferDirty
     1.30%         postgres  postgres   [.] ExecModifyTable
     1.23%         postgres  postgres   [.] RelationPutHeapTuple

Andres,

* Andres Freund (andres@anarazel.de) wrote:
> Statement:
> INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000)
> a(i), generate_series(1, 1000) b(i);
>
> legacy crc:
>
> zlib:

Is this legacy crc using the function-based calls, or the macro?  Do you
have statistics for the zlib approach vs unmodified PG?

> Do you think something like that is sensible? If yes, I will make it into a
> proper patch and such.

I think that in general we're typically looking for ways to improve
performance, yes.. :)
Thanks,
    Stephen

Hi Stephen,

On Thursday 20 May 2010 22:39:26 Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > Statement:
> > INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000)
> > a(i), generate_series(1, 1000) b(i);
> > 
> > legacy crc:
> Is this legacy crc using the function-based calls, or the macro?  Do you
> have statistics for the zlib approach vs unmodified PG?
'legacy' is the out-of-line version as well. I couldn't find a real performance 
difference above noise between out-of-line (function) and inline (macro); if 
anything, the out-of-line version was a bit faster (instruction cache usage 
could cause that).

So vanilla<->zlib should be the same as legacy<->zlib.

Greetings, 
Andres


On Thu, May 20, 2010 at 4:27 PM, Andres Freund <andres@anarazel.de> wrote:
> I looked a bit around for faster implementations of CRC32 and found one in
> zlib. After adapting it (pg uses slightly different computation (non-
> inverted)) I found that it increases the speed of the CRC32 calculation itself
> 3 fold.

But zlib is not under the PostgreSQL license.

...Robert


On Friday 21 May 2010 05:40:03 Robert Haas wrote:
> On Thu, May 20, 2010 at 4:27 PM, Andres Freund <andres@anarazel.de> wrote:
> > I looked a bit around for faster implementations of CRC32 and found one
> > in zlib. After adapting it (pg uses slightly different computation (non-
> > inverted)) I found that it increases the speed of the CRC32 calculation
> > itself 3 fold.
> 
> But zlib is not under the PostgreSQL license.
Yes. But:
1. the zlib license shouldn't be a problem in itself - pg_dump already links 
to zlib
2. I planned to ask Mark Adler whether he would support relicensing those bits. 
I have read some other discussions where he was supportive of doing such a 
thing
3. Given that the idea was posted publicly on Usenet, it is not hard to 
produce an independent implementation.

So I do not see any big problems there... Or am I missing something?

Greetings,

Andres

/* zlib.h -- interface of the 'zlib' general purpose compression library
   version 1.2.2, October 3rd, 2004

   Copyright (C) 1995-2004 Jean-loup Gailly and Mark Adler

   This software is provided 'as-is', without any express or implied
   warranty.  In no event will the authors be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:

   1. The origin of this software must not be misrepresented; you must not
      claim that you wrote the original software. If you use this software
      in a product, an acknowledgment in the product documentation would be
      appreciated but is not required.
   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original software.
   3. This notice may not be removed or altered from any source distribution.

   Jean-loup Gailly        jloup@gzip.org
   Mark Adler              madler@alumni.caltech.edu
*/



Added to TODO:
	Consider a faster CRC32 algorithm
	    * http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php

---------------------------------------------------------------------------

Andres Freund wrote:
> Hi,
> 
> I started to analyze XLogInsert because it was the major bottleneck when 
> creating some materialized view/cached tables/whatever.
> Analyzing it I could see that content of the COMP_CRC32 macro was taking most 
> of the time which isn't immediately obvious when you profile because it 
> obviously doesn't show up as a separate function.
> I first put it into functions to make it easier to profile. I couldn't measure 
> any difference for COPY, CTAS and a simple pgbench run on 3 kinds of hardware 
> (Core2, older Xeon, older Sparc systems).
> 
> I looked a bit around for faster implementations of CRC32 and found one in 
> zlib. After adapting it (pg uses slightly different computation (non-
> inverted)) I found that it increases the speed of the CRC32 calculation itself 
> 3 fold.
> It does that by not only using one lookup table but four (one for each byte of 
> a word). Those four calculations are independent and thus are considerably 
> faster on somewhat recent hardware.
> Also it does memory lookups in 4 byte steps instead of 1 byte as the pg 
> version (thats only about ~8% benefit in itself).
> 
> I wrote a preliminary patch which includes both, the original implementation 
> and the new one switchable via an #define.
> 
> 
> I tested performance differences in a small number of scenarios:
> - CTAS/INSERT ... SELECT (8-30%)
> - COPY (3-20%)
> - pgbench (no real difference unless directly after a checkpoint)
> 
> Setup:
> 
> CREATE TABLE blub (ai int, bi int, aibi int);
> CREATE TABLE speedtest (ai int, bi int, aibi int);
> 
> 
> INSERT ... SELECT:
> 
> Statement:
> INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000) 
> a(i), generate_series(1, 1000) b(i);
> 
> legacy crc:
> 
> 11526.588
> 11406.518
> 11412.182
> 11430.245
> 
> zlib:
> 9977.394
> 9945.408
> 9840.907
> 9842.875
> 
> 
> COPY:
> Statement:
> ('blub' enlarged here 4 times, as otherwise the variances were to large)
> 
> COPY blub TO '/tmp/b' BINARY;
> ...
> CHECKPOINT;TRUNCATE speedtest; COPY speedtest FROM '/tmp/b' BINARY;
> 
> legacy:
> 44835.840
> 44832.876
> 
> zlib:
> 39530.549
> 39365.109
> 39295.167
> 
> The performance differences are bigger if the table rows are significantly 
> bigger. 
> 
> Do you think something like that is sensible? If yes, I will make it into a 
> proper patch and such.
> 
> Thanks,
> 
> Andres
> 
> INSERT ... SELECT profile before patch:
> 
>     20.22%         postgres  postgres   [.] comp_crc32
>      5.77%         postgres  postgres   [.] XLogInsert
>      5.55%         postgres  postgres   [.] LWLockAcquire
>      5.21%         postgres  [kernel.   [k] copy_user_generic_string
>      4.64%         postgres  postgres   [.] LWLockRelease
>      4.39%         postgres  postgres   [.] ReadBuffer_common
>      2.75%         postgres  postgres   [.] heap_insert
>      2.22%         postgres  libc-2.1   [.] memcpy
>      2.09%         postgres  postgres   [.] UnlockReleaseBuffer
>      1.85%         postgres  postgres   [.] hash_any
>      1.77%         postgres  [kernel.   [k] clear_page_c
>      1.69%         postgres  postgres   [.] hash_search_with_hash_value
>      1.61%         postgres  postgres   [.] heapgettup_pagemode
>      1.50%         postgres  postgres   [.] PageAddItem
>      1.42%         postgres  postgres   [.] MarkBufferDirty
>      1.28%         postgres  postgres   [.] RelationGetBufferForTuple
>      1.15%         postgres  postgres   [.] ExecModifyTable
>      1.06%         postgres  postgres   [.] RelationPutHeapTuple
> 
> 
> After:
> 
>      9.97%         postgres  postgres   [.] comp_crc32
>      5.95%         postgres  [kernel.   [k] copy_user_generic_string
>      5.94%         postgres  postgres   [.] LWLockAcquire
>      5.64%         postgres  postgres   [.] XLogInsert
>      5.11%         postgres  postgres   [.] LWLockRelease
>      4.63%         postgres  postgres   [.] ReadBuffer_common
>      3.45%         postgres  postgres   [.] heap_insert
>      2.54%         postgres  libc-2.1   [.] memcpy
>      2.03%         postgres  postgres   [.] UnlockReleaseBuffer
>      1.94%         postgres  postgres   [.] hash_search_with_hash_value
>      1.84%         postgres  postgres   [.] hash_any
>      1.73%         postgres  [kernel.   [k] clear_page_c
>      1.68%         postgres  postgres   [.] PageAddItem
>      1.62%         postgres  postgres   [.] heapgettup_pagemode
>      1.52%         postgres  postgres   [.] RelationGetBufferForTuple
>      1.47%         postgres  postgres   [.] MarkBufferDirty
>      1.30%         postgres  postgres   [.] ExecModifyTable
>      1.23%         postgres  postgres   [.] RelationPutHeapTuple

[ Attachment, skipping... ]

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com


This sounds familiar. If you search back in the archives around 2004
or so I think you'll find a similar discussion when we replaced the
crc32 implementation with what we have now. We put a fair amount of
effort into searching for faster implementations so if you've found
one 3x faster I'm pretty startled. Are you sure it's faster on all
architectures and not a win sometimes and a loss other times? And are
you sure it's faster in our use case where we're crcing small
sequences of data often and not crcing a large block?


On Sun, May 30, 2010 at 3:56 AM, Greg Stark <gsstark@mit.edu> wrote:
> This sounds familiar. If you search back in the archives around 2004
> or so I think you'll find a similar discussion when we replaced the
> crc32 implementation with what we have now.

Fwiw here's the thread (from 2005):

http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/43811
-- 
greg


Greg Stark <gsstark@mit.edu> writes:
> On Sun, May 30, 2010 at 3:56 AM, Greg Stark <gsstark@mit.edu> wrote:
>> This sounds familiar. If you search back in the archives around 2004
>> or so I think you'll find a similar discussion when we replaced the
>> crc32 implementation with what we have now.

> Fwiw here's the thread (from 2005):
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/43811

I read through that thread and couldn't find much discussion of
alternative CRC implementations --- we spent all our time on arguing
about whether we needed 64-bit CRC or not.
        regards, tom lane


On Sunday 30 May 2010 04:56:09 Greg Stark wrote:
> This sounds familiar. If you search back in the archives around 2004
> or so I think you'll find a similar discussion when we replaced the
> crc32 implementation with what we have now. We put a fair amount of
> effort into searching for faster implementations so if you've found
> one 3x faster I'm pretty startled. 
All of those didn't think of computing more than one byte at a time. 
Most if not all current architectures are more or less superscalar (explicitly 
via the compiler or implicitly via somewhat intelligent silicon) - the current 
algorithm has an ordering restriction that prevents any benefit from that.
Basically it needs the CRC of the last byte before it can process the next 
one - the zlib/my version computes 4 bytes independently and then squashes 
them together, which results in much better overall utilization.

> Are you sure it's faster on all
> architectures and not a win sometimes and a loss other times? And are
> you sure it's faster in our use case where we're crcing small
> sequences of data often and not crcing a large block?
I tried on several and it was never a loss at 16+ bytes, never worse at 8, and 
most of the time equal if not better at 4. Sizes of 1-4 are somewhat slower, as 
they use the same algorithm as the old version but have an additional jump. 
That's a difference of about 3-4 cycles.

I will try to implement an updated patch sometime in the next few days.

Andres


On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I read through that thread and couldn't find much discussion of
> alternative CRC implementations --- we spent all our time on arguing
> about whether we needed 64-bit CRC or not.

Alright, how about this thread?

http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/71741

This actually sounds like precisely the same algorithm. Perhaps this
implementation is much better but your tests on the old one showed a
big difference between smaller and larger data sequences.


-- 
greg


On Sun, May 30, 2010 at 5:29 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I read through that thread and couldn't find much discussion of
>> alternative CRC implementations --- we spent all our time on arguing
>> about whether we needed 64-bit CRC or not.
>
> Alright, how about this thread?
>
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/71741

Huh, actually apparently this is right about on schedule for
reconsidering this topic:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/71903

:)

-- 
greg


On Sunday 30 May 2010 18:29:31 Greg Stark wrote:
> On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I read through that thread and couldn't find much discussion of
> > alternative CRC implementations --- we spent all our time on arguing
> > about whether we needed 64-bit CRC or not.
> 
> Alright, how about this thread?
> 
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/71741
> 
> This actually sounds like precisely the same algorithm. Perhaps this
> implementation is much better but your tests on the old one showed a
> big difference between smaller and larger data sequences.
I haven't yet had a chance to read the Intel paper (I am on the train, latency 
is 30s+, and the original link is dead), but I got the sf.net implementation.

Seeing it, I think I might know why it wasn't as much faster as promised - it 
introduces ordering constraints by avoiding shifts through the use of term2. 
Not sure though.

Anybody got the implementation by Gurjeet? I couldn't find it online (within 
the constraints of the connection).

Greetings,

Andres


On Sunday 30 May 2010 18:43:12 Greg Stark wrote:
> On Sun, May 30, 2010 at 5:29 PM, Greg Stark <gsstark@mit.edu> wrote:
> > On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I read through that thread and couldn't find much discussion of
> >> alternative CRC implementations --- we spent all our time on arguing
> >> about whether we needed 64-bit CRC or not.
> > 
> > Alright, how about this thread?
> > 
> > http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/71741
> 
> Huh, actually apparently this is right about on schedule for
> reconsidering this topic:
> 
> http://article.gmane.org/gmane.comp.db.postgresql.devel.general/71903
Oh, and the first zlib version sporting the 4 separate shifted tables approach 
was 1.2.0 (9 March 2003) ;-)

Andres


> On Sunday 30 May 2010 18:29:31 Greg Stark wrote:
>> On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > I read through that thread and couldn't find much discussion of
>> > alternative CRC implementations --- we spent all our time on arguing
>> > about whether we needed 64-bit CRC or not.

SSE4.2 has a hardware CRC32 instruction; this might be interesting to  
use...


On Monday 07 June 2010 12:37:13 Pierre C wrote:
> > On Sunday 30 May 2010 18:29:31 Greg Stark wrote:
> >> On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > I read through that thread and couldn't find much discussion of
> >> > alternative CRC implementations --- we spent all our time on arguing
> >> > about whether we needed 64-bit CRC or not.
> 
> SSE4.2 has a hardware CRC32 instruction, this might be interesting to
> use...
Different polynomial, unfortunately...

Andres


On Jun 7, 2010, at 12:45 , Andres Freund wrote:
> On Monday 07 June 2010 12:37:13 Pierre C wrote:
>>> On Sunday 30 May 2010 18:29:31 Greg Stark wrote:
>>>> On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> I read through that thread and couldn't find much discussion of
>>>>> alternative CRC implementations --- we spent all our time on arguing
>>>>> about whether we needed 64-bit CRC or not.
>>
>> SSE4.2 has a hardware CRC32 instruction, this might be interesting to
>> use...
> Different polynom unfortunately...

Since only the WAL uses CRC, I guess the polynomial could be changed though. pg_upgrade for example shouldn't care.

RFC3385 compares different checksumming methods for use in iSCSI, and CRC32c 
(which uses the same polynomial as the SSE4.2 instruction) wins. Here's 
a link: http://www.faqs.org/rfcs/rfc3385.html
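
For what it's worth, a minimal sketch of what using it through compiler 
intrinsics could look like (assumes a 64-bit x86 build with -msse4.2; the 
function name and the 8-byte chunking are just illustrative, and a real patch 
would also need a runtime CPUID check plus a table-based fallback):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>          /* SSE4.2 intrinsics */

/* CRC-32C (Castagnoli polynomial) via the SSE4.2 crc32 instruction */
static uint32_t
crc32c_sse42(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, p, 8);
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}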

best regards,
Florian Pflug



Florian Pflug wrote:
> On Jun 7, 2010, at 12:45 , Andres Freund wrote:
>   
>> On Monday 07 June 2010 12:37:13 Pierre C wrote:
>>     
>>>> On Sunday 30 May 2010 18:29:31 Greg Stark wrote:
>>>>         
>>>>> On Sun, May 30, 2010 at 4:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>           
>>>>>> I read through that thread and couldn't find much discussion of
>>>>>> alternative CRC implementations --- we spent all our time on arguing
>>>>>> about whether we needed 64-bit CRC or not.
>>>>>>             
>>> SSE4.2 has a hardware CRC32 instruction, this might be interesting to
>>> use...
>>>       
>> Different polynom unfortunately...
>>     
>
> Since only the WAL uses CRC, I guess the polynomial could be changed though. pg_upgrade for example shouldn't care.
>
> RFC3385 compares different checksumming methods for use in iSCSI, and CRC32c
> (which uses the same polynomial as the SSE4.2 instruction) wins. Here's
> a link: http://www.faqs.org/rfcs/rfc3385.html
>   
The Linux kernel also uses it when it's available, see e.g. 
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/crypto/crc32c-intel.c

regards,
Yeb Havinga



> The linux kernel also uses it when it's availabe, see e.g.  
> http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/crypto/crc32c-intel.c

If you guys are interested, I have a Core i7 here and could run a little  
benchmark.


Hi,

This thread died after I didn't get around to implementing a new version, and 
because of some potential license problems.

I still think it's worthwhile (and I have used it in production for some time), 
so I would like to implement a version fit for the next commitfest.

The code I started out from is under the zlib license - which is, to my 
knowledge, compatible with PG's license. What's the position of -hackers there? 
There already is some separately licensed code around, and we're already 
linking to zlib-licensed code...

For simplicity I asked Mark Adler (the original copyright owner) if he would 
be willing to relicense - he is not.

For anybody not hoarding all old mail like me, here is a link to the archives 
about my old patch:

http://archives.postgresql.org/message-id/201005202227.49990.andres@anarazel.de


Andres


On Sat, Oct 30, 2010 at 5:05 AM, Andres Freund <andres@anarazel.de> wrote:
> This thread died after me not implementing a new version and some potential
> license problems.
>
> I still think its worthwile (and I used it in production for some time) so I
> would like to implement a version fit for the next commitfest.
>
> The code where I started out from is under the zlib license - which is to my
> knowledge compatible with PGs licence. Whats the position of HACKERS there?
> There already is some separately licenced code around and were already linking
> to zlib licenced code...
>
> For simplicitly I asked Mark Adler (the original Copyright Owner) if he would
> be willing to relicence - he is not.
>
> For anybody not hording all old mail like me here is a link to the archives
> about my old patch:
>
> http://archives.postgresql.org/message-id/201005202227.49990.andres@anarazel.de

IANAL, but the license doesn't appear incompatible to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Saturday 30 October 2010 11:05:17, Andres Freund wrote:
> Hi,
> 
> This thread died after me not implementing a new version and some potential
> license problems.
> 
> I still think its worthwile (and I used it in production for some time) so
> I would like to implement a version fit for the next commitfest.
> 
> The code where I started out from is under the zlib license - which is to
> my knowledge compatible with PGs licence. Whats the position of HACKERS
> there? There already is some separately licenced code around and were
> already linking to zlib licenced code...
> 
> For simplicitly I asked Mark Adler (the original Copyright Owner) if he
> would be willing to relicence - he is not.
> 
> For anybody not hording all old mail like me here is a link to the archives
> about my old patch:
> 
> http://archives.postgresql.org/message-
> id/201005202227.49990.andres@anarazel.de
> 
> 
> Andres

I forgot to report this a few months ago:

I had a very intensive COPY load, and this patch helped. The context was a 
server that was CPU bound on loading data (8 COPYs on the same table in 
parallel, table not indexed). This patch gave me a 10% improvement in load 
time. I don't have the figures right now, but I could try to run this test 
again if it would help. At that time I just tried it out of curiosity, and the 
load time was acceptable without it, so I didn't spend more time on it.