Hello Andrey,
On 2019-11-02 12:30, Andrey Borodin wrote:
>> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre@2ndquadrant.com>
>> wrote:
> PFA two patches:
> v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in
> test_pglz extension)
> v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known
> as 'hacked8')
Looking at the patches, it seems only the case of a match is changed.
But when we observe a literal byte, this is copied byte-by-byte with:
  else
  {
      /*
       * An unset control bit means LITERAL BYTE. So we just
       * copy one from INPUT to OUTPUT.
       */
      *dp++ = *sp++;
  }
Maybe we can optimize this, too. For instance, you could just increase a
counter:
  else
  {
      /*
       * An unset control bit means LITERAL BYTE. We count
       * these and copy them later.
       */
      literal_bytes++;
  }
and in the case of:
  if (ctrl & 1)
  {
      /* First copy all the literal bytes */
      if (literal_bytes > 0)
      {
          memcpy(dp, sp, literal_bytes);
          sp += literal_bytes;
          dp += literal_bytes;
          literal_bytes = 0;
      }
(Code untested!)
The same would need to be done at the very end, if the input ends
without any new CTRL-byte.
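To make the idea concrete, here is a minimal, self-contained sketch of the
batched-literal approach. It is only an illustration: it uses a simplified,
hypothetical format (one control byte, LSB first; a set bit is a two-byte
match of back-distance and length, an unset bit is one literal), and the
name toy_decompress is made up, not the real pglz code. One subtlety the
sketch has to handle: pending literals must also be flushed before reading
the next control byte, since that byte would otherwise sit inside the span
handed to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy decompressor demonstrating deferred (batched) literal copies.
 * Literals are only counted while scanning; they are copied in one
 * memcpy when a match is hit, before each new control byte, and at
 * the very end of the input.
 */
static size_t
toy_decompress(const unsigned char *sp, size_t srclen, unsigned char *dest)
{
    const unsigned char *srcend = sp + srclen;
    unsigned char *dp = dest;
    size_t      literal_bytes = 0;  /* literals seen but not yet copied */

    while (sp < srcend)
    {
        unsigned char ctrl;
        int         bit;

        /*
         * Flush before reading the next control byte: the control byte
         * itself would otherwise land inside the copied span.
         */
        if (literal_bytes > 0)
        {
            memcpy(dp, sp - literal_bytes, literal_bytes);
            dp += literal_bytes;
            literal_bytes = 0;
        }
        ctrl = *sp++;

        for (bit = 0; bit < 8 && sp < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                size_t      dist,
                            len;

                /* First copy all pending literal bytes. */
                if (literal_bytes > 0)
                {
                    memcpy(dp, sp - literal_bytes, literal_bytes);
                    dp += literal_bytes;
                    literal_bytes = 0;
                }
                dist = *sp++;
                len = *sp++;
                /* Match copy stays byte-wise: it may overlap itself. */
                while (len-- > 0)
                {
                    *dp = *(dp - dist);
                    dp++;
                }
            }
            else
            {
                /* LITERAL BYTE: count it now, copy it later. */
                literal_bytes++;
                sp++;
            }
        }
    }
    /* Input ended without a new control byte: flush the tail. */
    if (literal_bytes > 0)
    {
        memcpy(dp, sp - literal_bytes, literal_bytes);
        dp += literal_bytes;
    }
    return (size_t) (dp - dest);
}
```

Note that this variant advances sp past each literal and flushes from
sp - literal_bytes, rather than leaving sp parked at the first pending
literal; either bookkeeping works, as long as the flush points above are
respected.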
Whether that gains us anything depends on how common literal bytes are.
It might be that highly compressible input has almost none, while input
that is a mix of incompressible strings and compressible ones might have
longer stretches. One example would be something like an SHA-256, that
is repeated twice. The first instance would be incompressible, the
second one would be just a copy. This might not happen that often in
practical inputs, though.
I wonder if you agree and what would happen if you try this variant on
your corpus tests.
Best regards,
Tels