Re: pglz performance - Mailing list pgsql-hackers

From Tels
Subject Re: pglz performance
Msg-id d56c85b989a3bd8c0a98d79553276b0e@bloodgate.com
In response to Re: pglz performance  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: pglz performance  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hello Andrey,

On 2019-11-02 12:30, Andrey Borodin wrote:
>> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre@2ndquadrant.com>
>> wrote:
> PFA two patches:
> v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in
> test_pglz extension)
> v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known
> as 'hacked8')

Looking at the patches, it seems only the match case is changed. But
when we encounter a literal byte, it is still copied byte-by-byte with:

  else
  {
      /*
       * An unset control bit means LITERAL BYTE. So we just
       * copy one from INPUT to OUTPUT.
       */
      *dp++ = *sp++;
  }
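For context, that literal copy sits inside a decode loop roughly like the one below. This is a minimal, hypothetical sketch assuming a simplified version of the format handled by pglz_decompress() in PostgreSQL's pg_lzcompress.c (control bits read LSB first; a 2-byte match tag holding a 4-bit length nibble and a 12-bit offset; one extra length byte when the nibble is 15). The name pglz_like_decompress is made up, and this is not the PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified pglz-style decoder (not the PostgreSQL code).
 * Assumed format: each control byte describes up to 8 items, LSB first.
 *   bit 0: one literal byte follows.
 *   bit 1: a match tag follows: byte0 = (offset bits 8..11) << 4 | (len - 3),
 *          byte1 = offset bits 0..7; if the length nibble is 15, one more
 *          byte is added to the length.
 * Returns the number of bytes written, or -1 on malformed input.
 */
static ptrdiff_t pglz_like_decompress(const uint8_t *src, size_t srclen,
                                      uint8_t *dst, size_t dstlen)
{
    const uint8_t *sp = src;
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;
    uint8_t *destend = dst + dstlen;

    assert(src != NULL && dst != NULL);
    while (sp < srcend && dp < destend)
    {
        uint8_t ctrl = *sp++;

        for (int i = 0; i < 8 && sp < srcend && dp < destend; i++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                if (srcend - sp < 2)
                    return -1;
                size_t len = (size_t) (sp[0] & 0x0f) + 3;
                size_t off = ((size_t) (sp[0] & 0xf0) << 4) | sp[1];

                sp += 2;
                if (len == 18)
                {
                    if (sp >= srcend)
                        return -1;
                    len += *sp++;
                }
                if (off == 0 || off > (size_t) (dp - dst))
                    return -1;
                /* Byte-by-byte, so overlapping matches (off < len) work. */
                for (; len > 0 && dp < destend; len--, dp++)
                    *dp = dp[-(ptrdiff_t) off];
            }
            else
            {
                /* Unset bit: LITERAL BYTE, copied one at a time. */
                *dp++ = *sp++;
            }
        }
    }
    return dp - dst;
}
```

Every literal costs one iteration of the inner loop plus a one-byte copy, which is the cost the suggestion below tries to reduce.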

Maybe we can optimize this, too. For instance, you could just
increment a counter:

  else
  {
      /*
       * An unset control bit means LITERAL BYTE. We count
       * these and copy them later.
       */
      literal_bytes++;
  }

and in the case of:

  if (ctrl & 1)
  {
      /* First copy all the pending literal bytes */
      if (literal_bytes > 0)
      {
          memcpy(dp, sp, literal_bytes);    /* memcpy(dest, src, n) */
          sp += literal_bytes;
          dp += literal_bytes;
          literal_bytes = 0;
      }
      /* ... then decode the match tag as before ... */

(Code untested!)

The same flush would also be needed before reading the next CTRL byte
(the pending literals precede it in the input), and at the very end, if
the input ends without any new CTRL byte.
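A self-contained sketch of the counting idea, under the same simplified-format assumptions as a pglz-style decoder (LSB-first control bits, 2-byte match tag with a 4-bit length nibble and 12-bit offset, extension byte when the nibble is 15). The names pglz_like_decompress_batched and flush_literals are hypothetical, and this is an illustration, not the code from the attached patches. Literals are only counted; the pending run is copied with one memcpy() before a match tag is read, before the next control byte, and at the end of input. The per-item bounds checks are shifted by the pending count so the loop stops where a byte-by-byte loop would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the pending literal run (which starts at *sp) with one memcpy(). */
static void flush_literals(const uint8_t **sp, uint8_t **dp, size_t *pending)
{
    if (*pending > 0)
    {
        memcpy(*dp, *sp, *pending);     /* memcpy(dest, src, n) */
        *sp += *pending;
        *dp += *pending;
        *pending = 0;
    }
}

/* Returns the number of bytes written, or -1 on malformed input. */
static ptrdiff_t pglz_like_decompress_batched(const uint8_t *src, size_t srclen,
                                              uint8_t *dst, size_t dstlen)
{
    const uint8_t *sp = src;
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;
    uint8_t *destend = dst + dstlen;
    size_t pending = 0;                 /* literals counted, not yet copied */

    assert(src != NULL && dst != NULL);
    while (sp < srcend && dp < destend)
    {
        uint8_t ctrl = *sp++;

        for (int i = 0; i < 8; i++, ctrl >>= 1)
        {
            /* The byte loop's bounds checks, shifted by the pending run. */
            if (pending >= (size_t) (srcend - sp) ||
                pending >= (size_t) (destend - dp))
                break;

            if (ctrl & 1)
            {
                /* Copy pending literals before reading the match tag. */
                flush_literals(&sp, &dp, &pending);
                if (srcend - sp < 2)
                    return -1;
                size_t len = (size_t) (sp[0] & 0x0f) + 3;
                size_t off = ((size_t) (sp[0] & 0xf0) << 4) | sp[1];

                sp += 2;
                if (len == 18)
                {
                    if (sp >= srcend)
                        return -1;
                    len += *sp++;
                }
                if (off == 0 || off > (size_t) (dp - dst))
                    return -1;
                for (; len > 0 && dp < destend; len--, dp++)
                    *dp = dp[-(ptrdiff_t) off];
            }
            else
                pending++;              /* LITERAL BYTE: count it, copy later */
        }
        /* Flush before the next control byte, or at the end of input. */
        flush_literals(&sp, &dp, &pending);
    }
    return dp - dst;
}
```

One consequence of this layout: in the input, the literals of consecutive control groups are separated by control bytes, so in this sketch each memcpy() copies at most 8 bytes. Any gain would come from replacing up to eight one-byte copies with one call, which would need measuring.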

Whether that gains us anything depends on how common literal bytes are.
Highly compressible input might have almost none, while input that mixes
incompressible and compressible strings might have longer stretches of
them. One example would be an SHA-256 hash that appears twice: the first
instance would be incompressible, and the second would be just a copy.
This might not happen that often in practical inputs, though.
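To see how common literals are on a corpus, one could scan the compressed streams and count literal items and runs without decompressing. A minimal sketch, assuming the same simplified pglz-like format as above; scan_literal_runs and literal_stats are hypothetical names. A run here continues across control bytes, so it measures stretches of literals in the decompressed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counters gathered from one compressed stream (hypothetical helper). */
typedef struct
{
    size_t literals;     /* literal items */
    size_t matches;      /* match tags */
    size_t runs;         /* maximal runs of consecutive literal items */
    size_t longest_run;  /* length of the longest such run */
} literal_stats;

/*
 * Walks a stream in the assumed pglz-like format without producing output.
 * Returns 0 on success, -1 if a match tag is truncated.
 */
static int scan_literal_runs(const uint8_t *src, size_t srclen, literal_stats *st)
{
    const uint8_t *sp = src;
    const uint8_t *srcend = src + srclen;
    size_t run = 0;

    assert(src != NULL && st != NULL);
    *st = (literal_stats) {0};
    while (sp < srcend)
    {
        uint8_t ctrl = *sp++;

        for (int i = 0; i < 8 && sp < srcend; i++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                if (srcend - sp < 2)
                    return -1;
                int extended = (sp[0] & 0x0f) == 0x0f;  /* len == 18 */

                sp += 2;
                if (extended)
                {
                    if (sp >= srcend)
                        return -1;
                    sp++;               /* skip the extra length byte */
                }
                st->matches++;
                run = 0;                /* a match ends the literal run */
            }
            else
            {
                sp++;
                st->literals++;
                if (run++ == 0)
                    st->runs++;
                if (run > st->longest_run)
                    st->longest_run = run;
            }
        }
    }
    return 0;
}
```

For the repeated SHA-256 example, a stream that stores the 32-byte hash as literals followed by one 32-byte match would report a single run of length 32 and one match.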

I wonder whether you agree, and what would happen if you tried this
variant on your corpus tests.

Best regards,

Tels


