Re: Optimize partial TOAST decompression - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Optimize partial TOAST decompression
Date
Msg-id 20191001123420.uwzxtqr7qvleyxef@development
Whole thread Raw
In response to Re: Optimize partial TOAST decompression  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Optimize partial TOAST decompression
List pgsql-hackers
On Tue, Oct 01, 2019 at 12:08:05PM +0200, Tomas Vondra wrote:
>On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:
>>
>>
>>>30 сент. 2019 г., в 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а):
>>>
>>>On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:
>>>>
>>>>
>>>>>30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а):
>>>>>
>>>>>I mean this:
>>>>>
>>>>> /*
>>>>>  * Use int64 to prevent overflow during calculation.
>>>>>  */
>>>>> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
>>>>>
>>>>>I'm not very familiar with pglz internals, but I'm a bit puzzled by
>>>>>this. My first instinct was to compare it to this:
>>>>>
>>>>> #define PGLZ_MAX_OUTPUT(_dlen)    ((_dlen) + 4)
>>>>>
>>>>>but clearly that's a very different (much simpler) formula. So why
>>>>>shouldn't pglz_maximum_compressed_size simply use this macro?
>>>
>>>>
>>>>compressed_size accounts for possible increase of size during
>>>>compression. pglz can consume up to 1 control byte for each 8 bytes of
>>>>data in worst case.
>>>
>>>OK, but does that actually translate in to the formula? We essentially
>>>need to count 8-byte chunks in raw data, and multiply that by 9. Which
>>>gives us something like
>>>
>>> nchunks = ((rawsize + 7) / 8) * 9;
>>>
>>>which is not quite what the patch does.
>>
>>I'm afraid neither formula is correct, but all this is hair-splitting differences.
>>
>
>Sure. I just want to be sure the formula is safe and we won't end up
>using too low value in some corner case.
>
>>Your formula does not account for the fact that we may not need all bytes from last chunk.
>>Consider desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total
>>But nchunks = 9.
>>
>
>OK, so essentially this means my formula works with whole chunks, i.e.
>if we happen to need just a part of a decompressed chunk, we still
>request enough data to decompress it whole. This way we may request up
>to 7 extra bytes, which seems fine.
>
>>Binguo's formula is appending 1 control bit per data byte and one extra
>>control byte.  Consider size = 8 bytes. We need 1 control byte, 8
>>literals, 9 total.  But compressed_size = 10.
>>
>>Mathematically correct formula is compressed_size = (int32) ((int64)
>>rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
>>control bits for overflow.
>>
>>But this equations make no big difference, each formula is safe. I'd
>>pick one which is easier to understand and document (IMO, its nchunks =
>>((rawsize + 7) / 8) * 9).
>>
>
>I'd use the *mathematically correct* formula, it doesn't seem to be any
>more complex, and the "one bit per byte, complete bytes" explanation
>seems quite understandable.
>

Pushed.

I've ended up using the *mathematically correct* formula, hopefully
with sufficient explanation why it's correct. I've also polished a
couple more comments, and pushed like that.

Thanks to Binguo Bao for this improvement, and all the reviewers in this
thread.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Drop Trigger Mechanism with Detached partitions
Next
From: Andrew Dunstan
Date:
Subject: Re: Proposal: Make use of C99 designated initialisers fornulls/values arrays