Re: Optimize partial TOAST decompression - Mailing list pgsql-hackers
From:            Tomas Vondra
Subject:         Re: Optimize partial TOAST decompression
Date:
Msg-id:          20191001123420.uwzxtqr7qvleyxef@development
In response to:  Re: Optimize partial TOAST decompression (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses:       Re: Optimize partial TOAST decompression
List:            pgsql-hackers
On Tue, Oct 01, 2019 at 12:08:05PM +0200, Tomas Vondra wrote:
>On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:
>>
>>>On 30 Sept 2019, at 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>>On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:
>>>>
>>>>>On 30 Sept 2019, at 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>>
>>>>>I mean this:
>>>>>
>>>>>    /*
>>>>>     * Use int64 to prevent overflow during calculation.
>>>>>     */
>>>>>    compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
>>>>>
>>>>>I'm not very familiar with pglz internals, but I'm a bit puzzled by
>>>>>this. My first instinct was to compare it to this:
>>>>>
>>>>>    #define PGLZ_MAX_OUTPUT(_dlen)  ((_dlen) + 4)
>>>>>
>>>>>but clearly that's a very different (much simpler) formula. So why
>>>>>shouldn't pglz_maximum_compressed_size simply use this macro?
>>>>
>>>>compressed_size accounts for the possible increase of size during
>>>>compression. pglz can consume up to 1 control byte for each 8 bytes of
>>>>data in the worst case.
>>>
>>>OK, but does that actually translate into the formula? We essentially
>>>need to count 8-byte chunks in the raw data, and multiply that by 9.
>>>Which gives us something like
>>>
>>>    nchunks = ((rawsize + 7) / 8) * 9;
>>>
>>>which is not quite what the patch does.
>>
>>I'm afraid neither formula is correct, but all these are hair-splitting
>>differences.
>>
>
>Sure. I just want to be sure the formula is safe and we won't end up
>using too low a value in some corner case.
>
>>Your formula does not account for the fact that we may not need all
>>bytes from the last chunk. Consider a desired decompressed size of 3
>>bytes. We may need 1 control byte and 3 literals, 4 bytes total, but
>>nchunks = 9.
>>
>
>OK, so essentially this means my formula works with whole chunks, i.e.
>if we happen to need just a part of a decompressed chunk, we still
>request enough data to decompress it whole. This way we may request up
>to 7 extra bytes, which seems fine.
>
>>Binguo's formula is appending 1 control bit per data byte and one extra
>>control byte. Consider size = 8 bytes. We need 1 control byte, 8
>>literals, 9 total. But compressed_size = 10.
>>
>>The mathematically correct formula is compressed_size = (int32) ((int64)
>>rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
>>control bits for the overflow.
>>
>>But these equations make no big difference, each formula is safe. I'd
>>pick the one that's easier to understand and document (IMO, that's
>>nchunks = ((rawsize + 7) / 8) * 9).
>>
>
>I'd use the *mathematically correct* formula, it doesn't seem to be any
>more complex, and the "one bit per byte, complete bytes" explanation
>seems quite understandable.
>

Pushed.

I've ended up using the *mathematically correct* formula, hopefully with
sufficient explanation of why it's correct. I've also polished a couple
more comments, and pushed it like that.

Thanks to Binguo Bao for this improvement, and to all the reviewers in
this thread.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
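
For illustration, here is a small standalone sketch (not part of the committed
patch; the function names and test driver are invented for this example, only
the arithmetic comes from the thread) comparing the three formulas on the two
corner cases discussed above, 3 bytes and 8 bytes:

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Worst case for pglz: every input byte is stored as a literal, so we
     * need rawsize data bytes plus one control bit per byte, i.e.
     * ceil(rawsize / 8) extra control bytes.
     */

    /* Formula from the original patch: "+ 8" slightly over-estimates when
     * rawsize is a multiple of 8. */
    static int32_t max_compressed_patch(int32_t rawsize)
    {
        return (int32_t) (((int64_t) rawsize * 9 + 8) / 8);
    }

    /* Chunk-based formula: 9 bytes for every (possibly partial) 8-byte chunk. */
    static int32_t max_compressed_chunks(int32_t rawsize)
    {
        return ((rawsize + 7) / 8) * 9;
    }

    /* "Mathematically correct" formula that was eventually committed:
     * rawsize literals plus ceil(rawsize / 8) control bytes. */
    static int32_t max_compressed_exact(int32_t rawsize)
    {
        return (int32_t) (((int64_t) rawsize * 9 + 7) / 8);
    }

    int main(void)
    {
        int32_t sizes[] = {3, 8};   /* corner cases mentioned in the thread */

        for (int i = 0; i < 2; i++)
        {
            int32_t n = sizes[i];

            printf("rawsize = %d: patch = %d, chunks = %d, exact = %d\n",
                   n,
                   max_compressed_patch(n),
                   max_compressed_chunks(n),
                   max_compressed_exact(n));
        }

        /*
         * Prints:
         *   rawsize = 3: patch = 4, chunks = 9, exact = 4
         *   rawsize = 8: patch = 10, chunks = 9, exact = 9
         */
        return 0;
    }

All three bounds are safe (never too small); the "exact" variant is the tight
one, which is why it was the formula picked in the end.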