Re: Optimize partial TOAST decompression - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Optimize partial TOAST decompression |
Date | |
Msg-id | 20190709211219.hbo6tyfxxcy2ek3j@development |
In response to | Re: Optimize partial TOAST decompression (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: Optimize partial TOAST decompression |
List | pgsql-hackers |
On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:
>On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:
>>Hi, Tomas!
>>Thanks for your testing and the suggestion.
>>
>>>That's quite bizarre behavior - it does work with a prefix, but not with
>>>suffix. And the exact ERROR changes after the prefix query.
>>
>>I think the bug is caused by "#2 0x00000000004c3b08 in
>>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
>>slicelength=-1) at tuptoaster.c:315",
>>since I ignored the case where slicelength is negative, and I've appended
>>some comments to heap_tuple_untoast_attr_slice for that case.
>>
>>>FWIW the new code added to this
>>>function does not adhere to our code style, and would deserve some
>>>additional explanation of what it's doing/why. Same for the
>>>heap_tuple_untoast_attr_slice, BTW.
>>
>>I've added more comments to explain the code's behavior.
>>Besides, I also renamed the macro "TOAST_COMPRESS_RAWDATA" to
>>"TOAST_COMPRESS_DATA" since
>>it is used to get toast compressed data rather than raw data.
>>
>
>Thanks, this seems to address the issue - I can no longer reproduce the
>crashes, allowing the benchmark to complete. I'm attaching the script I
>used and a spreadsheet with a summary of results.
>
>For the cases I've tested (100k - 10M values, different compressibility,
>different prefix/length values), the results are kinda mixed - many
>cases got much faster (~2x), but other cases got slower too. We're
>however talking about queries taking a couple of milliseconds, so in
>absolute numbers the differences are small.
>
>That does not mean the optimization is useless - but the example shared
>at the beginning of this thread is quite extreme, as the values are
>extremely compressible. Each value is ~38MB (in plaintext), but a table
>with 100 such values has only ~40MB. That's a 100:1 compression ratio,
>which I think is not typical for real-world data sets.
>
>The data I've used are less extreme, depending on the fraction of random
>data in the values.
>
>I went through the code too. I've reworded a couple of comments and fixed
>some code style issues, but there are a couple of more serious issues.
>
>
>1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?
>
>This seems unnecessary, and it discards the clear hint that it's about
>accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
>IMHO we should keep the original naming.
>
>
>2) pglz_maximum_compressed_size signatures are confusing
>
>There are two places with the pglz_maximum_compressed_size signature, and
>those places are kinda out of sync when it comes to parameter names:
>
>  int32
>  pglz_maximum_compressed_size(int32 raw_slice_size,
>                               int32 total_compressed_size)
>
>  extern
>  int32 pglz_maximum_compressed_size(int32 raw_slice_size,
>                                     int32 raw_size);
>
>Also, pg_lzcompress.c has no concept of a "slice" because it only deals
>with simple compression; slicing is the responsibility of the tuptoaster.
>So we should not mix those two, not even in comments.
>
>
>I propose tweaks per the attached patch - I think it makes the code
>clearer, and it's mostly cosmetic stuff. But I haven't tested the
>changes beyond "it compiles".
>
>
>regards
>

FWIW I've done another round of tests with larger values (up to ~10MB)
and the larger the values the better the speedup (a bit as expected).
Attached is the script I used and a spreadsheet with a summary.

We still need a patch addressing the review comments, though.
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
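For readers following the thread: as I understand it, the point of
pglz_maximum_compressed_size is to bound how many compressed bytes must be
fetched from the TOAST table in order to decompress a given prefix of the
raw value. Below is a minimal sketch of that idea, assuming the
literal-only worst case of the pglz format; the function name
max_compressed_bytes_for_slice and the exact rounding are illustrative
choices of mine, not taken from the patch under review.

    #include <stdint.h>

    /*
     * Illustrative sketch (not the patch code): upper bound on the number
     * of compressed bytes needed to decompress the first raw_slice_size
     * bytes of a pglz-compressed datum.
     *
     * pglz emits one control byte per group of up to 8 items, and a
     * literal item occupies exactly one byte, so the all-literal case is
     * the worst one: producing raw_slice_size raw bytes consumes at most
     * raw_slice_size + raw_slice_size / 8 + 1 compressed bytes.  We can
     * never need more than the whole compressed value, so cap the result
     * at total_compressed_size.
     */
    static int32_t
    max_compressed_bytes_for_slice(int32_t raw_slice_size,
                                   int32_t total_compressed_size)
    {
        /* 64-bit arithmetic avoids overflow for very large slices */
        int64_t bound = (int64_t) raw_slice_size + raw_slice_size / 8 + 1;

        if (bound > total_compressed_size)
            bound = total_compressed_size;

        return (int32_t) bound;
    }

With such a bound available, the slicing code only has to fetch that many
compressed bytes before decompressing the requested prefix, which is where
the speedup for the prefix queries discussed in this thread comes from.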