Re: pglz compression performance, take two - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: pglz compression performance, take two |
Date | |
Msg-id | e04e4097-4cac-445b-f7bf-53edf305eccc@enterprisedb.com Whole thread Raw |
In response to | Re: pglz compression performance, take two (Andres Freund <andres@anarazel.de>) |
Responses |
Re: pglz compression performance, take two
|
List | pgsql-hackers |
On 2/7/23 21:18, Andres Freund wrote: > Hi, > > On 2023-02-05 10:36:39 -0800, Andrey Borodin wrote: >> On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin <amborodin86@gmail.com> wrote: >>> >>> Hello! Please find attached v8. >> >> I got some interesting feedback from some patch users. >> There was an oversight that frequently yielded results that are 1,2 or >> 3 bytes longer than expected. >> Looking closer I found that the correctness of the last 3-byte tail is >> checked in two places. PFA fix for this. Previously compressed data >> was correct, however in some cases few bytes longer than the result of >> current pglz implementation. > > This version fails on cfbot, due to address sanitizer: > > https://cirrus-ci.com/task/4921632586727424 > https://api.cirrus-ci.com/v1/artifact/task/4921632586727424/log/src/test/regress/log/initdb.log > > > performing post-bootstrap initialization ... ================================================================= > ==15991==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61e000002ee0 at pc 0x558e1b847b16 bp 0x7ffd35782f70sp 0x7ffd35782f68 > READ of size 1 at 0x61e000002ee0 thread T0 > #0 0x558e1b847b15 in pglz_hist_add /tmp/cirrus-ci-build/src/common/pg_lzcompress.c:310 > #1 0x558e1b847b15 in pglz_compress /tmp/cirrus-ci-build/src/common/pg_lzcompress.c:680 > #2 0x558e1aa86ef0 in pglz_compress_datum /tmp/cirrus-ci-build/src/backend/access/common/toast_compression.c:65 > #3 0x558e1aa87af2 in toast_compress_datum /tmp/cirrus-ci-build/src/backend/access/common/toast_internals.c:68 > #4 0x558e1ac22989 in toast_tuple_try_compression /tmp/cirrus-ci-build/src/backend/access/table/toast_helper.c:234 > #5 0x558e1ab6af24 in heap_toast_insert_or_update /tmp/cirrus-ci-build/src/backend/access/heap/heaptoast.c:197 > #6 0x558e1ab4a2a6 in heap_update /tmp/cirrus-ci-build/src/backend/access/heap/heapam.c:3533 > ... > Yeah, and valgrind seems to hit the same issue (it's not labeled as buffer overflow, but it seems to be exactly the same place): ==380682== Invalid read of size 1 ==380682== at 0xBCEAAB: pglz_hist_add (pg_lzcompress.c:310) ==380682== by 0xBCF130: pglz_compress (pg_lzcompress.c:670) ==380682== by 0x4A911F: pglz_compress_datum (toast_compression.c:65) ==380682== by 0x4A97E2: toast_compress_datum (toast_internals.c:68) ==380682== by 0x54CCA4: toast_tuple_try_compression (toast_helper.c:234) ==380682== by 0x4FFC33: heap_toast_insert_or_update (heaptoast.c:197) ==380682== by 0x4ED498: heap_update (heapam.c:3624) ==380682== by 0x4EE023: simple_heap_update (heapam.c:4060) ==380682== by 0x5B1B2B: CatalogTupleUpdateWithInfo (indexing.c:329) ==380682== by 0x65C3AB: update_attstats (analyze.c:1741) ==380682== by 0x65A054: do_analyze_rel (analyze.c:602) ==380682== by 0x659405: analyze_rel (analyze.c:261) ==380682== by 0x70A162: vacuum (vacuum.c:523) ==380682== by 0x8DF8F7: autovacuum_do_vac_analyze (autovacuum.c:3155) ==380682== by 0x8DE74A: do_autovacuum (autovacuum.c:2473) ==380682== by 0x8DD49E: AutoVacWorkerMain (autovacuum.c:1716) ==380682== by 0x8DD097: StartAutoVacWorker (autovacuum.c:1494) ==380682== by 0x8EA5B2: StartAutovacuumWorker (postmaster.c:5481) ==380682== by 0x8EA10A: process_pm_pmsignal (postmaster.c:5192) ==380682== by 0x8E6121: ServerLoop (postmaster.c:1770) ==380682== Address 0xe722c78 is 103,368 bytes inside a recently re-allocated block of size 131,072 alloc'd ==380682== at 0x48457AB: malloc (vg_replace_malloc.c:393) ==380682== by 0xB95423: AllocSetAlloc (aset.c:929) ==380682== by 0xBA2B6C: palloc (mcxt.c:1224) ==380682== by 0x4A0962: heap_copytuple (heaptuple.c:687) ==380682== by 0x73A2BB: tts_buffer_heap_copy_heap_tuple (execTuples.c:842) ==380682== by 0x658E42: ExecCopySlotHeapTuple (tuptable.h:464) ==380682== by 0x65B288: acquire_sample_rows (analyze.c:1261) ==380682== by 0x659E42: do_analyze_rel (analyze.c:536) ==380682== by 0x659405: analyze_rel (analyze.c:261) ==380682== by 0x70A162: vacuum (vacuum.c:523) ==380682== by 0x8DF8F7: autovacuum_do_vac_analyze (autovacuum.c:3155) ==380682== by 0x8DE74A: do_autovacuum (autovacuum.c:2473) ==380682== by 0x8DD49E: AutoVacWorkerMain (autovacuum.c:1716) ==380682== by 0x8DD097: StartAutoVacWorker (autovacuum.c:1494) ==380682== by 0x8EA5B2: StartAutovacuumWorker (postmaster.c:5481) ==380682== by 0x8EA10A: process_pm_pmsignal (postmaster.c:5192) ==380682== by 0x8E6121: ServerLoop (postmaster.c:1770) ==380682== by 0x8E5B54: PostmasterMain (postmaster.c:1463) ==380682== by 0x7A806C: main (main.c:200) ==380682== The place allocating the buffer changes over time, but the first part (invalid read) seems to be exactly the same. FWIW I did run previous versions using valgrind, so this gotta be due some recent change. > > Independent of this failure, I'm worried about the cost/benefit analysis of a > pglz change that changes this much at once. It's quite hard to review. > I agree. I think I managed to understand what the patch does during the review, but it's so much harder - it'd definitely be better to have this split into smaller parts, somehow. Interestingly enough the commit message actually says this: This patch accumulates several changes to pglz compression: 1. Convert macro-functions to regular functions for readability 2. Use more compact hash table with uint16 indexes instead of pointers 3. Avoid prev pointer in hash table 4. Use 4-byte comparisons during a search instead of 1-byte comparisons Which I think is a pretty good recipe how to split the patch. (And we also need a better commit message, or at least a proposal.) This'd probably also help when investigating the extra byte issue, discussed yesterday. (Assuming it's not related to the invalid access reported by valgrind / asan). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: