Thread: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
From
Peter Geoghegan
Date:
While testing my nbtree heap TID keyspace patch, I came across a case where amcheck reliably reports corruption. It appeared that a 4 byte varlena index entry that was expected in an index was not actually present. However, index scan queries with the "missing" value in their qual didn't actually give wrong answers. This was reproducible on the master branch, too. It turned out that the problem has existed since the heapallindexed enhancement made it into Postgres 11 was committed. The heapallindexed enhancement that made it into Postgres 11 assumes that the representation of index tuples produced by index_form_tuple() (or all relevant index_form_tuple() callers) is deterministic: for every possible heap tuple input there must be a single possible (bitwise) output. There is no real corruption present with the test case, though it's not entirely clear that this is best thought of as a bug in amcheck -- I'd prefer to make sure that amcheck's expectations are actually met here, rather than have amcheck normalize its input to eliminate the difference in bitwise representation. Steps to reproduce are rather delicate -- I stumbled upon the problem entirely by accident. I can share the full test case if that helps, but will hold off for now, since it involves a pg_dump that's a few megabytes in size. Here is an outline of what I'm doing: pg_restore -d postgres /home/pg/code/suffix_truncation_test/bib_refs_small.dump pg@postgres:5432 [9532]=# \d mgd.bib_refs Table "mgd.bib_refs" Column │ Type │ Collation │ Nullable │ Default ───────────────────┼─────────────────────────────┼───────────┼──────────┼───────── _refs_key │ integer │ │ not null │ _reviewstatus_key │ integer │ │ not null │ reftype │ character(4) │ │ not null │ authors │ text │ │ │ _primary │ character varying(60) │ │ │ title │ text │ │ │ journal │ character varying(100) │ │ │ vol │ character varying(20) │ │ │ issue │ character varying(25) │ │ │ date │ character varying(30) │ │ │ year │ integer │ │ │ pgs │ character varying(30) │ │ │ nlmstatus │ character(1) │ │ not null │ abstract │ text │ │ │ isreviewarticle │ smallint │ │ not null │ _createdby_key │ integer │ │ not null │ 1001 _modifiedby_key │ integer │ │ not null │ 1001 creation_date │ timestamp without time zone │ │ not null │ now() modification_date │ timestamp without time zone │ │ not null │ now() Indexes: "bib_refs_pkey" PRIMARY KEY, btree (_refs_key) "bib_refs_idx_authors" btree (authors) "bib_refs_idx_createdby_key" btree (_createdby_key) "bib_refs_idx_isprimary" btree (_primary) "bib_refs_idx_journal" btree (journal) "bib_refs_idx_modifiedby_key" btree (_modifiedby_key) "bib_refs_idx_reviewstatus_key" btree (_reviewstatus_key) "bib_refs_idx_title" btree (title) "bib_refs_idx_year" btree (year) psql -d postgres -c "create table bug (like mgd.bib_refs);" psql -d postgres -c "create index on bug (title);" psql -d postgres -c "insert into bug select * from mgd.bib_refs;" psql -d postgres -c "create extension if not exists amcheck;" psql -d postgres -c "analyze; set maintenance_work_mem='128MB';" psql -d postgres -c "select bt_index_parent_check('bug_title_idx', true);" ERROR: heap tuple (579,4) from table "bug" lacks matching index tuple within index "bug_title_idx" Here are details of the offending datum in the heap: pg@postgres:5432 [9532]=# select title, length(title), pg_column_size(title) from bug where ctid = '(579,4)'; ─[ RECORD 1 ]──┬──── title │ Final report on the safety assessment of trilaurin, triarachidin, tribehenin, tricaprin, tricaprylin, trierucin, triheptanoin, triheptylundecanoin, triisononanoin, triisopalmitin, triisostearin, trilinolein, trimyristin, trioctanoin, triolein, tripalmitin, tripalmitolein, triricinolein, tristearin, triundecanoin, glyceryl triacetyl hydroxystearate, glyceryl triacetyl ricinoleate, and gl. length │ 390 pg_column_size │ 234 Does anyone have any idea why the 4 byte varlena (text) datum in the single attribute index "bug_title_idx" is uncompressed, while the value in the heap is compressed? No other value in any other index happens to trip the problem, though this is complicated real-world database with many similar indexes over tens of gigabytes of data (I have quite a number of these "INSERT ... SELECT" tests for my nbtree patch). What you see here is a partially boiled-down test case. I've started some preliminary debugging work. A "REINDEX index bug_title_idx" makes amcheck happy, since the index tuple that points to heap tuple '(579,4)' ends up being compressed in exactly the same way as it is in the heap. The initial "INSERT ... SELECT" clearly makes the executor produce compressed values for heap_insert(), though not for btinsert() in this one instance. I've been able to confirm this from gdb. -- Peter Geoghegan
Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes: > The heapallindexed enhancement that made it into Postgres 11 assumes > that the representation of index tuples produced by index_form_tuple() > (or all relevant index_form_tuple() callers) is deterministic: for > every possible heap tuple input there must be a single possible > (bitwise) output. That assumption seems unbelievably fragile. How badly do things break when it's violated? Also, is the assumption just that a fixed source tuple will generate identical index entries across repeated index_form_tuple attempts? Or is it assuming that logically equal index entries will be bitwise equal? The latter is broken on its face, because index_form_tuple() doesn't try to hide differences in the toasting state of source datums. regards, tom lane
Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
From
Peter Geoghegan
Date:
On Mon, Jan 14, 2019 at 1:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > The heapallindexed enhancement that made it into Postgres 11 assumes > > that the representation of index tuples produced by index_form_tuple() > > (or all relevant index_form_tuple() callers) is deterministic: for > > every possible heap tuple input there must be a single possible > > (bitwise) output. > > That assumption seems unbelievably fragile. How badly do things > break when it's violated? Well, they break. You get a false positive report of corruption, since there isn't a bitwise identical version of the datum from the heap in the index for that same tuple. This seems to be very unlikely in practice, but amcheck is concerned with unlikely outcomes. > Also, is the assumption just that a fixed source tuple will generate > identical index entries across repeated index_form_tuple attempts? I would have said that the assumption is that a fixed source tuple will generate identical index entries. The problem with that is that my idea of what constitutes a fixed input now seems to have been faulty. I didn't think that the executor could mutate TOAST state in a way that made this kind of inconsistency possible. > Or is it assuming that logically equal index entries will be bitwise > equal? The latter is broken on its face, because index_form_tuple() > doesn't try to hide differences in the toasting state of source > datums. Logical equality as I understand the term doesn't enter into it at all -- B-Tree operator class semantics are not involved here. I'm not sure if that's what you meant, but I want to be clear on that. amcheck certainly knows that it cannot assume that scankey logical equality is the same thing as bitwise equality. -- Peter Geoghegan
Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
From
Peter Geoghegan
Date:
On Mon, Jan 14, 2019 at 1:46 PM Peter Geoghegan <pg@bowt.ie> wrote: > I would have said that the assumption is that a fixed source tuple > will generate identical index entries. The problem with that is that > my idea of what constitutes a fixed input now seems to have been > faulty. I didn't think that the executor could mutate TOAST state in a > way that made this kind of inconsistency possible. The source tuple (by which I mean the mgd.bib_refs heap tuple) is a HEAP_HASEXTERNAL tuple. If I update it to make a particularly long text field NULL (UPDATE mgd.bib_refs SET abstract = NULL), and then "INSERT INTO bug SELECT * FROM mgd.bib_refs", amcheck stops complaining about the index on "bug.title" is missing. Even though the "abstract" field has nothing to do with the index. The source of the inconsistency here must be within heap_prepare_insert() -- the external datum handling: /* * If the new tuple is too big for storage or contains already toasted * out-of-line attributes from some other relation, invoke the toaster. */ if (relation->rd_rel->relkind != RELKIND_RELATION && relation->rd_rel->relkind != RELKIND_MATVIEW) { /* toast table entries should never be recursively toasted */ Assert(!HeapTupleHasExternal(tup)); return tup; } else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD) return toast_insert_or_update(relation, tup, NULL, options); else return tup; Even leaving that aside, I really should have spotted that TOAST_TUPLE_THRESHOLD is a different thing to TOAST_INDEX_TARGET. The two things are always controlled independently. Mea culpa. The fix here must be to normalize index tuples that are compressed within amcheck, both during initial fingerprinting, and during subsequent probes of the Bloom filter in bt_tuple_present_callback(). -- Peter Geoghegan
Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
From
Peter Geoghegan
Date:
On Mon, Jan 14, 2019 at 2:37 PM Peter Geoghegan <pg@bowt.ie> wrote: > The fix here must be to normalize index tuples that are compressed > within amcheck, both during initial fingerprinting, and during > subsequent probes of the Bloom filter in bt_tuple_present_callback(). I happened to talk to Andres about this in person yesterday. He thought that there was reason to be concerned about the need for logical normalization beyond TOAST issues. Expression indexes were a particular concern, because they could in principle have a change in the on-disk representation without a change of logical values -- false positives could result. He suggested that the long term solution was to bring hash operator class hash functions into Bloom filter hashing, at least where available. I wasn't very enthused about this idea, because it will be expensive and complicated for an uncertain benefit. There are hardly any btree operator classes that can ever have bitwise distinct datums that are equal, anyway (leaving aside issues with TOAST). For the cases that do exist (e.g. numeric_ops display scale), we may not really want to normalize the differences away. Having an index tuple with a numeric_ops datum containing the wrong display scale but with everything else correct still counts as corruption. It now occurs to me that if we wanted to go further than simply normalizing away TOAST differences, my pending nbtree patch could enable a simpler and more flexible way of doing that than bringing hash opclasses into it, at least on the master branch. We could simply do an index look-up for the exact tuple of interest in the event of a Bloom filter probe indicating its apparent absence (corruption) -- even heap TID can participate in the search. In addition, that would cover the whole universe of logical differences, known and unknown (e.g. it wouldn't matter if somebody initialized alignment padding to something non-zero, since that doesn't cause wrong answers to queries). We might even want to offer an option where the Bloom filter is bypassed (we go straight to probing the indexes) some proportion of the time, or when a big misestimation when sizing the Bloom filter is detected (i.e. almost all bits in the Bloom filter bitset are set at the time we start probing the filter). -- Peter Geoghegan
Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
From
Peter Geoghegan
Date:
On Wed, Jan 23, 2019 at 10:59 AM Peter Geoghegan <pg@bowt.ie> wrote: > > The fix here must be to normalize index tuples that are compressed > > within amcheck, both during initial fingerprinting, and during > > subsequent probes of the Bloom filter in bt_tuple_present_callback(). > > I happened to talk to Andres about this in person yesterday. He > thought that there was reason to be concerned about the need for > logical normalization beyond TOAST issues. Expression indexes were a > particular concern, because they could in principle have a change in > the on-disk representation without a change of logical values -- false > positives could result. He suggested that the long term solution was > to bring hash operator class hash functions into Bloom filter hashing, > at least where available. I think that the best way forward is to normalize to compensate for inconsistent input datum TOAST state, and leave it at that. ISTM that logical normalization beyond that (based on hashing, or anything else) creates more problems than it solves. I am concerned about cases like INCLUDE indexes (which may have datums that lack even a B-Tree opclass), and about the logical-though-semantically-relevant facets of some datatypes such as numeric's display scale. If I can get an example from Andres of a case where further logical normalization is necessary to avoid false positives with expression indexes, that may change things. (BTW, I implemented another amcheck enhancement that searches indexes from the root to find matches -- the code is a trivial addition to the new patch series I'm working on, and seems like a better way to do enhanced logical normalization if that proves to be truly necessary.) Attached draft patch fixes the bug by doing fairly simple normalization. I think that TOAST compression of datums in indexes is fairly rare in practice, so I'm not very worried about the fact that this won't perform as well as it could with indexes that have a lot of compressed datums. I think that the interface I've added might need to be expanded for other things in the future (e.g., to make amcheck work with nbtree-native duplicate compression), and not worrying about the performance too much helps with that goal. I'll pick this up next week, and likely commit a fix by Wednesday or Thursday if there are no objections. I'm not sure if the test case is worth including. -- Peter Geoghegan
Attachment
Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives
From
Peter Geoghegan
Date:
On Fri, Feb 1, 2019 at 6:27 PM Peter Geoghegan <pg@bowt.ie> wrote: > Attached draft patch fixes the bug by doing fairly simple > normalization. I think that TOAST compression of datums in indexes is > fairly rare in practice, so I'm not very worried about the fact that > this won't perform as well as it could with indexes that have a lot of > compressed datums. I think that the interface I've added might need to > be expanded for other things in the future (e.g., to make amcheck work > with nbtree-native duplicate compression), and not worrying about the > performance too much helps with that goal. > > I'll pick this up next week, and likely commit a fix by Wednesday or > Thursday if there are no objections. I'm not sure if the test case is > worth including. On second thought, the test should look like this: $ psql --no-psqlrc --echo-queries -f bug_repro.sql drop table if exists bug; DROP TABLE create table bug (buggy text); CREATE TABLE alter table bug alter column buggy set storage plain; ALTER TABLE create index toasty on bug(buggy); CREATE INDEX alter table bug alter column buggy set storage extended; ALTER TABLE insert into bug select repeat ('a', 2100); INSERT 0 1 select bt_index_parent_check('toasty', true); psql:bug_repro.sql:7: ERROR: heap tuple (0,1) from table "bug" lacks matching index tuple within index "toasty" This relies on the fact that the pg_attribute entry for the index is more or less a straight copy of the corresponding pg_attribute entry for the table at the time of the CREATE INDEX. I arrange to make storage of the index attribute plain and storage for the heap attribute extended. TOAST is applied inconsistently between toast_insert_or_update() and index_form_tuple() without really relying on implementation details that are subject to change. -- Peter Geoghegan