Thread: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives

While testing my nbtree heap TID keyspace patch, I came across a case
where amcheck reliably reports corruption. It appeared that a 4 byte
varlena index entry that was expected in an index was not actually
present. However, index scan queries with the "missing" value in their
qual didn't actually give wrong answers. This was reproducible on the
master branch, too. It turned out that the problem has existed since
the heapallindexed enhancement made it into Postgres 11 was committed.

The heapallindexed enhancement that made it into Postgres 11 assumes
that the representation of index tuples produced by index_form_tuple()
(or all relevant index_form_tuple() callers) is deterministic: for
every possible heap tuple input there must be a single possible
(bitwise) output. There is no real corruption present with the test
case, though it's not entirely clear that this is best thought of as a
bug in amcheck -- I'd prefer to make sure that amcheck's expectations
are actually met here, rather than have amcheck normalize its input to
eliminate the difference in bitwise representation.

Steps to reproduce are rather delicate -- I stumbled upon the problem
entirely by accident. I can share the full test case if that helps,
but will hold off for now, since it involves a pg_dump that's a few
megabytes in size. Here is an outline of what I'm doing:

pg_restore -d postgres /home/pg/code/suffix_truncation_test/bib_refs_small.dump

pg@postgres:5432 [9532]=# \d mgd.bib_refs
                               Table "mgd.bib_refs"
      Column       │            Type             │ Collation │
Nullable │ Default
───────────────────┼─────────────────────────────┼───────────┼──────────┼─────────
 _refs_key         │ integer                     │           │ not null │
 _reviewstatus_key │ integer                     │           │ not null │
 reftype           │ character(4)                │           │ not null │
 authors           │ text                        │           │          │
 _primary          │ character varying(60)       │           │          │
 title             │ text                        │           │          │
 journal           │ character varying(100)      │           │          │
 vol               │ character varying(20)       │           │          │
 issue             │ character varying(25)       │           │          │
 date              │ character varying(30)       │           │          │
 year              │ integer                     │           │          │
 pgs               │ character varying(30)       │           │          │
 nlmstatus         │ character(1)                │           │ not null │
 abstract          │ text                        │           │          │
 isreviewarticle   │ smallint                    │           │ not null │
 _createdby_key    │ integer                     │           │ not null │ 1001
 _modifiedby_key   │ integer                     │           │ not null │ 1001
 creation_date     │ timestamp without time zone │           │ not null │ now()
 modification_date │ timestamp without time zone │           │ not null │ now()
Indexes:
    "bib_refs_pkey" PRIMARY KEY, btree (_refs_key)
    "bib_refs_idx_authors" btree (authors)
    "bib_refs_idx_createdby_key" btree (_createdby_key)
    "bib_refs_idx_isprimary" btree (_primary)
    "bib_refs_idx_journal" btree (journal)
    "bib_refs_idx_modifiedby_key" btree (_modifiedby_key)
    "bib_refs_idx_reviewstatus_key" btree (_reviewstatus_key)
    "bib_refs_idx_title" btree (title)
    "bib_refs_idx_year" btree (year)

psql -d postgres -c "create table bug (like mgd.bib_refs);"
psql -d postgres -c "create index on bug (title);"
psql -d postgres -c "insert into bug select * from mgd.bib_refs;"
psql -d postgres -c "create extension if not exists amcheck;"
psql -d postgres -c "analyze; set maintenance_work_mem='128MB';"
psql -d postgres -c "select bt_index_parent_check('bug_title_idx', true);"
ERROR:  heap tuple (579,4) from table "bug" lacks matching index tuple
within index "bug_title_idx"

Here are details of the offending datum in the heap:

pg@postgres:5432 [9532]=# select title, length(title),
pg_column_size(title)  from bug where ctid = '(579,4)';
─[ RECORD 1 ]──┬────
title          │ Final report on the safety assessment of trilaurin,
triarachidin, tribehenin, tricaprin, tricaprylin, trierucin,
triheptanoin, triheptylundecanoin, triisononanoin, triisopalmitin,
triisostearin, trilinolein, trimyristin, trioctanoin, triolein,
tripalmitin, tripalmitolein, triricinolein, tristearin, triundecanoin,
glyceryl triacetyl hydroxystearate, glyceryl triacetyl ricinoleate,
and gl.
length         │ 390
pg_column_size │ 234

Does anyone have any idea why the 4 byte varlena (text) datum in the
single attribute index "bug_title_idx" is uncompressed, while the
value in the heap is compressed? No other value in any other index
happens to trip the problem, though this is complicated real-world
database with many similar indexes over tens of gigabytes of data (I
have quite a number of these "INSERT ... SELECT" tests for my nbtree
patch). What you see here is a partially boiled-down test case.

I've started some preliminary debugging work. A "REINDEX index
bug_title_idx" makes amcheck happy, since the index tuple that points
to heap tuple '(579,4)' ends up being compressed in exactly the same
way as it is in the heap. The initial "INSERT ... SELECT" clearly
makes the executor produce compressed values for heap_insert(), though
not for btinsert() in this one instance. I've been able to confirm
this from gdb.

--
Peter Geoghegan


Peter Geoghegan <pg@bowt.ie> writes:
> The heapallindexed enhancement that made it into Postgres 11 assumes
> that the representation of index tuples produced by index_form_tuple()
> (or all relevant index_form_tuple() callers) is deterministic: for
> every possible heap tuple input there must be a single possible
> (bitwise) output.

That assumption seems unbelievably fragile.  How badly do things
break when it's violated?

Also, is the assumption just that a fixed source tuple will generate
identical index entries across repeated index_form_tuple attempts?
Or is it assuming that logically equal index entries will be bitwise
equal?  The latter is broken on its face, because index_form_tuple()
doesn't try to hide differences in the toasting state of source
datums.

            regards, tom lane


On Mon, Jan 14, 2019 at 1:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > The heapallindexed enhancement that made it into Postgres 11 assumes
> > that the representation of index tuples produced by index_form_tuple()
> > (or all relevant index_form_tuple() callers) is deterministic: for
> > every possible heap tuple input there must be a single possible
> > (bitwise) output.
>
> That assumption seems unbelievably fragile.  How badly do things
> break when it's violated?

Well, they break. You get a false positive report of corruption, since
there isn't a bitwise identical version of the datum from the heap in
the index for that same tuple. This seems to be very unlikely in
practice, but amcheck is concerned with unlikely outcomes.

> Also, is the assumption just that a fixed source tuple will generate
> identical index entries across repeated index_form_tuple attempts?

I would have said that the assumption is that a fixed source tuple
will generate identical index entries. The problem with that is that
my idea of what constitutes a fixed input now seems to have been
faulty. I didn't think that the executor could mutate TOAST state in a
way that made this kind of inconsistency possible.

> Or is it assuming that logically equal index entries will be bitwise
> equal?  The latter is broken on its face, because index_form_tuple()
> doesn't try to hide differences in the toasting state of source
> datums.

Logical equality as I understand the term doesn't enter into it at all
-- B-Tree operator class semantics are not involved here. I'm not sure
if that's what you meant, but I want to be clear on that. amcheck
certainly knows that it cannot assume that scankey logical equality is
the same thing as bitwise equality.

-- 
Peter Geoghegan


On Mon, Jan 14, 2019 at 1:46 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I would have said that the assumption is that a fixed source tuple
> will generate identical index entries. The problem with that is that
> my idea of what constitutes a fixed input now seems to have been
> faulty. I didn't think that the executor could mutate TOAST state in a
> way that made this kind of inconsistency possible.

The source tuple (by which I mean the mgd.bib_refs heap tuple) is a
HEAP_HASEXTERNAL tuple. If I update it to make a particularly long
text field NULL (UPDATE mgd.bib_refs SET abstract = NULL), and then
"INSERT INTO bug SELECT * FROM mgd.bib_refs", amcheck stops
complaining about the index on "bug.title" is missing. Even though the
"abstract" field has nothing to do with the index.

The source of the inconsistency here must be within
heap_prepare_insert() -- the external datum handling:

    /*
     * If the new tuple is too big for storage or contains already toasted
     * out-of-line attributes from some other relation, invoke the toaster.
     */
    if (relation->rd_rel->relkind != RELKIND_RELATION &&
        relation->rd_rel->relkind != RELKIND_MATVIEW)
    {
        /* toast table entries should never be recursively toasted */
        Assert(!HeapTupleHasExternal(tup));
        return tup;
    }
    else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
        return toast_insert_or_update(relation, tup, NULL, options);
    else
        return tup;

Even leaving that aside, I really should have spotted that
TOAST_TUPLE_THRESHOLD is a different thing to TOAST_INDEX_TARGET. The
two things are always controlled independently. Mea culpa.

The fix here must be to normalize index tuples that are compressed
within amcheck, both during initial fingerprinting, and during
subsequent probes of the Bloom filter in bt_tuple_present_callback().

-- 
Peter Geoghegan


On Mon, Jan 14, 2019 at 2:37 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The fix here must be to normalize index tuples that are compressed
> within amcheck, both during initial fingerprinting, and during
> subsequent probes of the Bloom filter in bt_tuple_present_callback().

I happened to talk to Andres about this in person yesterday. He
thought that there was reason to be concerned about the need for
logical normalization beyond TOAST issues. Expression indexes were a
particular concern, because they could in principle have a change in
the on-disk representation without a change of logical values -- false
positives could result. He suggested that the long term solution was
to bring hash operator class hash functions into Bloom filter hashing,
at least where available.

I wasn't very enthused about this idea, because it will be expensive
and complicated for an uncertain benefit. There are hardly any btree
operator classes that can ever have bitwise distinct datums that are
equal, anyway (leaving aside issues with TOAST). For the cases that do
exist (e.g. numeric_ops display scale), we may not really want to
normalize the differences away. Having an index tuple with a
numeric_ops datum containing the wrong display scale but with
everything else correct still counts as corruption.

It now occurs to me that if we wanted to go further than simply
normalizing away TOAST differences, my pending nbtree patch could
enable a simpler and more flexible way of doing that than bringing
hash opclasses into it, at least on the master branch. We could simply
do an index look-up for the exact tuple of interest in the event of a
Bloom filter probe indicating its apparent absence (corruption) --
even heap TID can participate in the search. In addition, that would
cover the whole universe of logical differences, known and unknown
(e.g. it wouldn't matter if somebody initialized alignment padding to
something non-zero, since that doesn't cause wrong answers to
queries). We might even want to offer an option where the Bloom filter
is bypassed (we go straight to probing the indexes) some proportion of
the time, or when a big misestimation when sizing the Bloom filter is
detected (i.e. almost all bits in the Bloom filter bitset are set at
the time we start probing the filter).

-- 
Peter Geoghegan


On Wed, Jan 23, 2019 at 10:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > The fix here must be to normalize index tuples that are compressed
> > within amcheck, both during initial fingerprinting, and during
> > subsequent probes of the Bloom filter in bt_tuple_present_callback().
>
> I happened to talk to Andres about this in person yesterday. He
> thought that there was reason to be concerned about the need for
> logical normalization beyond TOAST issues. Expression indexes were a
> particular concern, because they could in principle have a change in
> the on-disk representation without a change of logical values -- false
> positives could result. He suggested that the long term solution was
> to bring hash operator class hash functions into Bloom filter hashing,
> at least where available.

I think that the best way forward is to normalize to compensate for
inconsistent input datum TOAST state, and leave it at that. ISTM that
logical normalization beyond that (based on hashing, or anything else)
creates more problems than it solves. I am concerned about cases like
INCLUDE indexes (which may have datums that lack even a B-Tree
opclass), and about the logical-though-semantically-relevant facets of
some datatypes such as numeric's display scale. If I can get an
example from Andres of a case where further logical normalization is
necessary to avoid false positives with expression indexes, that may
change things. (BTW, I implemented another amcheck enhancement that
searches indexes from the root to find matches -- the code is a
trivial addition to the new patch series I'm working on, and seems
like a better way to do enhanced logical normalization if that proves
to be truly necessary.)

Attached draft patch fixes the bug by doing fairly simple
normalization. I think that TOAST compression of datums in indexes is
fairly rare in practice, so I'm not very worried about the fact that
this won't perform as well as it could with indexes that have a lot of
compressed datums. I think that the interface I've added might need to
be expanded for other things in the future (e.g., to make amcheck work
with nbtree-native duplicate compression), and not worrying about the
performance too much helps with that goal.

I'll pick this up next week, and likely commit a fix by Wednesday or
Thursday if there are no objections. I'm not sure if the test case is
worth including.

-- 
Peter Geoghegan

Attachment
On Fri, Feb 1, 2019 at 6:27 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached draft patch fixes the bug by doing fairly simple
> normalization. I think that TOAST compression of datums in indexes is
> fairly rare in practice, so I'm not very worried about the fact that
> this won't perform as well as it could with indexes that have a lot of
> compressed datums. I think that the interface I've added might need to
> be expanded for other things in the future (e.g., to make amcheck work
> with nbtree-native duplicate compression), and not worrying about the
> performance too much helps with that goal.
>
> I'll pick this up next week, and likely commit a fix by Wednesday or
> Thursday if there are no objections. I'm not sure if the test case is
> worth including.

On second thought, the test should look like this:

$ psql --no-psqlrc --echo-queries -f bug_repro.sql
drop table if exists bug;
DROP TABLE
create table bug (buggy text);
CREATE TABLE
alter table bug alter column buggy set storage plain;
ALTER TABLE
create index toasty on bug(buggy);
CREATE INDEX
alter table bug alter column buggy set storage extended;
ALTER TABLE
insert into bug select repeat ('a', 2100);
INSERT 0 1
select bt_index_parent_check('toasty', true);
psql:bug_repro.sql:7: ERROR:  heap tuple (0,1) from table "bug" lacks
matching index tuple within index "toasty"

This relies on the fact that the pg_attribute entry for the index is
more or less a straight copy of the corresponding pg_attribute entry
for the table at the time of the CREATE INDEX. I arrange to make
storage of the index attribute plain and storage for the heap
attribute extended. TOAST is applied inconsistently between
toast_insert_or_update() and index_form_tuple() without really relying
on implementation details that are subject to change.

-- 
Peter Geoghegan