Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_amcheck contrib application
Date
Msg-id CA+Tgmob88VSOSUxn2M=ajT0TrVF8KrLQtGBpUAEP2=geNSHJQw@mail.gmail.com
Whole thread Raw
In response to Re: pg_amcheck contrib application  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: pg_amcheck contrib application  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update the toast to have only 6 chunks numbered
[0..5]except we corruptly keep chunks numbered [12..17] in the toast table.  We'd rather see a report like this: 
[ toast value NNN chunk NNN has sequence number NNN, but expected
sequence number NNN ]
> than one like this:
[ toast value NNN contains chunk NNN where chunk NNN was expected ]
>
> because saying the toast value ended at "chunk 12" after saying that it contains "chunk 17" is contradictory.  You
needthe distinction between the chunk number and the chunk sequence number, since in corrupt circumstances they may not
bethe same. 

Hmm, I see your point, and that's a good example to illustrate it.
But, with that example in front of me, I am rather doubtful that
either of these is what users actually want. Consider the case where I
should have chunks 0..17 and chunk 1 is just plain gone. This, by the
way, seems like a pretty likely case to arise in practice, since all
we need is for a block to get truncated away or zeroed erroneously, or
for a tuple to get pruned that shouldn't. With either of the above
schemes, I guess we're going to get a message about every chunk from 2
to 17, complaining that they're all misnumbered. We might also get a
complaint that the last chunk is the wrong size, and that the total
number of chunks isn't right. What we really want is a single
complaint saying chunk 1 is missing.

Likewise, in your example, I sort of feel like what I really want,
rather than either of the above outputs, is to get some messages like
this:

toast value NNN contains unexpected extra chunk [12-17]

Both your phrasing for those messages and what I suggested make it
sound like the problem is that the chunk number is wrong. But that
doesn't seem like it's taking the right view of the situation. Chunks
12-17 shouldn't exist at all, and if they do, we should say that, e.g.
by complaining about something like "toast value 16444 chunk 12
follows last expected chunk 5"

In other words, I don't buy the idea that the user will accept the
idea that there's a chunk number and a chunk sequence number, and that
they should know the difference between those things and what each of
them are. They're entitled to imagine that there's just one thing, and
that we're going to tell them about value that are extra or missing.
The fact that we're not doing that seems like it's just a matter of
missing code. If we start the index scan and get chunk 4, we can
easily emit messages for chunks 0..3 right on the spot, declaring them
missing. Things do get a bit hairy if the index scan returns values
out of order: what if it gives us chunk_seq = 2 and then chunk_seq =
1? But I think we could handle that by just issuing a complaint in any
such case that "toast index returns chunks out of order for toast
value NNN" and stopping further checking of that toast value.

> Purely as manual testing, and not part of the patch, I hacked the backend a bit to allow direct modification of the
toasttable.  After corrupting the toast with the following bit of SQL: 
>
>         WITH chunk_limit AS (
>                 SELECT chunk_id, MAX(chunk_seq) AS maxseq
>                         FROM $toastname
>                         GROUP BY chunk_id)
>                 INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
>                         (SELECT t.chunk_id,
>                                         t.chunk_seq + cl.maxseq + CASE WHEN t.chunk_seq < 3 THEN 1 ELSE 7 END,
>                                         t.chunk_data
>                                 FROM $toastname t
>                                 INNER JOIN chunk_limit cl
>                                 ON t.chunk_id = cl.chunk_id)
>
> pg_amcheck reports the following corruption messages:
>
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 6 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 7 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 8 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 was expected to end at chunk 6, but ended at chunk 12
>
> I think if we'd left out the first three messages, it would read strangely.  We would be complaining about three
chunkswith the wrong sequence number, then conclude that there were six extra chunks.  A sufficiently savvy user might
deducethe presence of chunks 6, 7, and 8, but the problem is more obvious (to my eyes, at least) if we keep the first
threemessages.  This seems like a judgement call and not a clear argument either way, so if you still want me to change
it,I guess I don't mind doing so. 

I mean, looking at it, the question here is why it's not just using
the same message for all of them. The fact that the chunk numbers are
higher than 5 is the problem. The sequence numbers seem like just a
distraction.

> > On a related note, as I think I said before, I still think we should
> > be rejiggering this so that we're not testing both the size of each
> > individual chunk and the total size, because that ought to be
> > redundant. That might be better done as a separate patch but I think
> > we should try to clean it up.
>
> Can you point me to the exact check you are mentioning, and with which patch applied?  I don't see any examples of
thisafter applying the v18-0003. 

Hmm, my mistake, I think.

> > I'm a little worried about whether the additional test cases are
> > Endian-dependent at all. I don't immediately know what might be wrong
> > with them, but I'm going to think about that some more later. Any
> > chance you have access to a Big-endian box where you can test this?
>
> I don't have a Big-endian box, but I think one of them may be wrong now that you mention the issue:
>
> # Corrupt column c's toast pointer va_extinfo field
>
> The problem is that the 30-bit extsize and 2-bit cmid split is not being handled in the perl test, and I don't see an
easyway to have perl's pack/unpack do that for us.  There isn't any requirement that each possible corruption we check
actuallybe manifested in the regression tests.  The simplest solution is to remove this problematic test, so that's
whatI did.  The other two new tests corrupt c_va_toastrelid and c_va_rawsize, both of which are read/written using
unpack/pack,so perl should handle the endianness for us (I hope). 

I don't immediately see why this particular thing should be an issue.
The format of the varlena header itself is different on big-endian and
little-endian machines, which is why postgres.h has all this stuff
conditioned on WORDS_BIGENDIAN. But va_extinfo doesn't have any
similar treatment, so I'm not sure what could go wrong there, as long
as the 4-byte value as a whole is being packed and unpacked according
to the machine's endian-ness.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Dubious coding in nbtinsert.c
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Custom compression methods