Thread: Bugs in TOAST handling, OID assignment and redo recovery

Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:
Hello,

One of our 2ndQuadrant support customers recently reported a sudden rush of TOAST errors post a crash recovery, nearly causing an outage. Most errors read like this:

ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn

While we could bring back the cluster to normal quickly using some workarounds, I investigated this in more detail and identified two long standing bugs in TOAST as well as redo recovery.  

Immediately after the crash recovery, server started issuing OIDs that conflicted with the OIDs issued just prior to the crash. The XLOG_NEXTOID and buffer LSN interlock is supposed to guard against that, but a bug in the redo recovery breaks that protection. We shall see that in a minute.

When duplicate OIDs are issued, either because of a bug in redo recovery or because of OID counter wraparound, GetNewOidWithIndex() is supposed to protect us against that by first scanning the toast table to check if a duplicate chunk_id or toast value already exists. If so, a new OID is requested and we repeat until a non-conflicting OID is found. 

This mostly works, except in one corner case. GetNewOidWithIndex() uses SnapshotDirty to look for conflicting OID. But this fails if the existing toast tuple was recently deleted by another committed transaction. If the OldestXmin has not advanced enough, such tuples are reported as RECENTLY_DEAD  and fail the SnapshotDirty qualifications. So toast_save_datum() happily goes ahead and inserts a conflicting toast tuple. The UNIQUE index on <chunk_id, chunk_seq> does not complain because it seems the other tuple as RECENTLY_DEAD too. At this point, we have two toast tuples with the same <chunk_id, chunk_seq> - one of these is RECENTLY_DEAD and the other is LIVE. Later when toast_fetch_datum() and friends scan the toast table for the chunk_id (toast value), it uses SnapshotToast for retrieving the tuple. This special snapshot doesn't do any visibility checks on the toast tuple, assuming that since the main heap tuple is visible, the toast tuple(s) should be visible too. This snapshot unfortunately also sees the  RECENTLY_DEAD toast tuple and thus returns duplicate toast tuples, leading to the errors such as above. This same issue can also lead to other kinds of errors in tuptoaster.c.

ISTM that the right fix is to teach toast_save_datum() to check for existence of duplicate chunk_id by scanning the table with the same SnapshotToast that it later uses to fetch the tuples. We already do that in case of toast rewrite, but not for regular inserts. I propose to do that for regular path too, as done in the attached patch.

Duplicate OIDs may be generated when the counter wraps around and it may cause this problem.

- OID counter wraps around
- Old toast tuple is deleted and the deleting transaction commits
- OldestXmin is held back by an open transaction
- The same OID if then inserted again into the toast table
- The toasted value is fetched before the table is vacuumed or the index pointer is killed. This leads to duplicates being returned.

This is probably quite a corner case, but not impossible given that we already have reports for such problems. But the problem gets accentuated in case of a crash recovery or on a Hot standby. This is because of another somewhat unrelated bug which manifested in our case.

An interesting case to consider is the following sequence of events:

1. ONLINE checkpoint starts. We note down the nextOid counter, after rightly accounting for the cached values too. CreateCheckPoint() does this:

    LWLockAcquire(OidGenLock, LW_SHARED);
    checkPoint.nextOid = ShmemVariableCache->nextOid;
    if (!shutdown)
        checkPoint.nextOid += ShmemVariableCache->oidCount;
    LWLockRelease(OidGenLock);
 
2. The cached OIDs are then consumed, and a new set of OIDs are allocated, generating XLOG_NEXTOID WAL record.

3. More OIDs are consumed from the newly allocated range. Those OIDs make to the disk after following buffer LSN interlock. So XLOG_NEXTOID gets flushed to disk as well.

4. Checkpoint finishes, it writes a ONLINE checkpoint record with the nextOid value saved in step 1

5. CRASH

The redo recovery starts at some LSN written before step 1. It initialises the ShmemVariableCache->nextOid with the value stored in the checkpoint record and starts replaying WAL records. When it sees the XLOG_NEXTOID WAL record generated at step 2, it correctly advances the OID counter to cover the next range. 

But while applying ONLINE checkpoint written at step 4, we do this:

        /* ... but still treat OID counter as exact */
        LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
        ShmemVariableCache->nextOid = checkPoint.nextOid;
        ShmemVariableCache->oidCount = 0;
        LWLockRelease(OidGenLock);

So effectively we overwrite the nextOid counter and drag it backwards again. If we don't see another XLOG_NEXTOID record before the crash recovery ends, then we shall start the server with a stale value, generating duplicate OIDs immediately after the restart.

I don't think this is an unusual case and I'm surprised that we didn't notice this before. Most likely the other protections in the system to guard against wrapped around OIDs masked the bug. The TOAST table though escapes those protections in the specific case we discussed above and brought out the bug in open.

I think the right fix is to simply ignore the nextOid counter while replaying ONLINE checkpoint record. We must have already initialised ShmemVariableCache->nextOid to the value stored in this (or some previous) checkpoint record when redo recovery is started. As and when we see XLOG_NEXTOID record, we would maintain the shared memory counter correctly. If we don't see any XLOG_NEXTOID record, the value we started with must be the correct value. I see no problem even when OID wraps around during redo recovery. XLOG_NEXTOID should record that correctly.

Curiously neither REINDEX nor VACUUM would fix it until the OldestXmin counter advances enough so that RECENTLY_DEAD tuples are ignored by the REINDEX or removed by VACUUM. But after OldestXmin progresses, a REINDEX or a VACUUM should fix these transient errors. Also once manifested, the errors would stay until a REINDEX or VACUUM is performed.

We tested this at least upto 9.1 and the bugs exist there too. In fact, these bugs probably existed forever, though I did not check very old releases.

Attached is a simple reproducer and a proposed fix to address both the bugs. We should consider backpatching it all supported releases.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Bernd Helmle
Date:
Am Dienstag, den 10.04.2018, 15:59 +0530 schrieb Pavan Deolasee:
> One of our 2ndQuadrant support customers recently reported a sudden
> rush of
> TOAST errors post a crash recovery, nearly causing an outage. Most
> errors
> read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn
> 
> While we could bring back the cluster to normal quickly using some
> workarounds, I investigated this in more detail and identified two
> long
> standing bugs in TOAST as well as redo recovery.

Wow! I'm currently also investigating issues on a customer system, too,
where suddenly TOAST errors arised after shutdown immediate.

I haven't dug into your findings yet, but it seems to perfectly fit
into the reports i got.

    Thanks,
        Bernd



Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Heikki Linnakangas
Date:
On 10/04/18 13:29, Pavan Deolasee wrote:
> Hello,
> 
> One of our 2ndQuadrant support customers recently reported a sudden rush of
> TOAST errors post a crash recovery, nearly causing an outage. Most errors
> read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn
> 
> While we could bring back the cluster to normal quickly using some
> workarounds, I investigated this in more detail and identified two long
> standing bugs in TOAST as well as redo recovery.

Great detective work!

> ISTM that the right fix is to teach toast_save_datum() to check for
> existence of duplicate chunk_id by scanning the table with the same
> SnapshotToast that it later uses to fetch the tuples. We already do that in
> case of toast rewrite, but not for regular inserts. I propose to do that
> for regular path too, as done in the attached patch.

It would seem more straightforward to add a snapshot parameter to 
GetNewOidWithIndex(), so that the this one caller could pass 
SnapshotToast, while others pass SnapshotDirty. With your patch, you 
check the index twice: first with SnapshotDirty, in 
GetNewOidWithIndex(), and then with SnapshotToast, in the caller.

If I'm reading the rewrite case correctly, it's a bit different and 
quite special. In the loop with GetNewOidWithIndex(), it needs to check 
that the OID is unused in two tables, the old TOAST table, and the new 
one. You can only pass one index to GetNewOidWithIndex(), so it needs to 
check the second index manually. It's not because of the snapshot issue. 
Although I wonder if we should be using SnapshotToast in that 
GetNewOidWithIndex() call, too. I.e. if we should be checking both the 
old and the new toast table with SnapshotToast.

> I think the right fix is to simply ignore the nextOid counter while
> replaying ONLINE checkpoint record. We must have already initialised
> ShmemVariableCache->nextOid
> to the value stored in this (or some previous) checkpoint record when redo
> recovery is started. As and when we see XLOG_NEXTOID record, we would
> maintain the shared memory counter correctly. If we don't see any
> XLOG_NEXTOID record, the value we started with must be the correct value. I
> see no problem even when OID wraps around during redo recovery.
> XLOG_NEXTOID should record that correctly.

Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the 
value in the online checkpoint record is greater than 
ShmemVariableCache->nextXid. But we don't have such a wraparound-aware 
concept of "greater than" for OIDs. Ignoring the online checkpoint 
record's nextOid value seem fine to me.

- Heikki


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:
Hi Heikki,

On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:


It would seem more straightforward to add a snapshot parameter to GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast, while others pass SnapshotDirty. With your patch, you check the index twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with SnapshotToast, in the caller.

Hmm. I actually wrote my first patch exactly like that. I am trying to remember why I discarded that approach. Was that to do with the fact that  SnapshotToast can't see all toast tuples either? Yeah, I think so. For example, it won't see tuples with uncommitted-xmin, leading to different issues. Now it's unlikely that we will have a OID conflict where the old tuple has uncommitted-xmin, but not sure if we can completely rule that out. For example, if we did not uncover the redo recovery bug, we could have had a prepared transaction having inserted the old tuple, which now conflicts with new tuple and not detected by SnapshotToast.

 

If I'm reading the rewrite case correctly, it's a bit different and quite special. In the loop with GetNewOidWithIndex(), it needs to check that the OID is unused in two tables, the old TOAST table, and the new one. You can only pass one index to GetNewOidWithIndex(), so it needs to check the second index manually. It's not because of the snapshot issue. Although I wonder if we should be using SnapshotToast in that GetNewOidWithIndex() call, too. I.e. if we should be checking both the old and the new toast table with SnapshotToast.

As I said, I am not sure if checking with just SnapshotToast is enough because it can't see "dirty" tuples. 
 

Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the value in the online checkpoint record is greater than ShmemVariableCache->nextXid. But we don't have such a wraparound-aware concept of "greater than" for OIDs. Ignoring the online checkpoint record's nextOid value seem fine to me.

Ok. Thanks for checking.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:


On Tue, Apr 10, 2018 at 7:24 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
Hi Heikki,

On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:


It would seem more straightforward to add a snapshot parameter to GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast, while others pass SnapshotDirty. With your patch, you check the index twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with SnapshotToast, in the caller.

Hmm. I actually wrote my first patch exactly like that. I am trying to remember why I discarded that approach. Was that to do with the fact that  SnapshotToast can't see all toast tuples either? Yeah, I think so. For example, it won't see tuples with uncommitted-xmin, leading to different issues. Now it's unlikely that we will have a OID conflict where the old tuple has uncommitted-xmin, but not sure if we can completely rule that out. 

Or may be we simply err on the side of caution and scan the toast table with SnapshotAny while looking for a duplicate? That might prevent us from reusing an OID for a known-dead tuple, but should save us a second index scan and still work.

Thanks,
Pavan
 
--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Tom Lane
Date:
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> Or may be we simply err on the side of caution and scan the toast table
> with SnapshotAny while looking for a duplicate? That might prevent us from
> reusing an OID for a known-dead tuple, but should save us a second index
> scan and still work.

+1.  We really don't want to expend two indexscans on this.

I was worried about changing the signature of GetNewOidWithIndex in
a back-patched fix, but after looking around I think that's probably
safe.  External callers really shouldn't be using anything lower-level
than GetNewOid.

            regards, tom lane


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:


On Wed, Apr 11, 2018 at 8:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> Or may be we simply err on the side of caution and scan the toast table
> with SnapshotAny while looking for a duplicate? That might prevent us from
> reusing an OID for a known-dead tuple, but should save us a second index
> scan and still work.

+1.  We really don't want to expend two indexscans on this.


Ok. I propose attached patches, more polished this time. I also changed toastrel_valueid_exists() to use SnapshotAny, but I don't have any proofs to claim that's a needed change. But it seems harmless and given the number of toast related, hard to chase bugs we have seen over the years, may it's worth making it full proof too.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Tom Lane
Date:
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> Ok. I propose attached patches, more polished this time.

I'll take these, unless some other committer is hot to do so?

            regards, tom lane


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Simon Riggs
Date:
On 11 April 2018 at 19:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavan Deolasee <pavan.deolasee@gmail.com> writes:
>> Ok. I propose attached patches, more polished this time.
>
> I'll take these, unless some other committer is hot to do so?

Please go ahead.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Tom Lane
Date:
So while looking at this, it suddenly occurred to me that probing with
SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment
either.  SnapshotDirty will consider a row dead the instant the
deleting transaction has committed, but it may remain visible to other
transactions for awhile after that --- and now that we use MVCC for
catalog scans, that applies to them too.  Hence, the existing logic
is capable of creating objects with transiently-conflicting OIDs.
I don't think this could create a conflict that's visible outside
our own transaction, since anyone who can see our commit would also
see the commit of the deleting transaction.  But there's definitely
a hazard inside the transaction that creates a new object.

I propose therefore that the right fix does not require an API change
for GetNewOidWithIndex: we can just make it use SnapshotAny all the
time.

            regards, tom lane


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Michael Paquier
Date:
On Wed, Apr 11, 2018 at 04:28:33PM -0400, Tom Lane wrote:
> I propose therefore that the right fix does not require an API change
> for GetNewOidWithIndex: we can just make it use SnapshotAny all the
> time.

I have not really checked this thread in details, but one thing that
strikes me is that it would be rather easy to add a TAP test based on
the initial script that Pavan has sent.  Would that be worth testing
cycles or not?  We do have tests like 008_fsm_truncation.pl which test
similar failure scenarios, so I think that we should have one.
--
Michael

Attachment

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> I have not really checked this thread in details, but one thing that
> strikes me is that it would be rather easy to add a TAP test based on
> the initial script that Pavan has sent.  Would that be worth testing
> cycles or not?

I doubt it --- that test is so specialized that it'd be unlikely to
catch any other bug.

An example of the sort of bug that could exist in the same area would
be if we'd failed to note the places in tuptoaster.c that also need to
be switched to SnapshotAny.  Only a test covering heap-rewrite could
have found that, which this test case surely doesn't.

What I think would actually be worth spending some time on is taking
a hard look at the remaining uses of SnapshotDirty to see if any of
them look squishy.

            regards, tom lane


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Peter Eisentraut
Date:
On 4/10/18 06:29, Pavan Deolasee wrote:
> One of our 2ndQuadrant support customers recently reported a sudden rush
> of TOAST errors post a crash recovery, nearly causing an outage. Most
> errors read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn

While researching this, I found that the terminology in this code is
quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
numbers, etc. seemingly interchangeably.  The above error is actually
about the chunk_seq, not about the chunk_id, as one might think.

The attached patch is my attempt to clean this up a bit.  Thoughts?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:


On Thu, Apr 12, 2018 at 1:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So while looking at this, it suddenly occurred to me that probing with
SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment
either.  

Yeah it occurred to me as well, but when I looked at the code, I couldn't find a case that is broken. I even tried a few test cases with DDLs etc. But I think what you did is fine and more bullet proof. So +1 to that.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:


On Thu, Apr 12, 2018 at 5:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> I have not really checked this thread in details, but one thing that
> strikes me is that it would be rather easy to add a TAP test based on
> the initial script that Pavan has sent.  Would that be worth testing
> cycles or not?

I doubt it --- that test is so specialized that it'd be unlikely to
catch any other bug.


I agree; it's not worth adding a TAP test except may be as a demonstration to write future tests for catching such issues. This was a very specialised test case written after getting a full grasp on the bug. And it just tests the thing that I knew is broken based on code reading. Also, with OID duplicate issue fixed, hitting more bugs in this area is going to be even more difficult.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Tom Lane
Date:
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> On Thu, Apr 12, 2018 at 1:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So while looking at this, it suddenly occurred to me that probing with
>> SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment
>> either.

> Yeah it occurred to me as well, but when I looked at the code, I couldn't
> find a case that is broken. I even tried a few test cases with DDLs etc.

I think it probably can't happen for catalog MVCC scans, because we
generally take full new snapshots for those.  The situation that would be
hazardous is where we use an existing snapshot (so that it doesn't see
the just-committed Other Transaction) but advance its command counter
(so that it can see the new object we just made).  So the sort of failure
I'd predict is that a user query just after an object creation could see
duplicate OIDs in the catalogs.  To get to that point, you might need a
stable function (using the troublesome snapshot) calling a volatile one
(which contains the actual DDL).

> But I think what you did is fine and more bullet proof. So +1 to that.

Yeah, even if this isn't actually a reachable case, it's hard to argue
that we should complicate the code and take API-break risks in the back
branches to make a tiny optimization on the assumption that it can never
happen.

            regards, tom lane


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Pavan Deolasee
Date:


On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 4/10/18 06:29, Pavan Deolasee wrote:
> One of our 2ndQuadrant support customers recently reported a sudden rush
> of TOAST errors post a crash recovery, nearly causing an outage. Most
> errors read like this:
>
> ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn

While researching this, I found that the terminology in this code is
quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
numbers, etc. seemingly interchangeably.  The above error is actually
about the chunk_seq, not about the chunk_id, as one might think.

The attached patch is my attempt to clean this up a bit.  Thoughts?

While I agree that we should clean it up, I wonder if changing error text would be a good idea. These errors are being reported by a very long time and if we change the text, we might forget the knowledge about the past reports.

Also, "toast value" is same as "chunk_id". Should we clean up something there too? "chunk_seq number" -- should that be just "chunk_seq"?


Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Bruce Momjian
Date:
On Wed, Apr 18, 2018 at 12:07:52PM +0530, Pavan Deolasee wrote:
> 
> 
> On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut <
> peter.eisentraut@2ndquadrant.com> wrote:
> 
>     On 4/10/18 06:29, Pavan Deolasee wrote:
>     > One of our 2ndQuadrant support customers recently reported a sudden rush
>     > of TOAST errors post a crash recovery, nearly causing an outage. Most
>     > errors read like this:
>     >
>     > ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn
> 
>     While researching this, I found that the terminology in this code is
>     quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
>     numbers, etc. seemingly interchangeably.  The above error is actually
>     about the chunk_seq, not about the chunk_id, as one might think.
> 
>     The attached patch is my attempt to clean this up a bit.  Thoughts?
> 
> 
> While I agree that we should clean it up, I wonder if changing error text would
> be a good idea. These errors are being reported by a very long time and if we
> change the text, we might forget the knowledge about the past reports.
> 
> Also, "toast value" is same as "chunk_id". Should we clean up something there
> too? "chunk_seq number" -- should that be just "chunk_seq"?

We can put a comment next to the error message C string if we want to
keep historical knowledge.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Peter Eisentraut
Date:
On 4/18/18 02:37, Pavan Deolasee wrote:
> While I agree that we should clean it up, I wonder if changing error
> text would be a good idea. These errors are being reported by a very
> long time and if we change the text, we might forget the knowledge about
> the past reports.

It's presumably fixed now, so if we see the new error message, we'll
know it's a different issue.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

From
Alvaro Herrera
Date:
On 2018-Apr-11, Peter Eisentraut wrote:

> On 4/10/18 06:29, Pavan Deolasee wrote:
> > One of our 2ndQuadrant support customers recently reported a sudden rush
> > of TOAST errors post a crash recovery, nearly causing an outage. Most
> > errors read like this:
> > 
> > ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn
> 
> While researching this, I found that the terminology in this code is
> quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
> numbers, etc. seemingly interchangeably.  The above error is actually
> about the chunk_seq, not about the chunk_id, as one might think.
> 
> The attached patch is my attempt to clean this up a bit.  Thoughts?

I didn't review your patch carefully, but +1 on the idea.  By all means
let's not forget this for pg12.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services