Thread: Intermittent missing chunk ERRORS
All;
I am working with a client, they have an app that is running into error's like this:
SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well.
I assume that if this were a corrupt index issue then the query would continue to fail every time.
Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this?
Thanks in advance
All;
I am working with a client, they have an app that is running into error's like this:
SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well.
I assume that if this were a corrupt index issue then the query would continue to fail every time.
Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this?
Thanks in advance
a lot in this gist is something you can debug to start with.if not the index than the toast table might be corrupt.the bigger problem might be to figure out how it got corrupt. a one off power incident, or bug or storage is having issues etc.once that is clear,a simple start would be to run a pg_dump of the database to /dev/null and check if the dump is clean without errors.On Sat, Apr 26, 2025, 10:39 PM Sbob <sbob@quadratum-braccas.com> wrote:All;
I am working with a client, they have an app that is running into error's like this:
SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well.
I assume that if this were a corrupt index issue then the query would continue to fail every time.
Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this?
Thanks in advance
On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: > I am working with a client, they have an app that is running into error's like this: > > SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 > > I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, > REINDEX on the table and even a reindexdb on the database. Today they passed me another > error (the one above) and once I was able to login I ran the same query the app team > referenced and it ran without any errors. I asked them to re-run the query and they > informed me that it now works for them as well. > > I assume that if this were a corrupt index issue then the query would continue to fail > every time. > > Has anyone seen this type of intermittent index error before? and thoughts on where to > look /how to debug this? The error is data corruption. If it is transient, that could mean a couple of things: - a TOAST index was corrupted, and REINDEX fixed it - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted, the problem was gone - somebody updated or deleted the row, and the corruption is hidden Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: >> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 >> Has anyone seen this type of intermittent index error before? and thoughts on where to >> look /how to debug this? > The error is data corruption. We've seen transient errors of this sort arise from what are basically timing problems, and fixed at least a few cases. I wonder exactly which PG version the problematic installation is running. regards, tom lane
On 4/26/25 2:28 PM, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: >> On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: >>> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 >>> Has anyone seen this type of intermittent index error before? and thoughts on where to >>> look /how to debug this? >> The error is data corruption. > We've seen transient errors of this sort arise from what are basically > timing problems, and fixed at least a few cases. I wonder exactly > which PG version the problematic installation is running. > > regards, tom lane We are on V14
On 4/26/25 2:24 PM, Laurenz Albe wrote: > On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: >> I am working with a client, they have an app that is running into error's like this: >> >> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 >> >> I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, >> REINDEX on the table and even a reindexdb on the database. Today they passed me another >> error (the one above) and once I was able to login I ran the same query the app team >> referenced and it ran without any errors. I asked them to re-run the query and they >> informed me that it now works for them as well. >> >> I assume that if this were a corrupt index issue then the query would continue to fail >> every time. >> >> Has anyone seen this type of intermittent index error before? and thoughts on where to >> look /how to debug this? > The error is data corruption. > > If it is transient, that could mean a couple of things: > > - a TOAST index was corrupted, and REINDEX fixed it > > - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted, > the problem was gone > > - somebody updated or deleted the row, and the corruption is hidden > > Yours, > Laurenz Albe I did REINDEX and the error came back, however this last time when they informed me about the error they sent the select statement throwing the error and once I logged in I ran their select statement and it worked, no REINDEX had been run, and then I asked them to run it again and it worked for them as well So, the first option above is not the answer and I dont think anyone updated the row, their process is mostly reads but I will verify, so maybe the RAM issue? the servers are cloud VM's in a private cloud
Sbob <sbob@quadratum-braccas.com> writes: > On 4/26/25 2:28 PM, Tom Lane wrote: >> We've seen transient errors of this sort arise from what are basically >> timing problems, and fixed at least a few cases. I wonder exactly >> which PG version the problematic installation is running. > We are on V14 14-what? We are talking about bug fixes, so it matters. regards, tom lane
On 4/26/25 6:03 PM, Tom Lane wrote: > Sbob <sbob@quadratum-braccas.com> writes: >> On 4/26/25 2:28 PM, Tom Lane wrote: >>> We've seen transient errors of this sort arise from what are basically >>> timing problems, and fixed at least a few cases. I wonder exactly >>> which PG version the problematic installation is running. >> We are on V14 > 14-what? We are talking about bug fixes, so it matters. > > regards, tom lane postgres=# select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-23), 64-bit (1 row)
Sbob <sbob@quadratum-braccas.com> writes: > On 4/26/25 6:03 PM, Tom Lane wrote: >> 14-what? We are talking about bug fixes, so it matters. > PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 > 20210514 (Red Hat 8.5.0-23), 64-bit OK, well that's current at least... I don't suppose you can build a test case that reproduces this failure? If it's timing-dependent as I suspect, it might only fail once in awhile, but that would be good enough for investigation. regards, tom lane
On Sat, 2025-04-26 at 16:54 -0600, Sbob wrote: > > If it is transient, that could mean a couple of things: > > > > - a TOAST index was corrupted, and REINDEX fixed it > > > > - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted, > > the problem was gone > > > > - somebody updated or deleted the row, and the corruption is hidden > > I did REINDEX and the error came back, however this last time when they > informed me about the error they sent the select statement throwing the > error and once I logged in I ran their select statement and it worked, > no REINDEX had been run, and then I asked them to run it again and it > worked for them as well > > So, the first option above is not the answer and I dont think anyone > updated the row, their process is mostly reads but I will verify, so > maybe the RAM issue? the servers are cloud VM's in a private cloud Maybe. Check the hardware. Or maybe it is as Tom says. Yours, Laurenz Albe