Home > mailing lists

Thread: Intermittent missing chunk ERRORS

Intermittent missing chunk ERRORS

From

Sbob

Date:

26 April, 20:09:32

All;

I am working with a client, they have an app that is running into error's like this:

SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896

I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well.

I assume that if this were a corrupt index issue then the query would continue to fail every time.

Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this?

Thanks in advance

Re: Intermittent missing chunk ERRORS

From

Vijaykumar Jain

Date:

26 April, 23:04:37

a lot in this gist is something you can debug to start with.

https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a

if not the index than the toast table might be corrupt.

the bigger problem might be to figure out how it got corrupt. a one off power incident, or bug or storage is having issues etc.

once that is clear,

a simple start would be to run a pg_dump of the database to /dev/null and check if the dump is clean without errors.

Thanks,

Vijay

Open to work

Resume - Vijaykumar Jain

On Sat, Apr 26, 2025, 10:39 PM Sbob <sbob@quadratum-braccas.com> wrote:

All;

I am working with a client, they have an app that is running into error's like this:
SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896

I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well.

I assume that if this were a corrupt index issue then the query would continue to fail every time.
Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this?

Thanks in advance

Re: Intermittent missing chunk ERRORS

From

Vijaykumar Jain

Date:

26 April, 23:12:14

forgot about the core tool to check for the problem.

https://www.postgresql.org/docs/current/amcheck.html

this can help find corruption issues much faster.

Thanks,

Vijay

Open to work

Resume - Vijaykumar Jain

On Sun, Apr 27, 2025, 1:34 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

a lot in this gist is something you can debug to start with.

https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a

if not the index than the toast table might be corrupt.
the bigger problem might be to figure out how it got corrupt. a one off power incident, or bug or storage is having issues etc.
once that is clear,
a simple start would be to run a pg_dump of the database to /dev/null and check if the dump is clean without errors.

Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain

On Sat, Apr 26, 2025, 10:39 PM Sbob <sbob@quadratum-braccas.com> wrote:
All;

I am working with a client, they have an app that is running into error's like this:
SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896

I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well.

I assume that if this were a corrupt index issue then the query would continue to fail every time.
Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this?

Thanks in advance

Re: Intermittent missing chunk ERRORS

From

Laurenz Albe

Date:

26 April, 23:24:58

On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote:
> I am working with a client, they have an app that is running into error's like this:
>
> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
>
> I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index,
> REINDEX on the table and even a reindexdb on the database. Today they passed me another
> error (the one above) and once I was able to login I ran the same query the app team
> referenced and it ran without any errors. I asked them to re-run the query and they
> informed me that it now works for them as well.
>
> I assume that if this were a corrupt index issue then the query would continue to fail
> every time.
>
> Has anyone seen this type of intermittent index error before? and thoughts on where to
> look /how to debug this?

The error is data corruption.

If it is transient, that could mean a couple of things:

- a TOAST index was corrupted, and REINDEX fixed it

- faulty RAM corrupted data, but as soon as the corrupted buffer was evicted,
  the problem was gone

- somebody updated or deleted the row, and the corruption is hidden

Yours,
Laurenz Albe

Re: Intermittent missing chunk ERRORS

From

Tom Lane

Date:

26 April, 23:28:42

Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote:
>> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
>> Has anyone seen this type of intermittent index error before? and thoughts on where to
>> look /how to debug this?

> The error is data corruption.

We've seen transient errors of this sort arise from what are basically
timing problems, and fixed at least a few cases.  I wonder exactly
which PG version the problematic installation is running.

            regards, tom lane

Re: Intermittent missing chunk ERRORS

From

Sbob

Date:

27 April, 01:49:50

On 4/26/25 2:28 PM, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
>> On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote:
>>> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
>>> Has anyone seen this type of intermittent index error before? and thoughts on where to
>>> look /how to debug this?
>> The error is data corruption.
> We've seen transient errors of this sort arise from what are basically
> timing problems, and fixed at least a few cases.  I wonder exactly
> which PG version the problematic installation is running.
>
>             regards, tom lane


We are on V14

Re: Intermittent missing chunk ERRORS

From

Sbob

Date:

27 April, 01:54:18

On 4/26/25 2:24 PM, Laurenz Albe wrote:
> On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote:
>> I am working with a client, they have an app that is running into error's like this:
>>
>> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896
>>
>> I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index,
>> REINDEX on the table and even a reindexdb on the database. Today they passed me another
>> error (the one above) and once I was able to login I ran the same query the app team
>> referenced and it ran without any errors. I asked them to re-run the query and they
>> informed me that it now works for them as well.
>>
>> I assume that if this were a corrupt index issue then the query would continue to fail
>> every time.
>>
>> Has anyone seen this type of intermittent index error before? and thoughts on where to
>> look /how to debug this?
> The error is data corruption.
>
> If it is transient, that could mean a couple of things:
>
> - a TOAST index was corrupted, and REINDEX fixed it
>
> - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted,
>    the problem was gone
>
> - somebody updated or deleted the row, and the corruption is hidden
>
> Yours,
> Laurenz Albe


I did REINDEX and the error came back, however this last time when they 
informed me about the error they sent the select statement throwing the 
error and once I logged in I ran their select statement and it worked, 
no REINDEX had been run, and then I asked them to run it again and it 
worked for them as well

So, the first option above is not the answer and I dont think anyone 
updated the row, their process is mostly reads but I will verify, so 
maybe the RAM issue? the servers are cloud VM's in a private cloud

Re: Intermittent missing chunk ERRORS

From

Tom Lane

Date:

27 April, 03:03:32

Sbob <sbob@quadratum-braccas.com> writes:
> On 4/26/25 2:28 PM, Tom Lane wrote:
>> We've seen transient errors of this sort arise from what are basically
>> timing problems, and fixed at least a few cases.  I wonder exactly
>> which PG version the problematic installation is running.

> We are on V14

14-what?  We are talking about bug fixes, so it matters.

            regards, tom lane

Re: Intermittent missing chunk ERRORS

From

Sbob

Date:

27 April, 19:45:12

On 4/26/25 6:03 PM, Tom Lane wrote:
> Sbob <sbob@quadratum-braccas.com> writes:
>> On 4/26/25 2:28 PM, Tom Lane wrote:
>>> We've seen transient errors of this sort arise from what are basically
>>> timing problems, and fixed at least a few cases.  I wonder exactly
>>> which PG version the problematic installation is running.
>> We are on V14
> 14-what?  We are talking about bug fixes, so it matters.
>
>             regards, tom lane


postgres=# select version();
version
----------------------------------------------------------------------------------------------------------
  PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 
20210514 (Red Hat 8.5.0-23), 64-bit
(1 row)

Re: Intermittent missing chunk ERRORS

From

Tom Lane

Date:

27 April, 21:25:55

Sbob <sbob@quadratum-braccas.com> writes:
> On 4/26/25 6:03 PM, Tom Lane wrote:
>> 14-what?  We are talking about bug fixes, so it matters.

>   PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 
> 20210514 (Red Hat 8.5.0-23), 64-bit

OK, well that's current at least...

I don't suppose you can build a test case that reproduces this
failure?  If it's timing-dependent as I suspect, it might only fail
once in awhile, but that would be good enough for investigation.

            regards, tom lane

Re: Intermittent missing chunk ERRORS

From

Laurenz Albe

Date:

27 April, 22:52:52

On Sat, 2025-04-26 at 16:54 -0600, Sbob wrote:
> > If it is transient, that could mean a couple of things:
> >
> > - a TOAST index was corrupted, and REINDEX fixed it
> >
> > - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted,
> >     the problem was gone
> >
> > - somebody updated or deleted the row, and the corruption is hidden
>
> I did REINDEX and the error came back, however this last time when they
> informed me about the error they sent the select statement throwing the
> error and once I logged in I ran their select statement and it worked,
> no REINDEX had been run, and then I asked them to run it again and it
> worked for them as well
>
> So, the first option above is not the answer and I dont think anyone
> updated the row, their process is mostly reads but I will verify, so
> maybe the RAM issue? the servers are cloud VM's in a private cloud

Maybe.  Check the hardware.
Or maybe it is as Tom says.

Yours,
Laurenz Albe

Re: Intermittent missing chunk ERRORS

From

Sbob

Date:

28 April, 17:10:57

On 4/27/25 12:25 PM, Tom Lane wrote:
> Sbob <sbob@quadratum-braccas.com> writes:
>> On 4/26/25 6:03 PM, Tom Lane wrote:
>>> 14-what?  We are talking about bug fixes, so it matters.
>>    PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
>> 20210514 (Red Hat 8.5.0-23), 64-bit
> OK, well that's current at least...
>
> I don't suppose you can build a test case that reproduces this
> failure?  If it's timing-dependent as I suspect, it might only fail
> once in awhile, but that would be good enough for investigation.
>
>             regards, tom lane


Yeah, it only fails once in awhile, it seems like it fairly often but 
building a case to recreate it will likely be difficult. I will continue 
to work with the app team and see if I can find a pattern / process that 
recreates it.