Thread: Shared buffer hash table corrupted

Shared buffer hash table corrupted

From: Mark Fletcher

Hi All,

Running 9.6.15, this morning we got a 'shared buffer hash table corrupted' error on a query. I reran the query a couple hours later, and it completed without error. This is running in production on a Linode instance which hasn't seen any config changes in months.

I didn't find much on-line about this. How concerned should I be? Would you move the instance to a different physical host?

Thanks,
Mark

Re: Shared buffer hash table corrupted

From: Tom Lane

Mark Fletcher <markf@corp.groups.io> writes:
> Running 9.6.15, this morning we got a 'shared buffer hash table corrupted'
> error on a query. I reran the query a couple hours later, and it completed
> without error. This is running in production on a Linode instance which
> hasn't seen any config changes in months.

> I didn't find much on-line about this. How concerned should I be? Would you
> move the instance to a different physical host?

Personally, I'd restart the postmaster, but not do more than that unless
the error recurs.

            regards, tom lane



Re: Shared buffer hash table corrupted

From: Mark Fletcher

On Fri, Feb 21, 2020 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Personally, I'd restart the postmaster, but not do more than that unless
> the error recurs.

Thanks for the response. I did restart the postmaster yesterday. Earlier this morning, a query that normally completes fine started to error out with 'invalid memory alloc request size 18446744073709551613'. Needless to say our database isn't quite that size. This query was against a table in a different database than the one that had the corruption warning yesterday. Restarting the postmaster again fixed the problem. For good measure I restarted the machine as well.

I need to decide what to do next, if anything. We have a hot standby that we also run queries against, and it hasn't shown any errors. I can switch over to that as the primary. Or I can move the main database to a different physical host.

Thoughts appreciated.

Thanks,
Mark
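
A note on the number in that error message: 18446744073709551613 is 2^64 - 3, which is what -3 looks like when a signed 64-bit value is read as an unsigned size. PostgreSQL rejects ordinary allocation requests above MaxAllocSize (0x3fffffff, one byte short of 1 GB), so a negative or garbage length field would typically surface as this kind of error. The Python sketch below only checks that arithmetic; the helper name as_signed_64 is made up for illustration, and nothing here shows where the bad length actually came from.

```python
import struct

# Size reported in the error message from the thread.
REPORTED = 18446744073709551613

# PostgreSQL's MaxAllocSize: 1 GB - 1.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one (hypothetical helper)."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


if __name__ == "__main__":
    print(hex(REPORTED))              # bit pattern of the reported size
    print(as_signed_64(REPORTED))     # the same bits read as a signed value
    print(REPORTED > MAX_ALLOC_SIZE)  # whether it exceeds the allocation limit
```

A value like this is consistent with a corrupted or misread length, but by itself it does not distinguish a hardware fault from on-disk corruption or a software bug.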

Re: Shared buffer hash table corrupted

From: Tom Lane

Mark Fletcher <markf@corp.groups.io> writes:
> Thanks for the response. I did restart the postmaster yesterday. Earlier
> this morning, a query that normally completes fine started to error out
> with 'invalid memory alloc request size 18446744073709551613'. Needless to
> say our database isn't quite that size. This query was against a table in a
> different database than the one that had the corruption warning yesterday.
> Restarting the postmaster again fixed the problem. For good measure I
> restarted the machine as well.

Um.  At that point I'd agree with your concern about developing hardware
problems.  Both of these symptoms could be easily explained by dropped
bits in PG's shared memory area.  Do you happen to know if the server
has ECC RAM?

            regards, tom lane
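
For anyone wanting to check for memory errors on a Linux host: where the kernel's EDAC drivers are loaded, corrected and uncorrected ECC error counts are exposed under /sys/devices/system/edac/mc/. A rough Python sketch for reading them follows. The function name read_edac_counts is hypothetical, and on a virtualized guest (which a Linode instance is assumed to be here) the directory is often absent, in which case the answer has to come from the hosting provider.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    Returns an empty dict when the EDAC directory does not exist, which is
    common inside virtual machines. A counter that cannot be read or parsed
    is reported as None.
    """
    counts = {}
    root = Path(base)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):  # corrected / uncorrected errors
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers visible on this system.")
    for controller, entry in result.items():
        print(controller, entry)
```

A nonzero ce_count means ECC has been correcting errors, and a nonzero ue_count means errors got through uncorrected. An empty result says nothing either way about the underlying hardware.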



Re: Shared buffer hash table corrupted

From: Mark Fletcher

On Sat, Feb 22, 2020 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Um.  At that point I'd agree with your concern about developing hardware
> problems.  Both of these symptoms could be easily explained by dropped
> bits in PG's shared memory area.  Do you happen to know if the server
> has ECC RAM?

Yes, it appears that Linode uses ECC and other server grade hardware for their machines.

Thanks,
Mark