Thread: Shared buffer hash table corrupted
Hi All,
Running 9.6.15, this morning we got a 'shared buffer hash table corrupted' error on a query. I reran the query a couple hours later, and it completed without error. This is running in production on a Linode instance which hasn't seen any config changes in months.
I didn't find much on-line about this. How concerned should I be? Would you move the instance to a different physical host?
Thanks,
Mark
Mark Fletcher <markf@corp.groups.io> writes:
> Running 9.6.15, this morning we got a 'shared buffer hash table corrupted'
> error on a query. I reran the query a couple hours later, and it completed
> without error. This is running in production on a Linode instance which
> hasn't seen any config changes in months.
> I didn't find much on-line about this. How concerned should I be? Would you
> move the instance to a different physical host?

Personally, I'd restart the postmaster, but not do more than that unless
the error recurs.

regards, tom lane
On Fri, Feb 21, 2020 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Personally, I'd restart the postmaster, but not do more than that unless
> the error recurs.
Thanks for the response. I did restart the postmaster yesterday. Earlier this morning, a query that normally completes fine started to error out with 'invalid memory alloc request size 18446744073709551613'. Needless to say our database isn't quite that size. This query was against a table in a different database than the one that had the corruption warning yesterday. Restarting the postmaster again fixed the problem. For good measure I restarted the machine as well.
I need to decide what to do next, if anything. We have a hot standby that we also run queries against, and it hasn't shown any errors. I can switch over to that as the primary. Or I can move the main database to a different physical host.
Thoughts appreciated.
Thanks,
Mark
Mark Fletcher <markf@corp.groups.io> writes:
> Thanks for the response. I did restart the postmaster yesterday. Earlier
> this morning, a query that normally completes fine started to error out
> with 'invalid memory alloc request size 18446744073709551613'. Needless to
> say our database isn't quite that size. This query was against a table in a
> different database than the one that had the corruption warning yesterday.
> Restarting the postmaster again fixed the problem. For good measure I
> restarted the machine as well.

Um. At that point I'd agree with your concern about developing hardware
problems. Both of these symptoms could be easily explained by dropped
bits in PG's shared memory area. Do you happen to know if the server
has ECC RAM?

regards, tom lane
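A side note on the number in that error message: 18446744073709551613 is 2^64 - 3, which is what -3 looks like when read as an unsigned 64-bit integer. That fits a length value that was garbage when the allocation was requested, though it does not show which field held it or how it got that way. A minimal Python sketch of the arithmetic follows; the helper name as_signed_64 is made up for illustration, and the limit shown is PostgreSQL's MaxAllocSize (0x3fffffff, one byte under 1 GB), above which ordinary allocation requests are rejected.

```python
# Size reported in the error message quoted above.
REPORTED = 18446744073709551613

# PostgreSQL's MaxAllocSize: 1 gigabyte - 1. Ordinary allocation
# requests larger than this are rejected as invalid.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed.

    Hypothetical helper, for illustration only.
    """
    if value >= 1 << 63:
        return value - (1 << 64)
    return value


if __name__ == "__main__":
    print(REPORTED == 2**64 - 3)       # the value is 2^64 - 3
    print(as_signed_64(REPORTED))      # same bit pattern read as signed
    print(REPORTED > MAX_ALLOC_SIZE)   # far beyond the allocation limit
```

In other words, the server was not asked for 16 exabytes of data; something handed the allocator a small negative length.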
On Sat, Feb 22, 2020 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Um. At that point I'd agree with your concern about developing hardware
> problems. Both of these symptoms could be easily explained by dropped
> bits in PG's shared memory area. Do you happen to know if the server
> has ECC RAM?
Yes, it appears that Linode uses ECC and other server grade hardware for their machines.
Thanks,
Mark