Re: [HACKERS] segfault in hot standby for hash indexes - Mailing list pgsql-hackers
From | Ashutosh Sharma |
---|---|
Subject | Re: [HACKERS] segfault in hot standby for hash indexes |
Date | |
Msg-id | CAE9k0Pn9zNHgbbd7LCgGT=PzK0S0kiE4aTowv5=1xfyqyUMg4w@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] segfault in hot standby for hash indexes (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] segfault in hot standby for hash indexes
(Amit Kapila <amit.kapila16@gmail.com>)
|
List | pgsql-hackers |
Hi, On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Mar 21, 2017 at 11:49 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> >>> I can confirm that that fixes the seg faults for me. >> >> Thanks for confirmation. >> >>> >>> Did you mean you couldn't reproduce the problem in the first place, or that >>> you could reproduce it and now the patch fixes it? If the first of those, I >>> forget to say you do have to wait for hot standby to reach a consistency and >>> open for connections, and then connect to the standby ("psql -p 9874"), >>> before the seg fault will be triggered. >> >> I meant that I was not able to reproduce the issue on HEAD. >> >>> >>> But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges >>> from btree_xlog_delete_get_latestRemovedXid, which I don't understand the >>> reason for the divergence. Is there a reason we dropped the PANIC if we >>> have not reached consistency? >> >> Well, I'm not quite sure how would standby allow any backend to >> connect to it until it has reached to a consistent state. If you see >> the definition of btree_xlog_delete_get_latestRemovedXid(), just >> before consistency check there is a if-condition 'if >> (CountDBBackends(InvalidOid) == 0)' which means >> we are checking for consistent state only after knowing that there are >> some backends connected to the standby. So, Is there a possibility of >> having some backend connected to standby server without having it in >> consistent state. >> > > I don't think so, but I think we should have reachedConsistency check > and elog(PANIC,..) similar to btree. If you see other conditions > where we PANIC in btree or hash xlog code, you will notice that those > are also theoretically not possible cases. It seems this is to save > database from getting corrupt or behaving insanely if due to some > reason (like a coding error or others) the check fails. okay, agreed. I have included it in the attached patch. However, I am still able to see the crash reported by Jeff upthread - [1]. I think there are couple of things that needs to be corrected inside hash_xlog_vacuum_get_latestRemovedXid(). 1) As Amit mentioned in his earlier mail [2], the block id used for registering deleted items in XLOG_HASH_VACUUM_ONE_PAGE is different than the block id used for fetching those items. I had fixed this and shared the patch earlier. With this patch I still see the crash Jeff reported yesterday [1]. 2) When a full page image of registered block is taken, the modified data associated with that block is not included in the WAL record. In such a case, we won't be able to fetch the items array that we have tried to include during xlog record (XLOG_HASH_VACUUM_ONE_PAGE) creation using following function. XLogRegisterBufData(0, (char *) deletable, ndeletable * sizeof(OffsetNumber)); If above is not included in the WAL record, then fetching the same using 'XLogRecGetBlockData(record, 0, &len);' will return NULL pointer. ptr = XLogRecGetBlockData(record, 0, &len); unused = (OffsetNumber *) ptr; ............ iitemid = PageGetItemId(ipage, unused[i]); Here, if ptr is NULL, reference to unused will cause a crash. To fix this, I think we should pass 'REGBUF_KEEP_DATA' while registering the buffer. Something like this, - XLogRegisterBuffer(0, buf, REGBUF_STANDARD); + XLogRegisterBuffer(0, buf, REGBUF_STANDARD | REGBUF_KEEP_DATA); Attached is the patch that fixes this issue. Please have a look and let me know if you still have any concerns. Thank you. [1] - https://www.postgresql.org/message-id/CAMkU%3D1w-9Qe%3DFf1o6bSaXpNO9wqpo7_9GL8_CVhw4BoVVHasqg%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1%2BYUedok0%2Bmeynnf8K3fqFsfdMpEFz1JiKLyyNv46hjaA%40mail.gmail.com -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: