Re: Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers
From | Mark Kirkwood |
---|---|
Subject | Re: Write Ahead Logging for Hash Indexes |
Date | |
Msg-id | 844a643b-fe3d-7d9a-5040-570674fe9321@catalyst.net.nz Whole thread Raw |
In response to | Re: Write Ahead Logging for Hash Indexes (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Write Ahead Logging for Hash Indexes
(Amit Kapila <amit.kapila16@gmail.com>)
|
List | pgsql-hackers |
On 07/09/16 21:58, Amit Kapila wrote: > On Wed, Aug 24, 2016 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit.kapila16@gmail.com> >> wrote: >>> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> >>>> After an intentionally created crash, I get an Assert triggering: >>>> >>>> TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & >>>> (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553) >>>> >>>> freep[0] is zero and bitmapbit is 16. >>>> >>> Here what is happening is that when it tries to clear the bitmapbit, >>> it expects it to be set. Now, I think the reason for why it didn't >>> find the bit as set could be that after the new overflow page is added >>> and the bit corresponding to it is set, you might have crashed the >>> system and the replay would not have set the bit. Then while freeing >>> the overflow page it can hit the Assert as mentioned by you. I think >>> the problem here could be that I am using REGBUF_STANDARD to log the >>> bitmap page updates which seems to be causing the issue. As bitmap >>> page doesn't follow the standard page layout, it would have omitted >>> the actual contents while taking full page image and then during >>> replay, it would not have set the bit, because page doesn't need REDO. >>> I think here the fix is to use REGBUF_NO_IMAGE as we use for vm >>> buffers. >>> >>> If you can send me the detailed steps for how you have produced the >>> problem, then I can verify after fixing whether you are seeing the >>> same problem or something else. >> >> >> The test is rather awkward, it might be easier to just have me test it. >> > Okay, I have fixed this issue as explained above. Apart from that, I > have fixed another issue reported by Mark Kirkwood upthread and few > other issues found during internal testing by Ashutosh Sharma. > > The locking issue reported by Mark and Ashutosh is that the patch > didn't maintain the locking order while adding overflow page as it > maintains in other write operations (lock the bucket pages first and > then metapage to perform the write operation). I have added the > comments in _hash_addovflpage() to explain the locking order used in > modified patch. > > During stress testing with pgbench using master-standby setup, we > found an issue which indicates that WAL replay machinery doesn't > expect completely zeroed pages (See explanation of RBM_NORMAL mode > atop XLogReadBufferExtended). Previously before freeing the overflow > page we were zeroing it, now I have changed it to just initialize the > page such that the page will be empty. > > Apart from above, I have added support for old snapshot threshold in > the hash index code. > > Thanks to Ashutosh Sharma for doing the testing of the patch and > helping me in analyzing some of the above issues. > > I forgot to mention in my initial mail that Robert and I had some > off-list discussions about the design of this patch, many thanks to > him for providing inputs. > > Repeating my tests with these new patches applied points to the hang issue being solved. I tested several 10 minute runs (any of which was enough to elicit the hang previously). I'll do some longer ones, but looks good! regards Mark
pgsql-hackers by date: