Home > mailing lists

Re: [PATCH] Microvacuum for gist. - Mailing list pgsql-hackers

From	Anastasia Lubennikova
Subject	Re: [PATCH] Microvacuum for gist.
Date	September 16, 2015 11:12:54
Msg-id	55F94EAB.9070007@postgrespro.ru Whole thread Raw
In response to	Re: [PATCH] Microvacuum for gist. (Jeff Janes <jeff.janes@gmail.com>)
Responses	Re: [PATCH] Microvacuum for gist.
List	pgsql-hackers

Tree view

16.09.2015 07:30, Jeff Janes:

The commit of this patch seems to have created a bug in which updated tuples can disappear from the index, while remaining in the table.

It looks like the bug depends on going through a crash-recovery cycle, but I am not sure of that yet.

I've looked through the commit diff and don't see anything obviously wrong. I notice index tuples are marked dead with only a buffer content share lock, and the page is defragmented with only a buffer exclusive lock (as opposed to a super-exclusive buffer clean up lock). But as far as I can tell, both of those should be safe on an index. Also, if that was the bug, it should happen without crash-recovery.

The test is pretty simple. I create a 10,000 row table with a unique-by-construction id column with a btree_gist index on it and a counter column, and fire single-row updates of the counter for random ids in high concurrency (8 processes running flat out). I force the server to crash frequently with simulated torn-page writes in which md.c writes a partial page and then PANICs. Eventually (1 to 3 hours) the updates start indicating they updated 0 rows. At that point, a forced table scan will find the row, but the index doesn't.

Any hints on how to proceed with debugging this? If I can't get it to reproduce the problem in the absence of crash-recovery cycles with an overnight run, then I think my next step will be to run it over hot-standby and see if WAL replay in the absence of crashes might be broken as well.

I've found the bug. It's because of mixed calls of
PageIndexMultiDelete() in gistvacuumpage() and
PageIndexTupleDelete() in gistRedoPageUpdateRecord().
These functions are conflicting.

I've fixed my patch by change MultiDelete to TupleDelete in gistvacuumpage(). Patch is attached.
But It seems to me that it would be better to rewrite all mentions of TupleDelete to MultiDelete in gist code.
I'm working on it.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company()

Attachment

micro_fix.patch

pgsql-hackers by date:

From: Thomas Munro
Date: 16 September 2015, 11:07:10
Subject: Re: synchronous_commit = apply

From: Thom Brown
Date: 16 September 2015, 11:22:09
Subject: Re: Sequence Access Method WIP

Re: [PATCH] Microvacuum for gist. - Mailing list pgsql-hackers

Attachment

Previous

Next