Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple - Mailing list pgsql-hackers
From | Wood, Dan |
---|---|
Subject | Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple |
Date | |
Msg-id | 8ABEB00F-E19E-4178-A00A-DDA99EA73D94@amazon.com |
In response to | Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple |
List | pgsql-hackers |
Whatever you do, make sure to also test 250 clients running lock.sql. Even with the community's fix plus Yi Wen’s fix I can still get duplicate rows. What works for “in-block” HOT chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to vacuum, which one by one will take a while, I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows.

On top of that, I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

On 10/4/17, 6:31 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > Wong, Yi Wen wrote:
    >> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
    >>
    >> >         if (TransactionIdIsValid(priorXmax) &&
    >> >             !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
    >> >             break;
    >>
    >> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
    >
    > I independently arrived at the same conclusion.  Since I was trying with
    > 9.3, the patch differs -- in the old version we must explicitly test
    > for the FrozenTransactionId value, instead of using GetRawXmin.
    > Attached is the patch I'm using, and my own oneliner test (pretty much
    > the same I posted earlier) seems to survive dozens of iterations without
    > showing any problem in REINDEX.

    Confirmed, the problem goes away with this patch on 9.3.

    > This patch is incomplete, since I think there are other places that need
    > to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
    > Of course, for 9.4 and onwards we need to patch like you described.

    I have just done a lookup of the source code, and here is an
    exhaustive list of things in need of surgery:
    - heap_hot_search_buffer
    - heap_get_latest_tid
    - heap_lock_updated_tuple_rec
    - heap_prune_chain
    - heap_get_root_tuples
    - rewrite_heap_tuple
    - EvalPlanQualFetch (twice)

    > This bit in EvalPlanQualFetch caught my attention... why is it saying
    > xmin never changes?  It does change with freezing.
    >
    >         /*
    >          * If xmin isn't what we're expecting, the slot must have been
    >          * recycled and reused for an unrelated tuple.  This implies that
    >          * the latest version of the row was deleted, so we need do
    >          * nothing.  (Should be safe to examine xmin without getting
    >          * buffer's content lock, since xmin never changes in an existing
    >          * tuple.)
    >          */
    >         if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
    >                                  priorXmax))

    Agreed. That's not good.
    --
    Michael

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
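For readers following the GetXmin versus GetRawXmin point in the quoted thread, here is a minimal toy model of the chain-continuity check. It is only a sketch under stated assumptions: `ToyTupleHeader`, the `toy_*` accessors, and the `chain_breaks_*` helpers are hypothetical names invented for illustration and are not PostgreSQL's real structures or macros. The model assumes the 9.4-and-later behaviour where freezing sets a flag and leaves the stored xmin in place, so the "cooked" accessor reports FrozenTransactionId while the raw accessor still reports the original value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these are NOT PostgreSQL's real definitions. */
typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId  ((TransactionId) 2)

/* Hypothetical, simplified stand-in for a heap tuple header. */
typedef struct ToyTupleHeader
{
    TransactionId raw_xmin;    /* xmin as stored in the tuple */
    bool          xmin_frozen; /* set once the tuple has been frozen */
} ToyTupleHeader;

/* Stand-in for the raw accessor: always the stored value. */
static TransactionId
toy_get_raw_xmin(const ToyTupleHeader *tup)
{
    return tup->raw_xmin;
}

/* Stand-in for the cooked accessor: frozen tuples report FrozenTransactionId. */
static TransactionId
toy_get_xmin(const ToyTupleHeader *tup)
{
    return tup->xmin_frozen ? FrozenTransactionId : tup->raw_xmin;
}

/* Check using the cooked xmin; true means "stop following the chain". */
bool
chain_breaks_cooked(TransactionId prior_xmax, const ToyTupleHeader *next)
{
    return prior_xmax != InvalidTransactionId &&
           prior_xmax != toy_get_xmin(next);
}

/* Check using the raw xmin, as proposed in the thread. */
bool
chain_breaks_raw(TransactionId prior_xmax, const ToyTupleHeader *next)
{
    return prior_xmax != InvalidTransactionId &&
           prior_xmax != toy_get_raw_xmin(next);
}

/* Small usage example: the next chain member was created by xid 100. */
void
toy_chain_demo(void)
{
    ToyTupleHeader frozen_member = { 100, true };
    ToyTupleHeader unrelated     = { 200, false };

    /* Cooked check sees 2 instead of 100 and wrongly ends the chain. */
    assert(chain_breaks_cooked(100, &frozen_member));
    /* Raw check still recognises the frozen tuple as the successor. */
    assert(!chain_breaks_raw(100, &frozen_member));
    /* Both checks reject a tuple created by some other transaction. */
    assert(chain_breaks_cooked(100, &unrelated));
    assert(chain_breaks_raw(100, &unrelated));
}
```

In this model the two checks disagree only when the successor has been frozen, which is the case the thread is concerned with. It says nothing about the cross-block duplicates or the possible lock leak reported above.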