Thread: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Craig Ringer
Date:
Hi all Andres and I were going over a patch yesterday and found an unexpected bug in tqual.c while attempting to trigger a hypothesized bug in that patch. A SELECT ... FOR SHARE will incorrectly block on another open transaction that ran SELECT ... FOR SHARE and still holds the locks if it has to follow a ctid chain from the current snapshot through a committed update to a share-locked tuple. This also affects uniqueness checks in btrees, where it can cause unnecessary waiting. It's also an issue with FOR KEY UPDATE, in that it can cause an update to block when it doesn't have to. The attached test case runs under isolationtester to exersise the problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked over the code and says the underlying bug goes back to the commit of MVCC, it's just become easier to trigger. To reliably test this with isolationtester I had to horribly abuse pg_advisory_lock(...) so that I could force the first SELECT ... FOR UPDATE to acquire its snapshot before the UPDATE runs. A backtrace of the point where it's incorrectly blocked is attached. What's happening is that the test for TransactionIdIsInProgress unconditionally sets snapshot->xmax, even if xmax was only set for locking purposes. This will cause the caller to wait for the xid in xmax when it doesn't need to. It should be testing HEAP_XMAX_IS_LOCKED_ONLY and only setting snapshot->xmax if the tuple is really being deleted by an open tx. Note that HeapTupleSatisfiesDirty tests the infomask for HEAP_XMAX_IS_MULTI and handles multixacts separately, and in that case it _does_ already test HEAP_XMAX_IS_LOCKED_ONLY. When you run the attached test case it should block indefinitely. After applying the attached patch it'll return as expected. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Alvaro Herrera
Date:
Craig Ringer wrote: > A SELECT ... FOR SHARE will incorrectly block on another open > transaction that ran SELECT ... FOR SHARE and still holds the locks if > it has to follow a ctid chain from the current snapshot through a > committed update to a share-locked tuple. > > This also affects uniqueness checks in btrees, where it can cause > unnecessary waiting. It's also an issue with FOR KEY UPDATE, in that it > can cause an update to block when it doesn't have to. Interesting bug. Thanks for the patch. I have applied it all the way back to 8.4 (with adjustments for 9.2 and beyond). > The attached test case runs under isolationtester to exersise the > problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked > over the code and says the underlying bug goes back to the commit of > MVCC, it's just become easier to trigger. To reliably test this with > isolationtester I had to horribly abuse pg_advisory_lock(...) so that I > could force the first SELECT ... FOR UPDATE to acquire its snapshot > before the UPDATE runs. I didn't apply the test case. I think if we want to test tqual.c behavior we will need to introduce a large battery of tests. I would like to see more opinions on whether that's something we want. > A backtrace of the point where it's incorrectly blocked is attached. > What's happening is that the test for TransactionIdIsInProgress > unconditionally sets snapshot->xmax, even if xmax was only set for > locking purposes. This will cause the caller to wait for the xid in xmax > when it doesn't need to. Yeah, after actually going over the code I think the bug is clear. (I was initially unsure about SatisfiesDirty returning true not false for this case; but the return value was correct, only snapshot->xmax was being set incorrectly. If you examine the callers they would misbehave if the return value were changed; for example EvalPlanQualFetch would completely skip the tuple if SatisfiesDirty returned false, which is not what we want; we want the tuple to be processed.) I think the comments on what exactly SatisfiesDirty does about in-progress transactions ought to be more specific. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Robert Haas
Date:
On Fri, Aug 2, 2013 at 5:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> The attached test case runs under isolationtester to exersise the >> problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked >> over the code and says the underlying bug goes back to the commit of >> MVCC, it's just become easier to trigger. To reliably test this with >> isolationtester I had to horribly abuse pg_advisory_lock(...) so that I >> could force the first SELECT ... FOR UPDATE to acquire its snapshot >> before the UPDATE runs. > > I didn't apply the test case. I think if we want to test tqual.c > behavior we will need to introduce a large battery of tests. I would > like to see more opinions on whether that's something we want. I haven't read this particular test, but I do think we could get a lot of mileage out of applying the isolation tester stuff to more things, and am generally in favor of that. It has the advantages of (1) allowing tests that require more than one session and (2) being run regularly the buildfarm; but it's not something developers typically run before every commit, so the run time of the test suite shouldn't be a big issue for anyone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Andres Freund
Date:
On 2013-08-05 14:37:34 -0400, Robert Haas wrote: > On Fri, Aug 2, 2013 at 5:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> The attached test case runs under isolationtester to exersise the > >> problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked > >> over the code and says the underlying bug goes back to the commit of > >> MVCC, it's just become easier to trigger. To reliably test this with > >> isolationtester I had to horribly abuse pg_advisory_lock(...) so that I > >> could force the first SELECT ... FOR UPDATE to acquire its snapshot > >> before the UPDATE runs. > > > > I didn't apply the test case. I think if we want to test tqual.c > > behavior we will need to introduce a large battery of tests. I would > > like to see more opinions on whether that's something we want. > > I haven't read this particular test, but I do think we could get a lot > of mileage out of applying the isolation tester stuff to more things, > and am generally in favor of that. The "problem" with the specific test is that it uses row level locks to get to the situation where EvalPlanQualFetch has to chase down a ctid chain by making a READ COMITTED transaction acquire a snapshot and only after that wait. Not sure if that's not too involved. > It has the advantages of (1) > allowing tests that require more than one session and (2) being run > regularly the buildfarm; but it's not something developers typically > run before every commit, so the run time of the test suite shouldn't > be a big issue for anyone. Agreed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Andres Freund
Date:
On 2013-08-06 04:48:09 +0200, Andres Freund wrote: > On 2013-08-05 14:37:34 -0400, Robert Haas wrote: > > On Fri, Aug 2, 2013 at 5:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > >> The attached test case runs under isolationtester to exersise the > > >> problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked > > >> over the code and says the underlying bug goes back to the commit of > > >> MVCC, it's just become easier to trigger. To reliably test this with > > >> isolationtester I had to horribly abuse pg_advisory_lock(...) so that I > > >> could force the first SELECT ... FOR UPDATE to acquire its snapshot > > >> before the UPDATE runs. > > > > > > I didn't apply the test case. I think if we want to test tqual.c > > > behavior we will need to introduce a large battery of tests. I would > > > like to see more opinions on whether that's something we want. > > > > I haven't read this particular test, but I do think we could get a lot > > of mileage out of applying the isolation tester stuff to more things, > > and am generally in favor of that. > > The "problem" with the specific test is that it uses row level locks to > get to the situation where EvalPlanQualFetch has to chase down a ctid > chain by making a READ COMITTED transaction acquire a snapshot and only > after that wait. > Not sure if that's not too involved. Errr, s/row level locks/advisory locks/ Thanks Craig, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Craig Ringer
Date:
On 08/06/2013 02:37 AM, Robert Haas wrote: > I haven't read this particular test, but I do think we could get a lot > of mileage out of applying the isolation tester stuff to more things, > and am generally in favor of that. It has the advantages of (1) > allowing tests that require more than one session and (2) being run > regularly the buildfarm; but it's not something developers typically > run before every commit, so the run time of the test suite shouldn't > be a big issue for anyone. The main issue with the test is that it's a dirty hack. What I really want is a way to block a statement at a certain point - to say "block after a snapshot is acquired" for example - and release that at a time of my choosing. That would require hooks (either callable from SQL in debug builds or only accessible via C extensions) that can block until released by action from another backend or after a timeout. The hook call sites would be macros that're defined away unless the hooks were enabled at compile time, in which case a shared_preload_library could register a callback on a hook to do whatever it needs to do. The library would probably expose some SQL functions as an extension to allow SQL-level manipulation of the blocking/timeout behaviour. Such hooks would be really useful for testing complex concurrency cases that can't be expressed at the statement level. I usually just use a "LOCK TABLE blah IN ACCESS EXCLUSIVE MODE" to set up a race, but that's only useful when your issue is at the statement level. It's also not possible to do this with isolationchecker, because it seems it can only cope with one blocked session at a time and you need (at least) three for this approach. Opinions? Is this something that's even worth thinking about further or a non-starter? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: HeapTupleSatisfiesDirty fails to test HEAP_XMAX_IS_LOCKED_ONLY for TransactionIdIsInProgress(...)
From
Jim Nasby
Date:
On 8/5/13 10:18 PM, Craig Ringer wrote: > The main issue with the test is that it's a dirty hack. What I really > want is a way to block a statement at a certain point - to say "block > after a snapshot is acquired" for example - and release that at a time > of my choosing. ... > Opinions? Is this something that's even worth thinking about further or > a non-starter? +1 Something that sets Enova apart from many other places is we have extensive database unit tests. One of the reasons for thatis you can do way more testing at the database level than you can trying to go through the application. This seems like just another case of that. We could either try to come up with some C level unit tests (which I suspect wouldn'twork in this case), or expose some hooks. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net