Thread: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
Hi,

while looking at trigger.c from the POV of foreign key locks I noticed that GetTupleForTrigger commentlessly assumes it can just look at a page's content without a

LockBuffer(buffer, BUFFER_LOCK_SHARE);

That code path is only reached for AFTER ... FOR EACH ROW ... triggers, so it's fine that it's not locking the tuple itself. That already needs to have happened before.

The code in question:

if (newslot_which_is_passed_by_before_triggers)
    ...
else
{
    Page    page;
    ItemId  lp;

    buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));

    page = BufferGetPage(buffer);
    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));

    Assert(ItemIdIsNormal(lp));

    tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp);
    tuple.t_len = ItemIdGetLength(lp);
    tuple.t_self = *tid;
    tuple.t_tableOid = RelationGetRelid(relation);
}

result = heap_copytuple(&tuple);
ReleaseBuffer(buffer);

As can be seen in the excerpt above, this is basically a very stripped-down heap_fetch(), but without any locking on the buffer we just read.

I can't believe this is safe? Just about anything but eviction could happen to that buffer at that moment?

Am I missing something?

Andres

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Pavan Deolasee
Date:
On Fri, Nov 30, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> while looking at trigger.c from the POV of foreign key locks I noticed
> that GetTupleForTrigger commentlessly assumes it can just look at a
> page's content without a
>
> LockBuffer(buffer, BUFFER_LOCK_SHARE);
>
> That code path is only reached for AFTER ... FOR EACH ROW ... triggers,
> so it's fine that it's not locking the tuple itself. That already needs
> to have happened before.
>
> The code in question:
>
> if (newslot_which_is_passed_by_before_triggers)
>     ...
> else
> {
>     Page    page;
>     ItemId  lp;
>
>     buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
>
>     page = BufferGetPage(buffer);
>     lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
>
>     Assert(ItemIdIsNormal(lp));
>
>     tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp);
>     tuple.t_len = ItemIdGetLength(lp);
>     tuple.t_self = *tid;
>     tuple.t_tableOid = RelationGetRelid(relation);
> }
>
> result = heap_copytuple(&tuple);
> ReleaseBuffer(buffer);
>
> As can be seen in the excerpt above, this is basically a very
> stripped-down heap_fetch(), but without any locking on the buffer we
> just read.
>
> I can't believe this is safe? Just about anything but eviction could
> happen to that buffer at that moment?
>
> Am I missing something?
That seems to be safe to me. Anything that's been read above can't really change. The tuple is already locked, so a concurrent update/delete has to wait on us. We have a pin on the buffer, so VACUUM or HOT-prune can't happen either. I can't see any other operation that can really change those fields.

heap_fetch(), though it looks very similar, has an important difference: it might be reading the tuple without a lock on it, and we need the buffer lock to protect against a concurrent update/delete of the tuple. AFAIK, once the tuple has passed through qualification checks etc., even callers of heap_fetch() can read the tuple data without any lock on the buffer itself.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
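[To illustrate the rule Pavan invokes in the last paragraph: once a tuple has been fetched and qualified under a share lock, a caller that has held the buffer pin continuously may read the tuple body lock-free. A minimal sketch under that assumption follows; the buffer-manager and heap accessor names are real PostgreSQL APIs of that era, the surrounding variables are hypothetical.]

    /* Fetch under a share lock, then keep only the pin while using the tuple. */
    buffer = ReadBuffer(relation, blkno);          /* take pin */
    LockBuffer(buffer, BUFFER_LOCK_SHARE);
    /* ... locate the tuple and test it against our snapshot ... */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

    /*
     * The continuously-held pin prevents the tuple body from being moved or
     * reclaimed, because compacting the page requires a cleanup lock, which
     * cannot be granted while we hold the pin. So no buffer lock is needed
     * for this read.
     */
    value = heap_getattr(&tuple, attnum, RelationGetDescr(relation), &isnull);

    ReleaseBuffer(buffer);                         /* drop pin */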
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 08:52:18 +0530, Pavan Deolasee wrote:
> On Fri, Nov 30, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I can't believe this is safe? Just about anything but eviction could
> > happen to that buffer at that moment?
>
> That seems to be safe to me. Anything that's been read above can't really
> change. The tuple is already locked, so a concurrent update/delete has to
> wait on us. We have a pin on the buffer, so VACUUM or HOT-prune can't
> happen either. I can't see any other operation that can really change
> those fields.

We only get the pin right there, I don't see any preexisting pin. Which means we might have just opened a page that's in the process of being pruned/vacuumed by another backend.

I think a concurrent heap_(insert|update)/PageAddItem might actually be already dangerous because it might move line pointers around.

> heap_fetch(), though it looks very similar, has an important difference:
> it might be reading the tuple without a lock on it, and we need the
> buffer lock to protect against a concurrent update/delete of the tuple.
> AFAIK, once the tuple has passed through qualification checks etc., even
> callers of heap_fetch() can read the tuple data without any lock on the
> buffer itself.

Sure, but the page header isn't accessed there, just the tuple data itself, which always stays at the same place on the page. (A normal heap_fetch shouldn't be worried about updates/deletions to its tuple, MVCC to the rescue.)

Even if it turns out to be safe, I think this deserves at least a comment...

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2012-11-30 08:52:18 +0530, Pavan Deolasee wrote:
> > On Fri, Nov 30, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > > I can't believe this is safe? Just about anything but eviction could
> > > happen to that buffer at that moment?

Yeah, it does seem fishy to me as well.

> Even if it turns out to be safe, I think this deserves at least a
> comment...

No doubt that trigger.c is hugely underdocumented in some places.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Pavan Deolasee
Date:
On Fri, Nov 30, 2012 at 5:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > That seems to be safe to me. Anything that's been read above can't
> > really change. The tuple is already locked, so a concurrent
> > update/delete has to wait on us. We have a pin on the buffer, so VACUUM
> > or HOT-prune can't happen either. I can't see any other operation that
> > can really change those fields.
>
> We only get the pin right there, I don't see any preexisting pin. Which
> means we might have just opened a page that's in the process of being
> pruned/vacuumed by another backend.

Hmm. Yeah, you're right. That is a possible risky scenario. Even though the cleanup lock waits for all pins to go away, that works only if every reader takes at least a SHARE lock, unless it was continuously holding a pin on the buffer (in which case it's OK to drop the lock and read a tuple body without reacquiring it again). Otherwise, as you rightly pointed out, we could actually be reading a page which is being actively cleaned up while tuples are being moved around.

> I think a concurrent heap_(insert|update)/PageAddItem might actually be
> already dangerous because it might move line pointers around

I don't think we ever move the line pointers around, because the indexes will be pointing to them. But the vacuum/prune is dangerous enough to require a SHARE lock here in any case.

Thanks,
Pavan
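[To make the cleanup-lock contract being discussed concrete, here is a hedged sketch of both sides; the lock levels and functions are the real buffer-manager API, the scenario itself is illustrative.]

    /*
     * Cleaner side (VACUUM/HOT-prune): tuple bodies may be moved only under
     * a cleanup lock - an exclusive content lock acquired at a moment when
     * no other backend holds a pin on the page.
     */
    LockBufferForCleanup(buffer);
    /* ... safe to defragment the page contents here ... */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

    /*
     * Reader side: merely pinning the page does NOT conflict with a cleanup
     * lock that was already granted, since taking a pin acquires no content
     * lock. So a newly arriving reader must also take at least a share lock
     * before dereferencing line pointers; that blocks until the cleaner is
     * done.
     */
    buffer = ReadBuffer(relation, blkno);
    LockBuffer(buffer, BUFFER_LOCK_SHARE);
    /* ... page is now stable; read line pointers and tuple headers ... */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    ReleaseBuffer(buffer);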
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Simon Riggs
Date:
On 30 November 2012 11:58, Andres Freund <andres@2ndquadrant.com> wrote:
> We only get the pin right there, I don't see any preexisting pin.

Seems easy enough to test with an Assert patch.

If the Assert doesn't fail, we apply it as "documentation" of the requirement for a pin.

If it fails, we fix the bug.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 18:09:49 +0530, Pavan Deolasee wrote:
> Hmm. Yeah, you're right. That is a possible risky scenario. Even though
> the cleanup lock waits for all pins to go away, that works only if every
> reader takes at least a SHARE lock, unless it was continuously holding a
> pin on the buffer (in which case it's OK to drop the lock and read a
> tuple body without reacquiring it again). Otherwise, as you rightly
> pointed out, we could actually be reading a page which is being actively
> cleaned up while tuples are being moved around.

Well, live (or recently dead) tuples can't just be moved around, unless I'm missing something. That would cause problems with HeapTuples directly pointing into the page, as you've noticed.

> I don't think we ever move the line pointers around, because the indexes
> will be pointing to them.

The indexes point to a tid, not to a line pointer. So reshuffling the line pointer array - while keeping the tids valid - is entirely fine. Right?

Also, PageAddItem does that all the time, so I think we would have noticed if not ;)

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 12:50:06 +0000, Simon Riggs wrote:
> On 30 November 2012 11:58, Andres Freund <andres@2ndquadrant.com> wrote:
> > We only get the pin right there, I don't see any preexisting pin.
>
> Seems easy enough to test with an Assert patch.
>
> If the Assert doesn't fail, we apply it as "documentation" of the
> requirement for a pin.
>
> If it fails, we fix the bug.

I think it's wrong even if we were holding a pin all the time, due to the aforementioned PageAddItem reshuffling of line pointers. So that Assert wouldn't be proof enough.

I can try to prove corruption there, but I would rather see somebody come along and tell me why it's safe, and that I am dumb for not realizing it.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Pavan Deolasee
Date:
On Fri, Nov 30, 2012 at 6:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2012-11-30 18:09:49 +0530, Pavan Deolasee wrote:
> > Hmm. Yeah, you're right. That is a possible risky scenario. Even though
> > the cleanup lock waits for all pins to go away, that works only if
> > every reader takes at least a SHARE lock, unless it was continuously
> > holding a pin on the buffer (in which case it's OK to drop the lock and
> > read a tuple body without reacquiring it again).
>
> Well, live (or recently dead) tuples can't just be moved around, unless
> I'm missing something. That would cause problems with HeapTuples
> directly pointing into the page, as you've noticed.

I think we are confusing the terminology. The tuple headers and bodies (live or dead) can be moved around freely as long as you have a cleanup lock on the page. That's how HOT-prune frees up fragmented space.

> > I think a concurrent heap_(insert|update)/PageAddItem might actually be
> > already dangerous because it might move line pointers around
>
> The indexes point to a tid, not to a line pointer. So reshuffling the
> line pointer array - while keeping the tids valid - is entirely
> fine. Right?

I think it's again terminology confusion :-) I thought TID and Line Pointer are the same thing and have used them interchangeably, or at least I've done so for long. But I think you are reading "line pointer" as the thing stored in the ItemId structure, and not the ItemId itself.

Also, PageAddItem() doesn't move things around normally. It does that only during recovery, at least for heap pages. Elsewhere it will either reuse an UNUSED pointer or add at the end. Isn't that how it is?
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
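[A hedged sketch of the pruning mechanics Pavan describes, mostly as commentary; ConditionalLockBufferForCleanup and PageRepairFragmentation are real routines of that era, but the body below is schematic, not the actual pruning code.]

    /* Schematic of HOT-prune style compaction under a cleanup lock. */
    if (ConditionalLockBufferForCleanup(buffer))
    {
        /* 1. Mark line pointers of dead tuples unused/redirected. */
        /* 2. PageRepairFragmentation(page): slide surviving tuple bodies
         *    together. Each ItemId's lp_off changes, but its position in
         *    the line-pointer array - and hence every index-visible tid -
         *    is preserved. */
        /* 3. Mark the buffer dirty and emit WAL before releasing. */
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    }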
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Pavan Deolasee
Date:
On Fri, Nov 30, 2012 at 5:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > heap_fetch(), though it looks very similar, has an important
> > difference: it might be reading the tuple without a lock on it, and we
> > need the buffer lock to protect against a concurrent update/delete of
> > the tuple. AFAIK, once the tuple has passed through qualification
> > checks etc., even callers of heap_fetch() can read the tuple data
> > without any lock on the buffer itself.
>
> Sure, but the page header isn't accessed there, just the tuple data
> itself, which always stays at the same place on the page.
> (A normal heap_fetch shouldn't be worried about updates/deletions to its
> tuple, MVCC to the rescue.)

heap_fetch() reads the tuple header, though, to apply the snapshot visibility rules. So a SHARE lock is a must for that purpose, because a concurrent update/delete may set XMAX or other visibility-related fields. On the other hand, you can read the tuple body without a page lock if you have been holding a pin on the buffer continuously since you last dropped the lock.

In any case, a lock is needed in GetTupleForTrigger unless someone can argue that a pin has been held continuously on the buffer since the lock was last dropped, something I don't see happening in this case.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
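[A minimal sketch of the sequence Pavan is describing inside heap_fetch(), assumed and simplified; HeapTupleSatisfiesVisibility and the lock calls are the real APIs of that era.]

    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    /*
     * The visibility test reads xmin/xmax/infomask in the tuple header;
     * a concurrent update/delete can rewrite those fields, so the header
     * must only be examined while the share lock is held.
     */
    valid = HeapTupleSatisfiesVisibility(&tuple, snapshot, buffer);

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

    /* With the pin still held, tuple.t_data stays valid for reading. */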
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 18:35:05 +0530, Pavan Deolasee wrote:
> I think we are confusing the terminology. The tuple headers and bodies
> (live or dead) can be moved around freely as long as you have a cleanup
> lock on the page. That's how HOT-prune frees up fragmented space.

Need to read more code here; this is nothing I have really looked into before. You're probably right. Up to now I thought HOT pruning only removed intermediate & surely dead tuples but didn't move stuff around, and that page fragmentation was repaired separately. But it looks like I am wrong.

> I think it's again terminology confusion :-) I thought TID and Line
> Pointer are the same thing and have used them interchangeably, or at
> least I've done so for long. But I think you are reading "line pointer"
> as the thing stored in the ItemId structure, and not the ItemId itself.

I always understood ItemIds to be line pointers and ItemPointers to be tids. Line pointers are only meaningful within a single page, store the actual offset within the page, and so on. ItemPointers (tids) are longer-lasting and store (page, offset_number)…

> Also, PageAddItem() doesn't move things around normally. It does that
> only during recovery, at least for heap pages. Elsewhere it will either
> reuse an UNUSED pointer or add at the end. Isn't that how it is?

Hm? It doesn't move the page contents around, but it moves the ItemId array during completely normal operation (c.f. needshuffle in PageAddItem). Or am I missing something?

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
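[For readers untangling the same terminology, the two structures under discussion look roughly like this, paraphrased from PostgreSQL's include/storage/itemid.h and itemptr.h of that era.]

    /* Line pointer (ItemId): lives in the page's item array; page-local. */
    typedef struct ItemIdData
    {
        unsigned    lp_off:15,    /* byte offset of the tuple within the page */
                    lp_flags:2,   /* UNUSED / NORMAL / REDIRECT / DEAD */
                    lp_len:15;    /* byte length of the tuple */
    } ItemIdData;

    /* Tid (ItemPointer): a global tuple address; this is what indexes store. */
    typedef struct ItemPointerData
    {
        BlockIdData     ip_blkid;  /* block number within the relation */
        OffsetNumber    ip_posid;  /* 1-based index into the page's line-pointer array */
    } ItemPointerData;

Compaction may change lp_off freely under a cleanup lock; what must stay stable for index lookups is ip_posid, i.e. the ItemId's position in the array.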
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Pavan Deolasee
Date:
On Fri, Nov 30, 2012 at 6:55 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hm? It doesn't move the page contents around, but it moves the ItemId
> array during completely normal operation (c.f. needshuffle in
> PageAddItem). Or am I missing something?

I think that's probably only used for non-heap pages. For heap pages, it just doesn't make sense to shuffle the ItemId array; that would defeat the entire purpose of having them in the first place.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 13:57:46 +0100, Andres Freund wrote:
> On 2012-11-30 12:50:06 +0000, Simon Riggs wrote:
> > Seems easy enough to test with an Assert patch.
> >
> > If the Assert doesn't fail, we apply it as "documentation" of the
> > requirement for a pin.
> >
> > If it fails, we fix the bug.
>
> I think it's wrong even if we were holding a pin all the time, due to
> the aforementioned PageAddItem reshuffling of line pointers. So that
> Assert wouldn't be proof enough.

But a failing Assert obviously proves something. Stupid me. So here we go:

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98b8207..3b61d06 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2662,6 +2662,16 @@ ltrmark:;

     buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));

+#ifdef USE_ASSERT_CHECKING
+    if (!BufferIsLocal(buffer))
+    {
+        /* Only pinned by the above ReadBuffer */
+        if (PrivateRefCount[buffer - 1] <= 1)
+            elog(ERROR, "too low local pin count: %d",
+                 PrivateRefCount[buffer - 1]);
+    }
+#endif
+
     page = BufferGetPage(buffer);
     lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));

CREATE OR REPLACE FUNCTION raise_notice_id() RETURNS trigger
    LANGUAGE plpgsql AS
$body$
BEGIN
    RAISE NOTICE 'id: %', OLD.id;
    RETURN NULL;
END
$body$;

postgres=# CREATE TABLE crashdummy(id serial primary key, data int);
postgres=# CREATE TRIGGER crashdummy_after_delete AFTER DELETE ON crashdummy
               FOR EACH ROW EXECUTE PROCEDURE raise_notice_id();
postgres=# INSERT INTO crashdummy(data) SELECT * FROM generate_series(1, 1000);
postgres=# DELETE FROM crashdummy WHERE ctid IN (SELECT ctid FROM crashdummy WHERE data < 1000);
ERROR:  too low local pin count: 1
Time: 4.515 ms

A plain DELETE without the subselect doesn't trigger the problem though; that's probably the reason we haven't seen any problems so far.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> But a failing Assert obviously proves something.

It doesn't prove that the code is unsafe though. I am suspicious that it is safe, because it's pretty darn hard to believe we'd not have seen field reports of problems with triggers otherwise. It's not like this is a seldom-executed code path.

But I concede that I don't see *why* it's safe --- it looks like it ought to be possible for a VACUUM to move the tuple while the trigger code is fetching it. You'd need the timing to be just so...

			regards, tom lane
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 10:42:21 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > But a failing Assert obviously proves something.
>
> It doesn't prove that the code is unsafe though.

Hehe.

> I am suspicious that it is safe, because it's pretty darn hard to
> believe we'd not have seen field reports of problems with triggers
> otherwise. It's not like this is a seldom-executed code path.

That's true, but I think due to the fact that plain DELETEs do *not* trigger the Assert, we might just not have noticed it. A make check with the error checking in place doesn't error out, in fact. Also, I think the wrong data caused by it isn't that likely to be noticed: it's just the OLD in triggers, so it's not going to be looked at all too closely all the time, and we might return data that looks somewhat valid.

I think most executor paths actually hold a separate pin "by accident" while this is executing. I think that my example hits the problem due to the ReleaseBuffer() after ExecStoreTuple() in TidNext(). Seqscans have another pin via scan->rs_cbuf, indexscans one via ->xs_cbuf. Many of the other nodes that are likely below ExecUpdate/Delete probably work similarly.

I think most cases with high throughput (which you probably need to actually hit the relatively small window) won't use plans that are complex enough to trigger the bug.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-11-30 10:42:21 -0500, Tom Lane wrote:
> > I am suspicious that it is safe, because it's pretty darn hard to
> > believe we'd not have seen field reports of problems with triggers
> > otherwise. It's not like this is a seldom-executed code path.
>
> That's true, but I think due to the fact that plain DELETEs do *not*
> trigger the Assert, we might just not have noticed it.

OK, I've managed to reproduce an actual failure, i.e. VACUUM moving the tuple underneath the fetch. It's not easy to do though, and the timing has to be *really* tight.

The reason that simple update/delete queries don't show the issue is that the tuple being fetched is the "old" one that was just updated/deleted, and in a simple query that one was just returned by a relation-scan plan node that's underneath the ModifyTable node, and *that scan node still has pin* on the table page; or more accurately the TupleTableSlot it's returned the scan tuple in has the pin. This pin's been held since we did a LockBuffer to examine the tuple's liveness, so it's sufficient to prevent anyone from getting a cleanup lock. However, in a more complex plan such as a join, the ModifyTable node could be receiving tuples that aren't the current tuple of a scan anymore. (Your example case passes the tuples through a Sort, so none of them are pinned by the time the ModifyTable node gets them.)

Another thing that reduces the odds of seeing this, in recent versions, is that when we first scanned the page containing the "old" tuple, we'll have (tried to) do a page prune. So even if a VACUUM comes along now, the odds are that it will not find any newly-reclaimable space. To reproduce the problem, I had to arrange for another tuple on the same page to become reclaimable between the relation scan and the GetTupleForTrigger fetch (which I did by having another, serializable transaction scan the table, then deleting the other tuple, then allowing the serializable transaction to commit after the page prune step).

Lastly, the only way you see a problem is if VACUUM acquires the cleanup lock before GetTupleForTrigger pins the page, finds some reclaimable space, and moves the target tuple on the page in order to compact the free space, and that data movement happens in the narrow time window between where GetTupleForTrigger does PageGetItem() and where it finishes the heap_copytuple() call just below that. Even then, you might escape seeing a problem, unless the data movement also shifts some *other* tuple into the space formerly occupied by the target.

So that's why nobody's reported the problem --- it's not happening often enough for anyone to see it as a repeatable issue.

I'll go insert a LockBuffer call. Good catch!

			regards, tom lane
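[For reference, the fix Tom describes would presumably look like the following, sketched against the excerpt quoted upthread; this is the obvious change, not the literal committed patch.]

    buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));

    /*
     * New: hold a share lock across the fetch, so that a cleanup lock
     * cannot be granted - and the tuple cannot be moved - between
     * PageGetItem() and the completion of heap_copytuple().
     */
    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    page = BufferGetPage(buffer);
    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
    Assert(ItemIdIsNormal(lp));

    tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp);
    tuple.t_len = ItemIdGetLength(lp);
    tuple.t_self = *tid;
    tuple.t_tableOid = RelationGetRelid(relation);

    result = heap_copytuple(&tuple);

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    ReleaseBuffer(buffer);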
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 12:53:27 -0500, Tom Lane wrote:
> OK, I've managed to reproduce an actual failure, i.e. VACUUM moving the
> tuple underneath the fetch. It's not easy to do though, and the timing
> has to be *really* tight.
>
> So that's why nobody's reported the problem --- it's not happening
> often enough for anyone to see it as a repeatable issue.

Yea. I have been playing with trying to reproduce the issue as well, and I needed 3 sessions and two gdb's to cause any problem... And even then it took me several tries to get all conditions right. We call heap_prune* surprisingly often...

> I'll go insert a LockBuffer call. Good catch!

Thanks.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Tom Lane
Date:
BTW, I went looking for other places that might have a similar mistake. I found that contrib/pageinspect/btreefuncs.c pokes around in btree pages without any buffer lock, which seems pretty broken --- don't we want it to take share lock?

			regards, tom lane
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Andres Freund
Date:
On 2012-11-30 14:08:05 -0500, Tom Lane wrote:
> BTW, I went looking for other places that might have a similar mistake.
> I found that contrib/pageinspect/btreefuncs.c pokes around in btree
> pages without any buffer lock, which seems pretty broken --- don't we
> want it to take share lock?

I seem to remember comments somewhere indicating that pageinspect (?) doesn't take locks by intention, to make debugging of locking problems easier. Not sure whether that's really realistic, but ...

Andres

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-11-30 14:08:05 -0500, Tom Lane wrote:
> > BTW, I went looking for other places that might have a similar mistake.
> > I found that contrib/pageinspect/btreefuncs.c pokes around in btree
> > pages without any buffer lock, which seems pretty broken --- don't we
> > want it to take share lock?
>
> I seem to remember comments somewhere indicating that pageinspect (?)
> doesn't take locks by intention, to make debugging of locking problems
> easier. Not sure whether that's really realistic, but ...

Dunno, that seems like a pretty lame argument when compared to the very real possibility of crashing due to processing corrupt data. Besides, the heap-page examination functions in the same module take buffer lock, so why shouldn't these?

			regards, tom lane
Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
From
Tom Lane
Date:
BTW, on further inspection, there's yet another reason why we've not heard about this from the field: even if all the wrong things happen and GetTupleForTrigger manages to copy bogus data for the old tuple, that data *is not passed to the trigger function*. The only thing we do with it is decide whether to queue an event for the trigger.

So if you've got a WHEN condition for the trigger, that expression would see the bad data, or if it's an RI trigger the bad data is passed to RI_FKey_pk_upd_check_required or RI_FKey_fk_upd_check_required. But unless the data is corrupt enough to cause a crash, the worst outcome would be a wrong decision about whether the trigger needs to be run.

It's possible this explains some of the reports we've seen about RI constraints being violated.

			regards, tom lane