Thread: How to check whether the row was modified by this transaction before?

How to check whether the row was modified by this transaction before?

From
Vlad Arkhipov
Date:
In a BEFORE UPDATE trigger I need to know whether the row was previously 
modified by this transaction. Is it safe to use xmin and txid_current() 
for this purpose (xmin is 32-bit txid type but txid_current() returns 
64-bit bigint).

CREATE FUNCTION test_trigger()
RETURNS TRIGGER AS $$
BEGIN  IF OLD.xmin = txid_current() THEN    -- Do something.  ELSE    -- Do something else.  END IF;
END;
$$ LANGUAGE plpgsql;



Re: How to check whether the row was modified by this transaction before?

From
Robert Haas
Date:
On Thu, Dec 6, 2012 at 3:58 AM, Vlad Arkhipov <arhipov@dc.baikal.ru> wrote:
> In a BEFORE UPDATE trigger I need to know whether the row was previously
> modified by this transaction. Is it safe to use xmin and txid_current() for
> this purpose (xmin is 32-bit txid type but txid_current() returns 64-bit
> bigint).
>
> CREATE FUNCTION test_trigger()
> RETURNS TRIGGER AS $$
> BEGIN
>   IF OLD.xmin = txid_current() THEN
>     -- Do something.
>   ELSE
>     -- Do something else.
>   END IF;
> END;
> $$ LANGUAGE plpgsql;

txid_current() will return a different value from xmin after the XID
space has wrapped around at least once; also, you might need to
consider subtransactions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to check whether the row was modified by this transaction before?

From
Tom Lane
Date:
Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> In a BEFORE UPDATE trigger I need to know whether the row was previously 
> modified by this transaction. Is it safe to use xmin and txid_current() 
> for this purpose (xmin is 32-bit txid type but txid_current() returns 
> 64-bit bigint).

>    IF OLD.xmin = txid_current() THEN

Comparing to txid_current() mod 2^32 would probably work, but note this
will not think that subtransactions or parent transactions are "this
transaction", so any use of savepoints or plpgsql exception blocks is
likely to cause headaches.  Why do you think you need to know this?
        regards, tom lane



Re: How to check whether the row was modified by this transaction before?

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> > In a BEFORE UPDATE trigger I need to know whether the row was previously
> > modified by this transaction. Is it safe to use xmin and txid_current()
> > for this purpose (xmin is 32-bit txid type but txid_current() returns
> > 64-bit bigint).
>
> >    IF OLD.xmin = txid_current() THEN
>
> Comparing to txid_current() mod 2^32 would probably work,

I think we should be setting the initial epoch to something other than
zero.  That way, some quick testing would have revealed this problem
immediately.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: How to check whether the row was modified by this transaction before?

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I think we should be setting the initial epoch to something other than
> zero.  That way, some quick testing would have revealed this problem
> immediately.

Yeah, having initdb start the epoch at 1 doesn't seem unreasonable.
        regards, tom lane



Re: How to check whether the row was modified by this transaction before?

From
Andres Freund
Date:
On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> > > In a BEFORE UPDATE trigger I need to know whether the row was previously
> > > modified by this transaction. Is it safe to use xmin and txid_current()
> > > for this purpose (xmin is 32-bit txid type but txid_current() returns
> > > 64-bit bigint).

I wonder if we shouldn't have a function txid_is_current(xid); I could
have used that previously to avoid queuing multiple external
cache-invalidations if something gets repeatedly updated in the same
transaction. And I seem to remember some people here asking this
question before on the lists.

> >
> > >    IF OLD.xmin = txid_current() THEN
> >
> > Comparing to txid_current() mod 2^32 would probably work,
>
> I think we should be setting the initial epoch to something other than
> zero.  That way, some quick testing would have revealed this problem
> immediately.

+1, currently the difference of xid vs bigint is hard to spot.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: How to check whether the row was modified by this transaction before?

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
>> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
>>> In a BEFORE UPDATE trigger I need to know whether the row was previously
>>> modified by this transaction. Is it safe to use xmin and txid_current()
>>> for this purpose (xmin is 32-bit txid type but txid_current() returns
>>> 64-bit bigint).

> I wonder if we shouldn't have a function txid_is_current(xid);

Yeah, I was wondering that too, and wanted to know if the OP had a
use-case that was mainstream enough to justify adding such a function.
        regards, tom lane



Re: How to check whether the row was modified by this transaction before?

From
Andres Freund
Date:
On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> >> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> >>> In a BEFORE UPDATE trigger I need to know whether the row was previously
> >>> modified by this transaction. Is it safe to use xmin and txid_current()
> >>> for this purpose (xmin is 32-bit txid type but txid_current() returns
> >>> 64-bit bigint).
>
> > I wonder if we shouldn't have a function txid_is_current(xid);
>
> Yeah, I was wondering that too, and wanted to know if the OP had a
> use-case that was mainstream enough to justify adding such a function.

I think architectures with an invalidation-queue to external caches (be
it web-proxies or something lower-level) are quite popular. And with the
new NOTIFY or pgq relatively simple. Ad to those its sensible not to
post a single primary key more than once.

Magnus had talks about specifically that on various conferences if that
counts as anything ;)

Mainstreamy enough?

Andres
--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
>>> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
>>>> In a BEFORE UPDATE trigger I need to know whether the row was previously
>>>> modified by this transaction. Is it safe to use xmin and txid_current()
>>>> for this purpose (xmin is 32-bit txid type but txid_current() returns
>>>> 64-bit bigint).

>>> I wonder if we shouldn't have a function txid_is_current(xid);

> I think architectures with an invalidation-queue to external caches (be
> it web-proxies or something lower-level) are quite popular. And with the
> new NOTIFY or pgq relatively simple. Ad to those its sensible not to
> post a single primary key more than once.

It strikes me that the notion of "this row was previously modified by
the current transaction" is squishier than it might look, and we'd do
well to clarify it before we consider exporting anything.  I think there
are three ways you might define such a function:

1. xmin is exactly equal to current (sub)transaction's XID.

2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
subtransaction of it.

3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
subtransaction, or the XID of any open parent transaction or
subcommitted subtransaction thereof.

If I've got my head screwed on straight, test #2 gives you the semantics
that "the previous row update cannot commit unless the action you are
about to take (with the current XID) commits".  Test #3 gives you the
semantics "the action you are about to take cannot commit unless the
previous row update does".  And test #1 doesn't have much to recommend
it except simplicity; while it might appear to have the semantics "the
previous row update will commit if and only if the action you are about
to take commits", it's actually narrower than that, because the same
could be said for already-subcommitted subtransactions.

In a cache invalidation context you probably want test #2, but
TransactionIdIsCurrentTransactionId() presently performs test #3.

(We discussed this point in connection with commit 7b90469b, and I've
been intending since then to take a closer look at all the callers of
TransactionIdIsCurrentTransactionId to see if these semantics are in
fact what they need.  We might have some bugs associated with confusion
on this.)

I'm not sure which of these semantics we might wish to expose to users.
        regards, tom lane



Re: Re: How to check whether the row was modified by this transaction before?

From
Andres Freund
Date:
On 2012-12-06 16:26:50 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
> >> Andres Freund <andres@2ndquadrant.com> writes:
> >>> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> >>> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> >>>> In a BEFORE UPDATE trigger I need to know whether the row was previously
> >>>> modified by this transaction. Is it safe to use xmin and txid_current()
> >>>> for this purpose (xmin is 32-bit txid type but txid_current() returns
> >>>> 64-bit bigint).
>
> >>> I wonder if we shouldn't have a function txid_is_current(xid);
>
> > I think architectures with an invalidation-queue to external caches (be
> > it web-proxies or something lower-level) are quite popular. And with the
> > new NOTIFY or pgq relatively simple. Ad to those its sensible not to
> > post a single primary key more than once.
>
> It strikes me that the notion of "this row was previously modified by
> the current transaction" is squishier than it might look, and we'd do
> well to clarify it before we consider exporting anything.

You're right.

I am also wondering if we can assume that for all interesting purposes
enough context is available to determine if the row was just inserted or
updated or whether we need to make a test for HEAP_UPDATED available in
some form.

> I think there
> are three ways you might define such a function:
>
> 1. xmin is exactly equal to current (sub)transaction's XID.
>
> 2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction of it.
>
> 3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction, or the XID of any open parent transaction or
> subcommitted subtransaction thereof.
>
> If I've got my head screwed on straight, test #2 gives you the semantics
> that "the previous row update cannot commit unless the action you are
> about to take (with the current XID) commits".  Test #3 gives you the
> semantics "the action you are about to take cannot commit unless the
> previous row update does".  And test #1 doesn't have much to recommend
> it except simplicity; while it might appear to have the semantics "the
> previous row update will commit if and only if the action you are about
> to take commits", it's actually narrower than that, because the same
> could be said for already-subcommitted subtransactions.
>
> In a cache invalidation context you probably want test #2, but
> TransactionIdIsCurrentTransactionId() presently performs test #3.

I agree that 1) isn't all that interesting but after that it really gets
a bit harder...

2) would be useful to do stuff like avoiding to queue any invalidations
if the table is newly truncated, but that seems like a fragile thing to
do from userspace. A table rewrite seems to be something fundamentally
different from a truncation here because that would need more detail
that txid_is_current(xid) would give you.
Maybe something like 'pg_relation_is_new()' would be interesting, but
that seems like a different thing, and I am not sure the use-case for
that is clear enough.
From a userlevel perspective 3) seems to be enough if all you want to
test whether the relation is new, but again, you couldn't test for that
sensibly because there's no access to HEAP_UPDATED.

For which cases do you think 2) is interesting wrt. cache invalidations?


I think 3) might be more interesting for the (for me) common case where
you just want to avoid queuing duplicate invalidations. Because all
youre interested in that case is: "Can I be sure that if I commit
another invalidation has already been queued for this tuple.". And 3)
seems to gives you that.

> (We discussed this point in connection with commit 7b90469b, and I've
> been intending since then to take a closer look at all the callers of
> TransactionIdIsCurrentTransactionId to see if these semantics are in
> fact what they need.  We might have some bugs associated with confusion
> on this.)

Its an easy error to make I think, so I wouldn't be too surprised
somebody made it before me. On the other hand, it seems to me that to be
dangerous you need a part of the system thats not obeying transactional
semantics like the indexes which still contain enum oids that aren't in
the catalogs anymore. So I hope there aren't too many places where
people could have made that mistake.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: How to check whether the row was modified by this transaction before?

From
Vlad Arkhipov
Date:
On 12/07/2012 02:53 AM, Tom Lane wrote:
> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
>> In a BEFORE UPDATE trigger I need to know whether the row was previously
>> modified by this transaction. Is it safe to use xmin and txid_current()
>> for this purpose (xmin is 32-bit txid type but txid_current() returns
>> 64-bit bigint).
>>     IF OLD.xmin = txid_current() THEN
> Comparing to txid_current() mod 2^32 would probably work, but note this
> will not think that subtransactions or parent transactions are "this
> transaction", so any use of savepoints or plpgsql exception blocks is
> likely to cause headaches.  Why do you think you need to know this?
>
>             regards, tom lane
>
The use case is quite simple. I'm trying to rewrite our internal system 
versioning extension (SQL feature T180) in more abstract way. Any 
temporal versioned table uses its associated history table to store 
updated and deleted data rows. For this purpose the extension adds AFTER 
UPDATE/DELETE triggers to the table that insert OLD row in the history 
table for updated and deleted rows. But if there are multiple changes to 
a row in the same transaction the trigger should generate a history row 
only for the first change.

On 12/07/2012 06:26 AM, Tom Lane wrote:
> It strikes me that the notion of "this row was previously modified by
> the current transaction" is squishier than it might look, and we'd do
> well to clarify it before we consider exporting anything.  I think there
> are three ways you might define such a function:
>
> 1. xmin is exactly equal to current (sub)transaction's XID.
>
> 2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction of it.
>
> 3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction, or the XID of any open parent transaction or
> subcommitted subtransaction thereof.
If I understand you correctly, what I'm looking for is described by the 
3rd case and I may use TransactionIdIsCurrentTransactionId() for this 
purpose?