Thread: invalid tid errors in latest 7.3.4 stable.

invalid tid errors in latest 7.3.4 stable.

From
Wade Klaver
Date:
Hello folks,

Stumbled across an odd problem while cleaning data out of a database.  I am 
getting these "invalid tid" errors.  I tried the upgrade from 7.3.2 to 7.3.4.  
I tried a dumpall/initdb/restore... nadda.  Nothing really usefull is coming 
from the logs either, even though logging is cranked up.  If anyone can 
suggest a method to track down the cause of the following dialog with the db, 
I would greatly appreciate it.  If you need any more info, please just ask.
Thank you in advance.-Wade

                              version                               
---------------------------------------------------------------------PostgreSQL 7.3.4 on i386-unknown-freebsd4.6,
compiledby GCC 2.95.4
 
(-STABLE cvs from today)

dropsites=# begin;
BEGIN
dropsites=# delete from te_users where id = 954;
WARNING:  Error occurred while executing PL/pgSQL function c_delete_categories
WARNING:  line 14 at SQL statement
ERROR:  heap_mark4update: (am)invalid tid
dropsites=# rollback;  
ROLLBACK
                                       Table "public.te_users"     Column       |            Type             |
            
 
Modifiers                      
-------------------+-----------------------------+-----------------------------------------------------id
| integer                     | not null default 
 
nextval('"te_users_id_seq"'::text)username          | text                        | not nullpassword          | text
                   | reseller          | integer                     | not null default 0directory         | text
                | contact           | integer                     | creation_date     | timestamp with time zone    |
defaultnow()active            | boolean                     | not null default 'f'domain            | integer
         | not null default 0has_domain        | boolean                     | not null default 'f'tutorial_type     |
integer                    | default -1tutorial_step     | integer                     | default -1license_agreement |
boolean                    | default 'f'use_header        | integer                     | default 0promo             |
boolean                    | not null default 'f'last_billed       | timestamp without time zone | default now()
 
Indexes: primary_fk primary key btree (username, "domain"),        te_users_id_key unique btree (id),
te_users_username_lower_idxbtree (lower(username))
 

dropsites=# \d c_categories                             Table "public.c_categories"  Column    |  Type   |
           Modifiers                           
 
-------------+---------+--------------------------------------------------------------id          | integer | not null
default
 
nextval('public.c_categories_id_seq'::text)category    | integer | not null default 0userid      | integer | not
nullform       | integer | not nullname        | text    | description | text    | lft         | integer | rgt
|integer | level       | integer | parentid    | integer | 
 
Indexes: c_categories_id_key unique btree (id)
Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES te_users(id) ON 
UPDATE NO ACTION ON DELETE CASCADE,                        $2 FOREIGN KEY (form) REFERENCES c_forms(id) ON 
UPDATE NO ACTION ON DELETE CASCADE,                        c_categories_fk FOREIGN KEY (parentid) REFERENCES 
c_categories(id) ON UPDATE NO ACTION ON DELETE SET DEFAULT,                        c_categories_cat_fk FOREIGN KEY
(category)REFERENCES 
 
c_categories(id) ON UPDATE NO ACTION ON DELETE NO ACTION



--- Source of c_delete_categories ---
CREATE OR REPLACE FUNCTION c_delete_categories() returns TRIGGER AS '
 begin   IF c_category_mutex() THEN     -- delete entry     DELETE FROM c_categories WHERE ID = old.id;          IF
(old.rgt- old.lft) > 1 THEN       -- update children       UPDATE c_categories SET ParentID = old.parentid WHERE
ParentID= 
 
old.id;       UPDATE c_categories SET lft = lft - 1, rgt = rgt - 1, level = level - 
1 WHERE lf
t > old.lft AND lft < old.rgt;     END IF;
     -- remove extra space     UPDATE c_categories SET lft = lft - 2 WHERE lft > old.rgt;     UPDATE c_categories SET
rgt= rgt - 2 WHERE rgt > old.rgt;     PERFORM c_category_clear_mutex();     return NULL;   else     return old;   END
IF;end;
 
' language 'plpgsql';

--- source of c_category_mutex ---
CREATE OR REPLACE FUNCTION c_category_mutex() returns BOOL AS ' DECLARE   mutex_count integer; BEGIN 
   SELECT INTO mutex_count COUNT(*) FROM pg_class c, pg_attribute a     WHERE a.attrelid = c.oid       AND c.relname =
''___c_category_mutex___''      AND a.attname = ''___c_category_mutex___''       AND pg_catalog.pg_table_is_visible (
c.oid);     IF mutex_count > 0 THEN     RETURN ''f'';   ELSE     CREATE TEMP TABLE ___c_category_mutex___
(___c_category_mutex___INT2);     RETURN ''t'';   END IF; END;' LANGUAGE 'plpgsql';
 


--- source of c_category_clear_mutex ---
CREATE OR REPLACE FUNCTION c_category_clear_mutex() returns BOOL AS ' DECLARE   mutex_count         INT4; BEGIN
   SELECT INTO mutex_count COUNT(*) FROM pg_class c, pg_attribute a     WHERE a.attrelid = c.oid       AND c.relname =
''___c_category_mutex___''      AND a.attname = ''___c_category_mutex___''       AND pg_catalog.pg_table_is_visible (
c.oid);     IF mutex_count > 0 THEN     DROP TABLE ___c_category_mutex___;     RETURN ''t'';   ELSE     RETURN ''f'';
ENDIF;
 
 END;' LANGUAGE 'plpgsql';


-- 
Wade Klaver
Wavefire Technologies Corporation
GPG Public Key at http://archeron.wavefire.com

/"\   ASCII Ribbon Campaign  .
\ / - NO HTML/RTF in e-mail  .X  - NO Word docs in e-mail .
/ \ -----------------------------------------------------------------



Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Wade Klaver <archeron@wavefire.com> writes:
> Stumbled across an odd problem while cleaning data out of a database.  I am 
> getting these "invalid tid" errors.  I tried the upgrade from 7.3.2 to
> 7.3.4.

Hm.  We fixed something with a similar symptom as of 7.3.3:
http://archives.postgresql.org/pgsql-hackers/2003-03/msg01099.php
If you are still seeing it in 7.3.4 then maybe there's another related
problem.  Could you work up a self-contained test case?
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Wade Klaver
Date:
Hello Tom, In trying to come up with a test scenario, I loaded this db into a 7.4 db 
and got a similar message.  It shows up as follows:

dropsites=> begin;
BEGIN
dropsites=> delete from te_users where reseller = 21;
ERROR:  attempted to mark4update invisible tuple
CONTEXT:  PL/pgSQL function "c_delete_categories" line 14 at SQL statement
dropsites=>

Is this the same message using the new error reporting framework?-Wade

On September 23, 2003 09:44 pm, Tom Lane wrote:
> Wade Klaver <archeron@wavefire.com> writes:
> > Stumbled across an odd problem while cleaning data out of a database.  I
> > am getting these "invalid tid" errors.  I tried the upgrade from 7.3.2 to
> > 7.3.4.
>
> Hm.  We fixed something with a similar symptom as of 7.3.3:
> http://archives.postgresql.org/pgsql-hackers/2003-03/msg01099.php
> If you are still seeing it in 7.3.4 then maybe there's another related
> problem.  Could you work up a self-contained test case?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

-- 
Wade Klaver
Wavefire Technologies Corporation
GPG Public Key at http://archeron.wavefire.com

/"\   ASCII Ribbon Campaign  .
\ / - NO HTML/RTF in e-mail  .X  - NO Word docs in e-mail .
/ \ -----------------------------------------------------------------



Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Wade Klaver <archeron@wavefire.com> writes:
>   In trying to come up with a test scenario, I loaded this db into a 7.4 db 
> and got a similar message.  It shows up as follows:

> ERROR:  attempted to mark4update invisible tuple
> CONTEXT:  PL/pgSQL function "c_delete_categories" line 14 at SQL statement

> Is this the same message using the new error reporting framework?

Yes, I believe so.
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Wade Klaver <archeron@wavefire.com> writes:
> OK, I set you up a login on arch.wavefire.com

Okay, what I find is this sequence of events:

1. delete from te_users where id = 954;

2. The ON DELETE CASCADE RI constraint propagates this to a delete of  some row(s) in c_categories.

3. That fires the c_delete_categories BEFORE DELETE trigger.

4. That does several things includingUPDATE c_categories SET lft = lft - 2 WHERE lft > old.rgt;

5. This update command suffers a Halloween problem, namely trying to  update rows it's already updated.

Why does it do that, you ask?  Because ReferentialIntegritySnapshotOverride
is true, since we are inside the ON DELETE CASCADE RI trigger and
haven't yet returned from any trigger.  So instead of using the correct
snapshot for the UPDATE command, tqual.c mistakenly uses SnapshotNow
rules.  We have successfully executed a select or two inside the trigger
function already, so CurrentCommandId is greater than the command ID
associated with the UPDATE command, making the updated rows visible.
Oops.

I think this is proof of something I've felt since day one, namely that
a global ReferentialIntegritySnapshotOverride flag is an unusable hack.
How can we get rid of it?  Why did we need it in the first place?

(I suspect the proper answer for "how can we get rid of it" will be to
extend the Executor API so that the RI functions can tell the executor
to use SnapshotNow as es_snapshot, instead of a standard query snapshot.
But I'm wondering why we have to do this at all.)
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Wed, 24 Sep 2003, Tom Lane wrote:

> Wade Klaver <archeron@wavefire.com> writes:
> > OK, I set you up a login on arch.wavefire.com
>
> Okay, what I find is this sequence of events:
>
> 1. delete from te_users where id = 954;
>
> 2. The ON DELETE CASCADE RI constraint propagates this to a delete of
>    some row(s) in c_categories.
>
> 3. That fires the c_delete_categories BEFORE DELETE trigger.
>
> 4. That does several things including
>     UPDATE c_categories SET lft = lft - 2 WHERE lft > old.rgt;
>
> 5. This update command suffers a Halloween problem, namely trying to
>    update rows it's already updated.
>
> Why does it do that, you ask?  Because ReferentialIntegritySnapshotOverride
> is true, since we are inside the ON DELETE CASCADE RI trigger and
> haven't yet returned from any trigger.  So instead of using the correct
> snapshot for the UPDATE command, tqual.c mistakenly uses SnapshotNow
> rules.  We have successfully executed a select or two inside the trigger
> function already, so CurrentCommandId is greater than the command ID
> associated with the UPDATE command, making the updated rows visible.
> Oops.
>
> I think this is proof of something I've felt since day one, namely that
> a global ReferentialIntegritySnapshotOverride flag is an unusable hack.
> How can we get rid of it?  Why did we need it in the first place?
>
> (I suspect the proper answer for "how can we get rid of it" will be to
> extend the Executor API so that the RI functions can tell the executor
> to use SnapshotNow as es_snapshot, instead of a standard query snapshot.
> But I'm wondering why we have to do this at all.)

I think if you have something like:
create table test1 (id int primary key, otherid int references test1);
insert into test1 values (4,4);

T1: begin;
T1: set transaction isolation level serializable;
T1: select * from test1;
T2: begin;
T2: insert into test1 values (5,4);
T2: end;
T1: delete from test1 where id=4;-- I think the standard snapshot rules would mean that the row 5,4 would   be hidden
fromthe select in the trigger, which means that the delete   would be allowed, where it should fail since that'd leave
anorphaned   child row.
 

Or at least I've commented out the updates to
ReferentialIntegritySnapshotOverride in my local ri_triggers.c and see
behavior consistent with that (that the delete succeeds and the child
row is orphaned).


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Wed, 24 Sep 2003, Tom Lane wrote:
>> But I'm wondering why we have to do this at all.)

> I think if you have something like:
> create table test1 (id int primary key, otherid int references test1);
> insert into test1 values (4,4);

> T1: begin;
> T1: set transaction isolation level serializable;
> T1: select * from test1;
> T2: begin;
> T2: insert into test1 values (5,4);
> T2: end;
> T1: delete from test1 where id=4;
>  -- I think the standard snapshot rules would mean that the row 5,4 would
>     be hidden from the select in the trigger, which means that the delete
>     would be allowed, where it should fail since that'd leave an orphaned
>     child row.

Ah, I see.  And the reason there's no race condition with SnapshotNow is
that T2 took a SELECT FOR UPDATE lock on the id=4 row, so the DELETE
couldn't succeed until T2 commits.

Okay, I'll work out some extension of the APIs to let us propagate the
snapshot request down through SPI and into the Executor, rather than
using a global variable for it.  (Unless someone has a better idea...)
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
I said:
> Okay, I'll work out some extension of the APIs to let us propagate the
> snapshot request down through SPI and into the Executor, rather than
> using a global variable for it.  (Unless someone has a better idea...)

I've committed the attached patch into CVS HEAD.  I am now wondering
whether to back-patch it to the 7.3 branch or not.  It's a bit larger
than I would have liked, and really needs more testing before being
shoved into a stable branch.

The simplest test case I was able to generate for Wade's bug is this:

-----------
create table t1 (f1 int primary key);
create table t2 (f1 int references t1 on delete cascade);
create table t3 (f1 int);

create or replace function t2del() returns trigger as '
begin
  update t3 set f1 = f1 + 1;
  return old;
end' language plpgsql;

create trigger t2del before delete on t2 for each row
execute procedure t2del();

create or replace function t3upd() returns trigger as '
begin
  perform count(*) from t3;
  return new;
end' language plpgsql;

create trigger t3upd before update on t3 for each row
execute procedure t3upd();

insert into t1 values(1);
insert into t2 values(1);
insert into t3 values(1);

delete from t1;
-----------

Until this commit, CVS HEAD generated
    ERROR:  attempted to mark4update invisible tuple
    CONTEXT:  PL/pgSQL function "t2del" line 2 at SQL statement
7.3 branch generates a different spelling of the same error:
    WARNING:  Error occurred while executing PL/pgSQL function t2del
    WARNING:  line 2 at SQL statement
    ERROR:  heap_mark4update: (am)invalid tid

AFAICT you need a minimum of two levels of triggers invoked by an RI
trigger to make this happen, so it may be a corner case best left
unfixed in the 7.3 branch.

Opinions anyone?

            regards, tom lane

*** src/backend/commands/explain.c.orig    Mon Aug 11 16:46:46 2003
--- src/backend/commands/explain.c    Thu Sep 25 12:51:27 2003
***************
*** 207,213 ****
      gettimeofday(&starttime, NULL);

      /* call ExecutorStart to prepare the plan for execution */
!     ExecutorStart(queryDesc, !stmt->analyze);

      /* Execute the plan for statistics if asked for */
      if (stmt->analyze)
--- 207,213 ----
      gettimeofday(&starttime, NULL);

      /* call ExecutorStart to prepare the plan for execution */
!     ExecutorStart(queryDesc, false, !stmt->analyze);

      /* Execute the plan for statistics if asked for */
      if (stmt->analyze)
*** src/backend/commands/trigger.c.orig    Thu Sep 25 10:22:57 2003
--- src/backend/commands/trigger.c    Thu Sep 25 12:51:27 2003
***************
*** 1863,1874 ****
          heap_freetuple(rettuple);

      /*
-      * Might have been a referential integrity constraint trigger. Reset
-      * the snapshot overriding flag.
-      */
-     ReferentialIntegritySnapshotOverride = false;
-
-     /*
       * Release buffers
       */
      if (ItemPointerIsValid(&(event->dte_oldctid)))
--- 1863,1868 ----
*** src/backend/executor/execMain.c.orig    Thu Sep 25 10:22:59 2003
--- src/backend/executor/execMain.c    Thu Sep 25 14:00:34 2003
***************
*** 104,109 ****
--- 104,112 ----
   * field of the QueryDesc is filled in to describe the tuples that will be
   * returned, and the internal fields (estate and planstate) are set up.
   *
+  * If useSnapshotNow is true, run the query with SnapshotNow time qual rules
+  * instead of the normal use of QuerySnapshot.
+  *
   * If explainOnly is true, we are not actually intending to run the plan,
   * only to set up for EXPLAIN; so skip unwanted side-effects.
   *
***************
*** 112,118 ****
   * ----------------------------------------------------------------
   */
  void
! ExecutorStart(QueryDesc *queryDesc, bool explainOnly)
  {
      EState       *estate;
      MemoryContext oldcontext;
--- 115,121 ----
   * ----------------------------------------------------------------
   */
  void
! ExecutorStart(QueryDesc *queryDesc, bool useSnapshotNow, bool explainOnly)
  {
      EState       *estate;
      MemoryContext oldcontext;
***************
*** 154,160 ****
       * the life of this query, even if it outlives the current command and
       * current snapshot.
       */
!     estate->es_snapshot = CopyQuerySnapshot();

      /*
       * Initialize the plan state tree
--- 157,172 ----
       * the life of this query, even if it outlives the current command and
       * current snapshot.
       */
!     if (useSnapshotNow)
!     {
!         estate->es_snapshot = SnapshotNow;
!         estate->es_snapshot_cid = GetCurrentCommandId();
!     }
!     else
!     {
!         estate->es_snapshot = CopyQuerySnapshot();
!         estate->es_snapshot_cid = estate->es_snapshot->curcid;
!     }

      /*
       * Initialize the plan state tree
***************
*** 1106,1112 ****

                      tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
                      test = heap_mark4update(erm->relation, &tuple, &buffer,
!                                             estate->es_snapshot->curcid);
                      ReleaseBuffer(buffer);
                      switch (test)
                      {
--- 1118,1124 ----

                      tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
                      test = heap_mark4update(erm->relation, &tuple, &buffer,
!                                             estate->es_snapshot_cid);
                      ReleaseBuffer(buffer);
                      switch (test)
                      {
***************
*** 1266,1272 ****
      if (estate->es_into_relation_descriptor != NULL)
      {
          heap_insert(estate->es_into_relation_descriptor, tuple,
!                     estate->es_snapshot->curcid);
          IncrAppended();
      }

--- 1278,1284 ----
      if (estate->es_into_relation_descriptor != NULL)
      {
          heap_insert(estate->es_into_relation_descriptor, tuple,
!                     estate->es_snapshot_cid);
          IncrAppended();
      }

***************
*** 1342,1348 ****
       * insert the tuple
       */
      newId = heap_insert(resultRelationDesc, tuple,
!                         estate->es_snapshot->curcid);

      IncrAppended();
      (estate->es_processed)++;
--- 1354,1360 ----
       * insert the tuple
       */
      newId = heap_insert(resultRelationDesc, tuple,
!                         estate->es_snapshot_cid);

      IncrAppended();
      (estate->es_processed)++;
***************
*** 1394,1400 ****
          bool        dodelete;

          dodelete = ExecBRDeleteTriggers(estate, resultRelInfo, tupleid,
!                                         estate->es_snapshot->curcid);

          if (!dodelete)            /* "do nothing" */
              return;
--- 1406,1412 ----
          bool        dodelete;

          dodelete = ExecBRDeleteTriggers(estate, resultRelInfo, tupleid,
!                                         estate->es_snapshot_cid);

          if (!dodelete)            /* "do nothing" */
              return;
***************
*** 1406,1412 ****
  ldelete:;
      result = heap_delete(resultRelationDesc, tupleid,
                           &ctid,
!                          estate->es_snapshot->curcid,
                           true /* wait for commit */);
      switch (result)
      {
--- 1418,1424 ----
  ldelete:;
      result = heap_delete(resultRelationDesc, tupleid,
                           &ctid,
!                          estate->es_snapshot_cid,
                           true /* wait for commit */);
      switch (result)
      {
***************
*** 1505,1511 ****

          newtuple = ExecBRUpdateTriggers(estate, resultRelInfo,
                                          tupleid, tuple,
!                                         estate->es_snapshot->curcid);

          if (newtuple == NULL)    /* "do nothing" */
              return;
--- 1517,1523 ----

          newtuple = ExecBRUpdateTriggers(estate, resultRelInfo,
                                          tupleid, tuple,
!                                         estate->es_snapshot_cid);

          if (newtuple == NULL)    /* "do nothing" */
              return;
***************
*** 1541,1547 ****
       */
      result = heap_update(resultRelationDesc, tupleid, tuple,
                           &ctid,
!                          estate->es_snapshot->curcid,
                           true /* wait for commit */);
      switch (result)
      {
--- 1553,1559 ----
       */
      result = heap_update(resultRelationDesc, tupleid, tuple,
                           &ctid,
!                          estate->es_snapshot_cid,
                           true /* wait for commit */);
      switch (result)
      {
***************
*** 2027,2032 ****
--- 2039,2045 ----
       */
      epqstate->es_direction = ForwardScanDirection;
      epqstate->es_snapshot = estate->es_snapshot;
+     epqstate->es_snapshot_cid = estate->es_snapshot_cid;
      epqstate->es_range_table = estate->es_range_table;
      epqstate->es_result_relations = estate->es_result_relations;
      epqstate->es_num_result_relations = estate->es_num_result_relations;
*** src/backend/executor/execUtils.c.orig    Wed Sep 24 14:54:01 2003
--- src/backend/executor/execUtils.c    Thu Sep 25 14:00:35 2003
***************
*** 178,183 ****
--- 178,184 ----
       */
      estate->es_direction = ForwardScanDirection;
      estate->es_snapshot = SnapshotNow;
+     estate->es_snapshot_cid = FirstCommandId;
      estate->es_range_table = NIL;

      estate->es_result_relations = NULL;
*** src/backend/executor/functions.c.orig    Thu Sep 25 10:22:59 2003
--- src/backend/executor/functions.c    Thu Sep 25 12:51:10 2003
***************
*** 291,297 ****

      /* Utility commands don't need Executor. */
      if (es->qd->operation != CMD_UTILITY)
!         ExecutorStart(es->qd, false);

      es->status = F_EXEC_RUN;
  }
--- 291,297 ----

      /* Utility commands don't need Executor. */
      if (es->qd->operation != CMD_UTILITY)
!         ExecutorStart(es->qd, false, false);

      es->status = F_EXEC_RUN;
  }
*** src/backend/executor/nodeSubplan.c.orig    Thu Sep 25 10:22:59 2003
--- src/backend/executor/nodeSubplan.c    Thu Sep 25 14:00:35 2003
***************
*** 709,714 ****
--- 709,715 ----
      sp_estate->es_tupleTable =
          ExecCreateTupleTable(ExecCountSlotsNode(subplan->plan) + 10);
      sp_estate->es_snapshot = estate->es_snapshot;
+     sp_estate->es_snapshot_cid = estate->es_snapshot_cid;
      sp_estate->es_instrument = estate->es_instrument;

      /*
*** src/backend/executor/nodeSubqueryscan.c.orig    Sun Aug  3 23:00:35 2003
--- src/backend/executor/nodeSubqueryscan.c    Thu Sep 25 14:00:35 2003
***************
*** 177,182 ****
--- 177,183 ----
      sp_estate->es_tupleTable =
          ExecCreateTupleTable(ExecCountSlotsNode(node->subplan) + 10);
      sp_estate->es_snapshot = estate->es_snapshot;
+     sp_estate->es_snapshot_cid = estate->es_snapshot_cid;
      sp_estate->es_instrument = estate->es_instrument;

      /*
*** src/backend/executor/spi.c.orig    Tue Sep 23 11:11:33 2003
--- src/backend/executor/spi.c    Thu Sep 25 12:51:11 2003
***************
*** 32,41 ****
  static int    _SPI_curid = -1;

  static int    _SPI_execute(const char *src, int tcount, _SPI_plan *plan);
! static int    _SPI_pquery(QueryDesc *queryDesc, bool runit, int tcount);

  static int _SPI_execute_plan(_SPI_plan *plan,
!                   Datum *Values, const char *Nulls, int tcount);

  static void _SPI_cursor_operation(Portal portal, bool forward, int count,
                        DestReceiver *dest);
--- 32,43 ----
  static int    _SPI_curid = -1;

  static int    _SPI_execute(const char *src, int tcount, _SPI_plan *plan);
! static int    _SPI_pquery(QueryDesc *queryDesc, bool runit,
!                         bool useSnapshotNow, int tcount);

  static int _SPI_execute_plan(_SPI_plan *plan,
!                              Datum *Values, const char *Nulls,
!                              bool useSnapshotNow, int tcount);

  static void _SPI_cursor_operation(Portal portal, bool forward, int count,
                        DestReceiver *dest);
***************
*** 236,242 ****
      if (res < 0)
          return res;

!     res = _SPI_execute_plan((_SPI_plan *) plan, Values, Nulls, tcount);

      _SPI_end_call(true);
      return res;
--- 238,270 ----
      if (res < 0)
          return res;

!     res = _SPI_execute_plan((_SPI_plan *) plan, Values, Nulls, false, tcount);
!
!     _SPI_end_call(true);
!     return res;
! }
!
! /*
!  * SPI_execp_now -- identical to SPI_execp, except that we use SnapshotNow
!  * instead of the normal QuerySnapshot.  This is currently not documented
!  * in spi.sgml because it is only intended for use by RI triggers.
!  */
! int
! SPI_execp_now(void *plan, Datum *Values, const char *Nulls, int tcount)
! {
!     int            res;
!
!     if (plan == NULL || tcount < 0)
!         return SPI_ERROR_ARGUMENT;
!
!     if (((_SPI_plan *) plan)->nargs > 0 && Values == NULL)
!         return SPI_ERROR_PARAM;
!
!     res = _SPI_begin_call(true);
!     if (res < 0)
!         return res;
!
!     res = _SPI_execute_plan((_SPI_plan *) plan, Values, Nulls, true, tcount);

      _SPI_end_call(true);
      return res;
***************
*** 1068,1074 ****
              {
                  qdesc = CreateQueryDesc(queryTree, planTree, dest,
                                          NULL, false);
!                 res = _SPI_pquery(qdesc, true,
                                    queryTree->canSetTag ? tcount : 0);
                  if (res < 0)
                      return res;
--- 1096,1102 ----
              {
                  qdesc = CreateQueryDesc(queryTree, planTree, dest,
                                          NULL, false);
!                 res = _SPI_pquery(qdesc, true, false,
                                    queryTree->canSetTag ? tcount : 0);
                  if (res < 0)
                      return res;
***************
*** 1078,1084 ****
              {
                  qdesc = CreateQueryDesc(queryTree, planTree, dest,
                                          NULL, false);
!                 res = _SPI_pquery(qdesc, false, 0);
                  if (res < 0)
                      return res;
              }
--- 1106,1112 ----
              {
                  qdesc = CreateQueryDesc(queryTree, planTree, dest,
                                          NULL, false);
!                 res = _SPI_pquery(qdesc, false, false, 0);
                  if (res < 0)
                      return res;
              }
***************
*** 1096,1102 ****

  static int
  _SPI_execute_plan(_SPI_plan *plan, Datum *Values, const char *Nulls,
!                   int tcount)
  {
      List       *query_list_list = plan->qtlist;
      List       *plan_list = plan->ptlist;
--- 1124,1130 ----

  static int
  _SPI_execute_plan(_SPI_plan *plan, Datum *Values, const char *Nulls,
!                   bool useSnapshotNow, int tcount)
  {
      List       *query_list_list = plan->qtlist;
      List       *plan_list = plan->ptlist;
***************
*** 1167,1173 ****
              {
                  qdesc = CreateQueryDesc(queryTree, planTree, dest,
                                          paramLI, false);
!                 res = _SPI_pquery(qdesc, true,
                                    queryTree->canSetTag ? tcount : 0);
                  if (res < 0)
                      return res;
--- 1195,1201 ----
              {
                  qdesc = CreateQueryDesc(queryTree, planTree, dest,
                                          paramLI, false);
!                 res = _SPI_pquery(qdesc, true, useSnapshotNow,
                                    queryTree->canSetTag ? tcount : 0);
                  if (res < 0)
                      return res;
***************
*** 1180,1186 ****
  }

  static int
! _SPI_pquery(QueryDesc *queryDesc, bool runit, int tcount)
  {
      int            operation = queryDesc->operation;
      int            res;
--- 1208,1214 ----
  }

  static int
! _SPI_pquery(QueryDesc *queryDesc, bool runit, bool useSnapshotNow, int tcount)
  {
      int            operation = queryDesc->operation;
      int            res;
***************
*** 1217,1223 ****
          ResetUsage();
  #endif

!     ExecutorStart(queryDesc, false);

      ExecutorRun(queryDesc, ForwardScanDirection, (long) tcount);

--- 1245,1251 ----
          ResetUsage();
  #endif

!     ExecutorStart(queryDesc, useSnapshotNow, false);

      ExecutorRun(queryDesc, ForwardScanDirection, (long) tcount);

*** src/backend/tcop/pquery.c.orig    Tue Aug 12 14:23:21 2003
--- src/backend/tcop/pquery.c    Thu Sep 25 12:50:59 2003
***************
*** 131,137 ****
      /*
       * Call ExecStart to prepare the plan for execution
       */
!     ExecutorStart(queryDesc, false);

      /*
       * Run the plan to completion.
--- 131,137 ----
      /*
       * Call ExecStart to prepare the plan for execution
       */
!     ExecutorStart(queryDesc, false, false);

      /*
       * Run the plan to completion.
***************
*** 269,275 ****
              /*
               * Call ExecStart to prepare the plan for execution
               */
!             ExecutorStart(queryDesc, false);

              /*
               * This tells PortalCleanup to shut down the executor
--- 269,275 ----
              /*
               * Call ExecStart to prepare the plan for execution
               */
!             ExecutorStart(queryDesc, false, false);

              /*
               * This tells PortalCleanup to shut down the executor
*** src/backend/utils/adt/ri_triggers.c.orig    Thu Sep 25 10:23:14 2003
--- src/backend/utils/adt/ri_triggers.c    Thu Sep 25 12:50:46 2003
***************
*** 187,194 ****
      int            i;
      int            match_type;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 187,192 ----
***************
*** 627,634 ****
      int            i;
      int            match_type;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 625,630 ----
***************
*** 807,814 ****
      int            i;
      int            match_type;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 803,808 ----
***************
*** 995,1002 ****
      void       *qplan;
      int            i;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 989,994 ----
***************
*** 1159,1166 ****
      int            i;
      int            j;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 1151,1156 ----
***************
*** 1349,1356 ****
      void       *qplan;
      int            i;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 1339,1344 ----
***************
*** 1520,1527 ****
      void       *qplan;
      int            i;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 1508,1513 ----
***************
*** 1694,1701 ****
      void       *qplan;
      int            i;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 1680,1685 ----
***************
*** 1868,1875 ****
      int            match_type;
      bool        use_cached_query;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 1852,1857 ----
***************
*** 2083,2090 ****
      RI_QueryKey qkey;
      void       *qplan;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 2065,2070 ----
***************
*** 2296,2303 ****
      void       *qplan;
      int            match_type;

-     ReferentialIntegritySnapshotOverride = true;
-
      /*
       * Check that this is a valid trigger call on the right time and
       * event.
--- 2276,2281 ----
***************
*** 2936,2950 ****
       */
      limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;

!     /* Run the plan */
!     spi_result = SPI_execp(qplan, vals, nulls, limit);

      /* Restore UID */
      SetUserId(save_uid);

      /* Check result */
      if (spi_result < 0)
!         elog(ERROR, "SPI_execp failed");

      if (expect_OK >= 0 && spi_result != expect_OK)
          ri_ReportViolation(qkey, constrname ? constrname : "",
--- 2914,2932 ----
       */
      limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;

!     /*
!      * Run the plan, using SnapshotNow time qual rules so that we can see
!      * all committed tuples, even those committed after our own transaction
!      * or query started.
!      */
!     spi_result = SPI_execp_now(qplan, vals, nulls, limit);

      /* Restore UID */
      SetUserId(save_uid);

      /* Check result */
      if (spi_result < 0)
!         elog(ERROR, "SPI_execp_now returned %d", spi_result);

      if (expect_OK >= 0 && spi_result != expect_OK)
          ri_ReportViolation(qkey, constrname ? constrname : "",
*** src/backend/utils/time/tqual.c.orig    Sun Sep 21 20:47:23 2003
--- src/backend/utils/time/tqual.c    Thu Sep 25 12:50:38 2003
***************
*** 39,46 ****
  TransactionId RecentXmin = InvalidTransactionId;
  TransactionId RecentGlobalXmin = InvalidTransactionId;

- bool        ReferentialIntegritySnapshotOverride = false;
-

  /*
   * HeapTupleSatisfiesItself
--- 39,44 ----
***************
*** 665,674 ****
  bool
  HeapTupleSatisfiesSnapshot(HeapTupleHeader tuple, Snapshot snapshot)
  {
-     /* XXX this is horribly ugly: */
-     if (ReferentialIntegritySnapshotOverride)
-         return HeapTupleSatisfiesNow(tuple);
-
      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 663,668 ----
***************
*** 978,986 ****
  void
  SetQuerySnapshot(void)
  {
-     /* Initialize snapshot overriding to false */
-     ReferentialIntegritySnapshotOverride = false;
-
      /* 1st call in xaction? */
      if (SerializableSnapshot == NULL)
      {
--- 972,977 ----
*** src/include/access/valid.h.orig    Sun Aug  3 23:01:27 2003
--- src/include/access/valid.h    Thu Sep 25 12:50:32 2003
***************
*** 93,99 ****
                             relation, \
                             buffer, \
                             disk_page, \
!                            seeself, \
                             nKeys, \
                             key, \
                             res) \
--- 93,99 ----
                             relation, \
                             buffer, \
                             disk_page, \
!                            snapshot, \
                             nKeys, \
                             key, \
                             res) \
***************
*** 112,118 ****
          { \
              uint16    _infomask = (tuple)->t_data->t_infomask; \
              \
!             (res) = HeapTupleSatisfiesVisibility((tuple), (seeself)); \
              if ((tuple)->t_data->t_infomask != _infomask) \
                  SetBufferCommitInfoNeedsSave(buffer); \
          } \
--- 112,118 ----
          { \
              uint16    _infomask = (tuple)->t_data->t_infomask; \
              \
!             (res) = HeapTupleSatisfiesVisibility((tuple), (snapshot)); \
              if ((tuple)->t_data->t_infomask != _infomask) \
                  SetBufferCommitInfoNeedsSave(buffer); \
          } \
*** src/include/executor/executor.h.orig    Mon Aug 18 21:13:41 2003
--- src/include/executor/executor.h    Thu Sep 25 12:50:26 2003
***************
*** 85,91 ****
  /*
   * prototypes from functions in execMain.c
   */
! extern void ExecutorStart(QueryDesc *queryDesc, bool explainOnly);
  extern TupleTableSlot *ExecutorRun(QueryDesc *queryDesc,
              ScanDirection direction, long count);
  extern void ExecutorEnd(QueryDesc *queryDesc);
--- 85,92 ----
  /*
   * prototypes from functions in execMain.c
   */
! extern void ExecutorStart(QueryDesc *queryDesc, bool useSnapshotNow,
!                           bool explainOnly);
  extern TupleTableSlot *ExecutorRun(QueryDesc *queryDesc,
              ScanDirection direction, long count);
  extern void ExecutorEnd(QueryDesc *queryDesc);
*** src/include/executor/spi.h.orig    Sun Aug  3 23:01:33 2003
--- src/include/executor/spi.h    Thu Sep 25 12:50:26 2003
***************
*** 84,89 ****
--- 84,91 ----
  extern int    SPI_exec(const char *src, int tcount);
  extern int SPI_execp(void *plan, Datum *values, const char *Nulls,
            int tcount);
+ extern int SPI_execp_now(void *plan, Datum *values, const char *Nulls,
+           int tcount);
  extern void *SPI_prepare(const char *src, int nargs, Oid *argtypes);
  extern void *SPI_saveplan(void *plan);
  extern int    SPI_freeplan(void *plan);
*** src/include/nodes/execnodes.h.orig    Fri Aug 22 16:30:27 2003
--- src/include/nodes/execnodes.h    Thu Sep 25 13:52:34 2003
***************
*** 286,291 ****
--- 286,292 ----
      /* Basic state for all query types: */
      ScanDirection es_direction; /* current scan direction */
      Snapshot    es_snapshot;    /* time qual to use */
+     CommandId    es_snapshot_cid;    /* CommandId component of time qual */
      List       *es_range_table; /* List of RangeTableEntrys */

      /* Info about target table for insert/update/delete queries: */
*** src/include/utils/tqual.h.orig    Sun Aug  3 23:01:45 2003
--- src/include/utils/tqual.h    Thu Sep 25 12:50:12 2003
***************
*** 44,57 ****
  extern TransactionId RecentXmin;
  extern TransactionId RecentGlobalXmin;

- extern bool ReferentialIntegritySnapshotOverride;
-
- #define IsSnapshotNow(snapshot)        ((Snapshot) (snapshot) == SnapshotNow)
- #define IsSnapshotSelf(snapshot)    ((Snapshot) (snapshot) == SnapshotSelf)
- #define IsSnapshotAny(snapshot)        ((Snapshot) (snapshot) == SnapshotAny)
- #define IsSnapshotToast(snapshot)    ((Snapshot) (snapshot) == SnapshotToast)
- #define IsSnapshotDirty(snapshot)    ((Snapshot) (snapshot) == SnapshotDirty)
-

  /*
   * HeapTupleSatisfiesVisibility
--- 44,49 ----
***************
*** 62,80 ****
   *        Beware of multiple evaluations of snapshot argument.
   */
  #define HeapTupleSatisfiesVisibility(tuple, snapshot) \
! (IsSnapshotNow(snapshot) ? \
      HeapTupleSatisfiesNow((tuple)->t_data) \
  : \
!     (IsSnapshotSelf(snapshot) ? \
          HeapTupleSatisfiesItself((tuple)->t_data) \
      : \
!         (IsSnapshotAny(snapshot) ? \
              true \
          : \
!             (IsSnapshotToast(snapshot) ? \
                  HeapTupleSatisfiesToast((tuple)->t_data) \
              : \
!                 (IsSnapshotDirty(snapshot) ? \
                      HeapTupleSatisfiesDirty((tuple)->t_data) \
                  : \
                      HeapTupleSatisfiesSnapshot((tuple)->t_data, snapshot) \
--- 54,72 ----
   *        Beware of multiple evaluations of snapshot argument.
   */
  #define HeapTupleSatisfiesVisibility(tuple, snapshot) \
! ((snapshot) == SnapshotNow ? \
      HeapTupleSatisfiesNow((tuple)->t_data) \
  : \
!     ((snapshot) == SnapshotSelf ? \
          HeapTupleSatisfiesItself((tuple)->t_data) \
      : \
!         ((snapshot) == SnapshotAny ? \
              true \
          : \
!             ((snapshot) == SnapshotToast ? \
                  HeapTupleSatisfiesToast((tuple)->t_data) \
              : \
!                 ((snapshot) == SnapshotDirty ? \
                      HeapTupleSatisfiesDirty((tuple)->t_data) \
                  : \
                      HeapTupleSatisfiesSnapshot((tuple)->t_data, snapshot) \

Re: invalid tid errors in latest 7.3.4 stable.

From
Wade Klaver
Date:
Hello, Naturally, as I found this problem in a production database running 7.3.4, a 
back-patch to 7.3 would be desireable.  Even if just a patch was available 
and was not commited to -STABLE, this would do.  I would also then be able to 
test such a critter on our development server for a future back-port if that 
ever comes to pass.-Wade
> I've committed the attached patch into CVS HEAD.  I am now wondering
> whether to back-patch it to the 7.3 branch or not.  It's a bit larger
> than I would have liked, and really needs more testing before being
> shoved into a stable branch.
>
> AFAICT you need a minimum of two levels of triggers invoked by an RI
> trigger to make this happen, so it may be a corner case best left
> unfixed in the 7.3 branch.
>
> Opinions anyone?
>
>             regards, tom lane

-- 
Wade Klaver
Wavefire Technologies Corporation
GPG Public Key at http://archeron.wavefire.com

/"\   ASCII Ribbon Campaign  .
\ / - NO HTML/RTF in e-mail  .X  - NO Word docs in e-mail .
/ \ -----------------------------------------------------------------



Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
>> Okay, I'll work out some extension of the APIs to let us propagate the
>> snapshot request down through SPI and into the Executor, rather than
>> using a global variable for it.  (Unless someone has a better idea...)

Just when you thought it was safe to go back in the water ...

Chris Kratz sent me the attached example, which fails in 7.3 and (still)
fails in CVS HEAD too.

It appears that the failure mode goes like this: "DELETE FROM activity"
cascades via ON DELETE CASCADE to a delete in qry_column_list.  The RI
trigger's delete query fires the RULE, and so must execute "UPDATE
report_objects".  The compilation of report_objects' plpgsql trigger
advances the CommandCounter, creating the potential for Halloween
problems when SnapshotNow is used to fetch values.  In particular the
UPDATE sees its own output rows as valid source rows.

As far as the "DELETE FROM qry_column_list" goes, I think the solution
is that fetching rows can't use pure SnapshotNow after all.  What we
need is to create a fresh QuerySnapshot that shows all transactions
committed-to-date as committed, and saves the current CommandCounter as
the criterion for locally created rows.  Unlike SnapshotNow, this would
mean that transactions committed just after we take the new snapshot
would not be seen as committed.  This should be okay AFAICS --- once we
reach the RI triggers, all transactions we need to worry about should be
committed.  (If not, surely there's a race condition anyway.)  Also note
that an RI query would *not* see the effects of actions it indirectly
triggers.  This should be okay, because if they do anything that
requires RI validation, they should cause additional RI trigger firings
to be queued for attention later.

But Chris' example raises a host of other questions in my mind.  Should
we apply this forcibly-updated QuerySnapshot to actions that are
indirectly triggered by RI queries?  In CVS tip, SnapshotNow rules are
in fact used for the UPDATE that's generated by the RULE, because it's
part of the generated plan for the DELETE.  But any queries executed
inside triggers fired as a result of all this would use the pre-existing
QuerySnapshot, and hence could see a worldview completely inconsistent
with the rows they are being fired for :-(.  It's worse in 7.3, because
the first trigger exit would revert ReferentialIntegritySnapshotOverride
to false, meaning you wouldn't even be using the same snapshot rules
throughout the UPDATE/DELETE :-( :-(

I am inclined to think now that the right solution is for the RI
triggers to update the global QuerySnapshot to current time when they
start, and then revert it to what it had been before exiting.  (And that
code had better be in the RI triggers themselves, *not* in the generic
trigger-calling code.)  This would ensure that actions taken indirectly
as a result of RI behavior would see a consistent worldview.

The main argument I can see against this is that it would be a really
big wart on the behavior of SERIALIZABLE transactions.  Instead of
saying "in a SERIALIZABLE transaction, you only see the effects of
transactions committed before your transaction started", we'd have to
add a footnote "except in actions taken as a result of RI-generated
queries", which sure complicates matters from a logical point of view.
(In READ COMMITTED mode, on the other hand, it's no big deal; we are
effectively just decreeing that a new command starts before the RI
triggers run.)

Comments?  Anyone have a better idea?

Anyway, on to Chris' example.  Load the attached script into a database
that has plpgsql already created, and then do
    DELETE FROM Activity WHERE ActivityID = 16739;
You'll get
    ERROR:  attempted to mark4update invisible tuple
(or the equivalent 7.3 message).  This is reproducible so long as you
start a fresh session each time you attempt the DELETE.  If you try the
DELETE again in the same session, it will succeed, because the trigger
function is already compiled and so no CommandCounterIncrement occurs at
the critical instant.  (It might be possible to make the behavior stable
by adding some non-SELECT query inside the trigger function to force
a CommandCounterIncrement to occur anyway.  I haven't tried though.)

            regards, tom lane



CREATE TABLE report_objects (
    id serial,
    querystring text,
    sortby text,
    order_by integer,
    include_subagency text,
    query_sql text,
    report_sql text
);

CREATE TABLE activity (
    activityid serial
);

CREATE TABLE qry_column_list (
    col_id serial NOT NULL,
    query integer NOT NULL,
    activity_id integer
);

COPY report_objects (id, querystring, sortby, order_by, include_subagency, query_sql, report_sql) FROM stdin;
1642    \N    \N    \N    \N    \N    \N
\.

COPY activity (activityid) FROM stdin;
16739
\.

COPY qry_column_list (col_id, query, activity_id) FROM stdin;
7298    1642    16739
7299    1642    \N
7300    1642    16739
7301    1642    16739
7302    1642    16739
7303    1642    16739
7304    1642    16739
7305    1642    \N
\.

ALTER TABLE ONLY report_objects
    ADD CONSTRAINT report_objects_pkey PRIMARY KEY (id);

ALTER TABLE ONLY activity
    ADD CONSTRAINT activity_pkey PRIMARY KEY (activityid);

ALTER TABLE ONLY qry_column_list
    ADD CONSTRAINT qry_column_list_pkey PRIMARY KEY (col_id);

ALTER TABLE ONLY qry_column_list
    ADD CONSTRAINT "$2" FOREIGN KEY (activity_id)
    REFERENCES activity(activityid) MATCH FULL
    ON UPDATE CASCADE ON DELETE CASCADE;

CREATE FUNCTION clear_qry_sql() RETURNS "trigger"
    AS '
        begin
              NEW.query_sql := NULL;
              NEW.report_sql := NULL;
           return NEW;
        end;'
    LANGUAGE plpgsql;

CREATE TRIGGER qry_chg_clr_sql
    BEFORE UPDATE ON report_objects
    FOR EACH ROW
    EXECUTE PROCEDURE clear_qry_sql();

CREATE RULE qry_col_del_clr_sql AS
  ON DELETE TO qry_column_list DO
  UPDATE report_objects SET query_sql = NULL::text, report_sql = NULL::text
  WHERE (report_objects.id = old.query);

Re: invalid tid errors in latest 7.3.4 stable.

From
Shridhar Daithankar
Date:
On Friday 26 September 2003 19:50, Tom Lane wrote:
> Anyway, on to Chris' example.  Load the attached script into a database
> that has plpgsql already created, and then do
>     DELETE FROM Activity WHERE ActivityID = 16739;
> You'll get
>     ERROR:  attempted to mark4update invisible tuple
> (or the equivalent 7.3 message).  This is reproducible so long as you
> start a fresh session each time you attempt the DELETE.  If you try the
> DELETE again in the same session, it will succeed, because the trigger
> function is already compiled and so no CommandCounterIncrement occurs at
> the critical instant.  (It might be possible to make the behavior stable
> by adding some non-SELECT query inside the trigger function to force
> a CommandCounterIncrement to occur anyway.  I haven't tried though.)

I didn't understand exactly the explanation but this last paragraph is bit 
interesting.

If the trigger function is precompiled, the error would not be reproducible 
and it will work correctly, right?

Can we precompile such RI triggers on postmaster startup? Could that be a 
workaround?

Just a thought..
Shridhar



Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> If the trigger function is precompiled, the error would not be reproducible 
> and it will work correctly, right?

Only because the trigger in the example doesn't issue any queries of its
own.  If it did, it would cause CommandCounterIncrement(s) anyway.

> Can we precompile such RI triggers on postmaster startup? Could that be a 
> workaround?

I've thought for some time that it's a bad idea that there is an extra
CCI done when compiling a plpgsql function, because it creates
inconsistencies of behavior.  But getting rid of it does not fix the
fundamental issues here at all, it merely means that this particular
drastically-oversimplified example wouldn't happen to fail.

(IIRC, the extra CCI is actually in spi.c, not in plpgsql, so removing
it could potentially break other code; thus I've hesitated to do it.)
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Fri, 26 Sep 2003, Tom Lane wrote:

> >> Okay, I'll work out some extension of the APIs to let us propagate the
> >> snapshot request down through SPI and into the Executor, rather than
> >> using a global variable for it.  (Unless someone has a better idea...)
>
> Just when you thought it was safe to go back in the water ...
>
> Chris Kratz sent me the attached example, which fails in 7.3 and (still)
> fails in CVS HEAD too.

> As far as the "DELETE FROM qry_column_list" goes, I think the solution
> is that fetching rows can't use pure SnapshotNow after all.  What we
> need is to create a fresh QuerySnapshot that shows all transactions
> committed-to-date as committed, and saves the current CommandCounter as
> the criterion for locally created rows.  Unlike SnapshotNow, this would
> mean that transactions committed just after we take the new snapshot
> would not be seen as committed.  This should be okay AFAICS --- once we
> reach the RI triggers, all transactions we need to worry about should be
> committed.  (If not, surely there's a race condition anyway.)  Also note

That should be true. By the time the triggers have started running,
anything that might be conflicting shouldn't be able to commit until after
the transaction the trigger is in (since we already have locks on rows
they'd need to be able to lock).

> that an RI query would *not* see the effects of actions it indirectly
> triggers.  This should be okay, because if they do anything that
> requires RI validation, they should cause additional RI trigger firings
> to be queued for attention later.

I feel vague uneasiness about this, but can't think of a counter example,
it's probably just breakfast acting up.

> But Chris' example raises a host of other questions in my mind.  Should
> we apply this forcibly-updated QuerySnapshot to actions that are
> indirectly triggered by RI queries?  In CVS tip, SnapshotNow rules are
> in fact used for the UPDATE that's generated by the RULE, because it's
> part of the generated plan for the DELETE.  But any queries executed
> inside triggers fired as a result of all this would use the pre-existing
> QuerySnapshot, and hence could see a worldview completely inconsistent
> with the rows they are being fired for :-(.  It's worse in 7.3, because
> the first trigger exit would revert ReferentialIntegritySnapshotOverride
> to false, meaning you wouldn't even be using the same snapshot rules
> throughout the UPDATE/DELETE :-( :-(
>
> I am inclined to think now that the right solution is for the RI
> triggers to update the global QuerySnapshot to current time when they
> start, and then revert it to what it had been before exiting.  (And that
> code had better be in the RI triggers themselves, *not* in the generic
> trigger-calling code.)  This would ensure that actions taken indirectly
> as a result of RI behavior would see a consistent worldview.
>
> The main argument I can see against this is that it would be a really
> big wart on the behavior of SERIALIZABLE transactions.  Instead of
> saying "in a SERIALIZABLE transaction, you only see the effects of
> transactions committed before your transaction started", we'd have to
> add a footnote "except in actions taken as a result of RI-generated
> queries", which sure complicates matters from a logical point of view.
> (In READ COMMITTED mode, on the other hand, it's no big deal; we are
> effectively just decreeing that a new command starts before the RI
> triggers run.)

Given the two suggestions above, I think I'd vote for the latter. It seems
to break the idea of serializable, but it seems like it'd be easier for
people to work with than the former where you might not be able to even
see the row you're working upon in its own triggers.

I think theoretically in serializable the cases where the difference
between the snapshot from this statement and the standard snapshot for the
transaction are noticable we probably have a serialization failure since
I think that case comes in when a transaction that started after us (and
has already committed) has done something that the constraint's search
condition would need to see we'd potentially be opening up the possibility
for phantoms. If we have to get the same rows as a previous set for
the same search condition, but the row in question was already committed
by that transaction that started after this one, we no longer have the
freedom to pretend that the transactions were in the other order.


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I think theoretically in serializable the cases where the difference
> between the snapshot from this statement and the standard snapshot for the
> transaction are noticable we probably have a serialization failure

Hmm, that is a good point.  It would be cleaner to throw a "can't
serialize" failure than have the RI triggers run under a different
snapshot.  I am not sure if we can implement that behavior easily,
though.  Can you think of a way to detect whether there's an RI conflict
against a later-started transaction?
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Fri, 26 Sep 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I think theoretically in serializable the cases where the difference
> > between the snapshot from this statement and the standard snapshot for the
> > transaction are noticable we probably have a serialization failure
>
> Hmm, that is a good point.  It would be cleaner to throw a "can't
> serialize" failure than have the RI triggers run under a different
> snapshot.  I am not sure if we can implement that behavior easily,
> though.  Can you think of a way to detect whether there's an RI conflict
> against a later-started transaction?

Not a complete one yet. :(
I think the case of a row that matches the constraint's search condition
on either check or action but which is committed and invisible to our
snapshot (which for read committed is taken at some point after the
original row modification that this was triggered by) is an error may
cover the basic cases, but I don't feel confident that I'm not
missing some trigger/rule case.




Re: invalid tid errors in latest 7.3.4 stable.

From
Kevin Brown
Date:
Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I think theoretically in serializable the cases where the difference
> > between the snapshot from this statement and the standard snapshot for the
> > transaction are noticable we probably have a serialization failure
> 
> Hmm, that is a good point.  It would be cleaner to throw a "can't
> serialize" failure than have the RI triggers run under a different
> snapshot.  I am not sure if we can implement that behavior easily,
> though.  Can you think of a way to detect whether there's an RI conflict
> against a later-started transaction?

Just some thoughts on this that, of course, could be wrong.  So please
don't be too hard on me if I'm full of it.  :-)

By "a later-started transaction" I assume you mean a later-started
transaction that commits before yours does?

I don't see how RI is any different than dealing with straight SQL
in this regard.  The effect of RI is to read/write/delete rows from a
related table that you otherwise wouldn't read or modify, and that means
that the RI mechanism needs to be treated in exactly the same way that
the equivalent SELECT/UPDATE/DELETE would be.

So the question I have is: what would PG do in the case that you SELECT
the same row(s) that the RI triggers are reading implicitly?  For
instance, suppose we have two tables:
CREATE TABLE corps (id integer PRIMARY KEY, name varchar(32));CREATE TABLE widgets (id integer PRIMARY KEY, name
varchar(32),   corpid integer REFERENCES corps(id) ON DELETE CASCADE);
 

When, within a transaction, I do:
INSERT INTO widgets (id, name, corpid) VALUES (1, 'cpu', 3);

the RI mechanism will automatically check to make sure that the value
3 is in the id column of the corps table.  Put another way, it will do
an implicit "SELECT id FROM corps WHERE id = 3", right?  So suppose
that for the purposes of testing the serialization code I remove the RI
triggers and then actually do the following:
SELECT id FROM corps WHERE id = 3;INSERT INTO widgets (id, name, corpid) VALUES (1, 'cpu', 3);

If my transaction is serializable then clearly, when another transaction
does
UPDATE corps SET id = 4 WHERE id = 3;

and commits before my transaction commits, either the updating
transaction is in violation of serializability rules or the inserting
transaction is.  Serialization is maintained if either of those
transactions aborts with a serialization error.

But note that whether or not RI is involved should be entirely
irrelevant.  What matters is what rows each transacion sees and
modifies.  How the row gets looked at doesn't matter; the only thing
that matters is that the row *does* get looked at.

The important thing here is that the effect of the RI mechanism MUST be
the same as if the equivalent manual SQL statements were exected within
the same transaction.  If it's not, then the RI mechanism is broken and
needs to be fixed at that level.

But if PG exhibits exactly the same bug this thread refers to regardless
of whether a row is examined/modified via directly issued SQL or via
the RI mechanism then the problem lies not within the RI code at all,
but within the serialization code.


I just hope I'm not merely stating the obvious here...


-- 
Kevin Brown                          kevin@sysexperts.com


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Fri, 26 Sep 2003, Kevin Brown wrote:

> Tom Lane wrote:
> > Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > > I think theoretically in serializable the cases where the difference
> > > between the snapshot from this statement and the standard snapshot for the
> > > transaction are noticable we probably have a serialization failure
> >
> > Hmm, that is a good point.  It would be cleaner to throw a "can't
> > serialize" failure than have the RI triggers run under a different
> > snapshot.  I am not sure if we can implement that behavior easily,
> > though.  Can you think of a way to detect whether there's an RI conflict
> > against a later-started transaction?
>
> Just some thoughts on this that, of course, could be wrong.  So please
> don't be too hard on me if I'm full of it.  :-)
>
> By "a later-started transaction" I assume you mean a later-started
> transaction that commits before yours does?
...
> But if PG exhibits exactly the same bug this thread refers to regardless
> of whether a row is examined/modified via directly issued SQL or via
> the RI mechanism then the problem lies not within the RI code at all,
> but within the serialization code.

It's actual a different problem from the original one at this point.  If
one just switches to using whatever snapshot is in place for the
transaction, you run into the problem that our serializable isolation mode
isn't entirely serializable and therefore isn't sufficient to guarantee
that the constraint is satisfied.

The case at hand (with *'s on the ri queries) assuming pk already
has an id=1 row would be.
T1: begin;
T1: set transaction isolation level serializable;
T1 ... (something that does a select, not necessarily on either pk or fk)
T2: begin;
T2: insert into fk values (1);
T2*:select * from pk where id=1 for update;
T2: commit;
T1: delete from pk where id=1;
T1*:select * from fk where id=1 for update;
T1: commit;

If you want to treat the serial execution as T1 followed by T2.  Then
T2* would have to show no rows for pk and T2 rolls back.

If you want to treat the order as T2,T1, then T1* would have to see the
row that T2 inserted and T1 rolls back.

Right now, you won't get that, you'll get T2* showing 1 row and T1*
showing 0 rows.


Re: invalid tid errors in latest 7.3.4 stable.

From
Kevin Brown
Date:
Stephan Szabo wrote:
> The case at hand (with *'s on the ri queries) assuming pk already
> has an id=1 row would be.
> T1: begin;
> T1: set transaction isolation level serializable;
> T1 ... (something that does a select, not necessarily on either pk or fk)
> T2: begin;
> T2: insert into fk values (1);
> T2*:select * from pk where id=1 for update;
> T2: commit;
> T1: delete from pk where id=1;
> T1*:select * from fk where id=1 for update;
> T1: commit;
> 
> If you want to treat the serial execution as T1 followed by T2.  Then
> T2* would have to show no rows for pk and T2 rolls back.
> 
> If you want to treat the order as T2,T1, then T1* would have to see the
> row that T2 inserted and T1 rolls back.
> 
> Right now, you won't get that, you'll get T2* showing 1 row and T1*
> showing 0 rows.

Does it also behave this way *without* any actual foreign key
constraints in place?  In other words, if you perform the RI queries
explicitly?

If so, then the problem is with the serialization code.  Sounds like
that's pretty much what you're saying.


The problem in the scenario you described should be solved if we mark any
rows that are selected with the "for update" option (either implicitly,
as with RI triggers, or explicitly) as having been modified by the
selecting transaction, the equivalent of (in the case of T2*) "update pk
set id=id where id=1" but without firing any of the ON MODIFY triggers.
A rollback would, of course, not have any effect on the data in those
rows since there weren't any real changes.  This "fix" won't work,
of course, if the serialization code is so broken that it doesn't work
properly even in the face of updates (something I'd find hard to believe).


-- 
Kevin Brown                          kevin@sysexperts.com


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Fri, 26 Sep 2003, Kevin Brown wrote:

> Stephan Szabo wrote:
> > The case at hand (with *'s on the ri queries) assuming pk already
> > has an id=1 row would be.
> > T1: begin;
> > T1: set transaction isolation level serializable;
> > T1 ... (something that does a select, not necessarily on either pk or fk)
> > T2: begin;
> > T2: insert into fk values (1);
> > T2*:select * from pk where id=1 for update;
> > T2: commit;
> > T1: delete from pk where id=1;
> > T1*:select * from fk where id=1 for update;
> > T1: commit;
> >
> > If you want to treat the serial execution as T1 followed by T2.  Then
> > T2* would have to show no rows for pk and T2 rolls back.
> >
> > If you want to treat the order as T2,T1, then T1* would have to see the
> > row that T2 inserted and T1 rolls back.
> >
> > Right now, you won't get that, you'll get T2* showing 1 row and T1*
> > showing 0 rows.
>
> Does it also behave this way *without* any actual foreign key
> constraints in place?  In other words, if you perform the RI queries
> explicitly?

Yeah, that wasn't clear from what I'd wrote, but that was from two psql
sessions with non-ri constraint tables (excepting that I used a real
select in place of the holder).

> The problem in the scenario you described should be solved if we mark any
> rows that are selected with the "for update" option (either implicitly,
> as with RI triggers, or explicitly) as having been modified by the
> selecting transaction, the equivalent of (in the case of T2*) "update pk
> set id=id where id=1" but without firing any of the ON MODIFY triggers.
> A rollback would, of course, not have any effect on the data in those
> rows since there weren't any real changes.  This "fix" won't work,
> of course, if the serialization code is so broken that it doesn't work
> properly even in the face of updates (something I'd find hard to believe).

That fixes the case above which will fix the ri constraints for right now
(although they really have to stop using for update eventually), but
doesn't really solve the serialization problem since it still exists
AFAICS without for update. Without the for update, you still have T2*
getting 1 row and T1* getting 0 which can't happen for either ordering of
the transactions.  It gets worse if that select as a holder at the
beginning of T1 was say select * from fk where id=1 because SQL tells us
that the later select can't see a different set of rows from the earlier
one, so T2 shouldn't be allowed to commit before T1.


Re: invalid tid errors in latest 7.3.4 stable.

From
Kevin Brown
Date:
Stephan Szabo wrote:
> > The problem in the scenario you described should be solved if we mark any
> > rows that are selected with the "for update" option (either implicitly,
> > as with RI triggers, or explicitly) as having been modified by the
> > selecting transaction, the equivalent of (in the case of T2*) "update pk
> > set id=id where id=1" but without firing any of the ON MODIFY triggers.
> > A rollback would, of course, not have any effect on the data in those
> > rows since there weren't any real changes.  This "fix" won't work,
> > of course, if the serialization code is so broken that it doesn't work
> > properly even in the face of updates (something I'd find hard to believe).
> 
> That fixes the case above which will fix the ri constraints for right now
> (although they really have to stop using for update eventually), but
> doesn't really solve the serialization problem since it still exists
> AFAICS without for update. Without the for update, you still have T2*
> getting 1 row and T1* getting 0 which can't happen for either ordering of
> the transactions.  It gets worse if that select as a holder at the
> beginning of T1 was say select * from fk where id=1 because SQL tells us
> that the later select can't see a different set of rows from the earlier
> one, so T2 shouldn't be allowed to commit before T1.

That's what I was afraid of, and what I figured serialization really
meant: what you see is a snapshot of the database as it was at
transaction start time.

I can't think of any good way to implement proper serialization without
destroying a serialized transaction's read performance, because it
seems to me that the only way to properly implement serialization is to
somehow record on-disk all the rows a serializable transaction visits,
which means that a serializable transaction is going to be *much* slower
than a read-committed transaction.  You have to mark such rows because
other transactions (even read-committed transactions) have to abort
if they attempt to modify such a row, and the list of such rows can
grow far too large to record it in shared memory.  Worse, you have to
maintain a dynamic list of serializable transactions that have seen the
row and remove a transaction from the list once it commits or rolls back,
because the only time a transaction needs to care about this when changing
a row is when there's a currently-running transaction that's seen it.

We could use the MVCC mechanism to implement it: duplicate the row being
examined and assign the reader's transaction ID to the duplicate just
as if it had modified the row.  But you also have to somehow flag the
duplicate as being there as a result of a serializable read, so that
other serializable transactions that try to modify the row after the one
in question has committed won't themselves throw a serialization error
(because without the flag they'd think they were attempting to read a
row that had been modified by someone else during their lifetime).


The other situation you have to deal with is when you have two
transactions, 1 and 2, that start and commit in that order but which
have overlapping times of execution.  If transaction 1 modifies a row
after transaction 2 starts, then commits before transaction 2 reads it,
transaction 2 has to be able to detect that and throw a serialization
error.

The way around that problem is to assign a commit ID to each transaction
at commit time.  The commit ID is just the transaction ID that will be
assigned to the next transaction that runs.  It might make sense for
assignment of commit IDs to increment the transaction ID counter the
way assignment of a transaction ID does.  Anyway, if a serializable
transaction reads a row that has a commit ID greater than the reader's
transaction ID, it throws a serialization error.  It's probably sufficient
to store the commit ID along with the transaction ID of the committer
in the transaction log as well as in shared memory, so that the commit
ID can be quickly looked up from the transaction ID.



Maybe there's a better way around all this, but I certainly can't think
of one at the moment.  :-(



-- 
Kevin Brown                          kevin@sysexperts.com


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 26 Sep 2003, Tom Lane wrote:
>> Hmm, that is a good point.  It would be cleaner to throw a "can't
>> serialize" failure than have the RI triggers run under a different
>> snapshot.  I am not sure if we can implement that behavior easily,
>> though.  Can you think of a way to detect whether there's an RI conflict
>> against a later-started transaction?

> Not a complete one yet. :(

After working through the cases it doesn't seem like this would be that
hard to do.  Here's my proposal:

1. When running in a READ COMMITTED transaction, the RI triggers simply
need to do SetQuerySnapshot before acting.  This will update the snap so
that they can see all rows they need to see; they don't need to fool
with using SnapshotNow.  Any triggers or rules fired indirectly as a
result of RI actions can use this same snapshot.  No inconsistencies.

2. When running in a SERIALIZABLE transaction, our objective is to use
the transaction's serializable snapshot for triggers and rules fired by
RI actions.  Therefore the RI actions should throw "can't serialize"
errors if they would need to affect rows that are not visible under this
snapshot.  To see what's involved, break down the problem into cases:

* FK->PK actions (that is, checking a matching PK row exists; we're
trying to do a SELECT FOR UPDATE on the PK table).  Four cases:   1. PK row existed before our xact, still exists: no
problem  2. PK row existed before our xact, updated: no problem really,      but would be okay to throw serialization
error  3. PK row existed before our xact, deleted: must throw error,      but does it matter if we say "serialization
error"     or "key not found"?   4. no PK row existed before our xact: really should throw a      "key not found" error
evenif a row has since been      inserted.
 

7.3 and CVS tip in fact throw can't-serialize errors in both cases 2 and
3.  They get case 4 wrong: they will allow an FK to be inserted even
though the PK row is too young to be visible to our transaction.

I claim that all we need to do here is run the SELECT FOR UPDATE under
the transaction's serializable snapshot.  That will give the same
behavior as the existing code for cases 2 and 3, and will give what
I claim is the right behavior (throw error) for case 4.

* PK->FK actions (cascading from a PK update or delete):   1. FK row existed before our xact, still exists: no problem,
    just perform the action   2. FK row existed before our xact, updated: must throw serialization      error since we
cannotupdate row in SERIALIZABLE mode   3. FK row existed before our xact, deleted: ideally should throw
serializationerror, but would probably be okay to ignore row too   4. FK row inserted during our xact: must throw
error,but could      call it either serialization or key-exists.
 

7.3 throws can't-serialize errors in cases 2 and 3, and a key-exists
error in case 4.

We cannot implement case 4 correctly if we do the scan under our
serialization snapshot (we'll fail to see the row at all).  Seems what
we must do is scan using a current snapshot (from a SetQuerySnapshot
taken when the RI trigger is entered), and then throw serialization
error if we find any rows to update or delete that would not be visible
under the transaction's serializable snapshot.  If the rows are visible,
it's okay to operate on them with the serializable snapshot as
QuerySnapshot for rules or triggers.  Note that this implementation
means that case 3 will not throw errors, because such rows will be
ignored by the scan.  I think this is an okay tradeoff for getting the
other cases right.

The Executor API extension that we need to implement this behavior
is to pass in a second "crosscheck" snapshot against which tuples to be
updated/deleted are checked; if they don't pass the crosscheck then
throw serialization error.  This only requires minimal mods in
execMain.c.  The snapshot-creation code in tqual.c will also need an
extra entry point so that the RI triggers can create a fresh snapshot
even in SERIALIZABLE mode (SetQuerySnapshot won't do this).

Comments?
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Tue, 30 Sep 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Fri, 26 Sep 2003, Tom Lane wrote:
> >> Hmm, that is a good point.  It would be cleaner to throw a "can't
> >> serialize" failure than have the RI triggers run under a different
> >> snapshot.  I am not sure if we can implement that behavior easily,
> >> though.  Can you think of a way to detect whether there's an RI conflict
> >> against a later-started transaction?
>
> > Not a complete one yet. :(
>
> After working through the cases it doesn't seem like this would be that
> hard to do.  Here's my proposal:
>
> 1. When running in a READ COMMITTED transaction, the RI triggers simply
> need to do SetQuerySnapshot before acting.  This will update the snap so
> that they can see all rows they need to see; they don't need to fool
> with using SnapshotNow.  Any triggers or rules fired indirectly as a
> result of RI actions can use this same snapshot.  No inconsistencies.

This seems right.  The locks should keep anyone else's hands off if there
were any possibility of conflicts.

> 2. When running in a SERIALIZABLE transaction, our objective is to use
> the transaction's serializable snapshot for triggers and rules fired by
> RI actions.  Therefore the RI actions should throw "can't serialize"
> errors if they would need to affect rows that are not visible under this
> snapshot.  To see what's involved, break down the problem into cases:
>
> * FK->PK actions (that is, checking a matching PK row exists; we're
> trying to do a SELECT FOR UPDATE on the PK table).  Four cases:
>     1. PK row existed before our xact, still exists: no problem
>     2. PK row existed before our xact, updated: no problem really,
>        but would be okay to throw serialization error
>     3. PK row existed before our xact, deleted: must throw error,
>        but does it matter if we say "serialization error"
>        or "key not found"?
>     4. no PK row existed before our xact: really should throw a
>        "key not found" error even if a row has since been
>        inserted.
>
> 7.3 and CVS tip in fact throw can't-serialize errors in both cases 2 and
> 3.  They get case 4 wrong: they will allow an FK to be inserted even
> though the PK row is too young to be visible to our transaction.
>
> I claim that all we need to do here is run the SELECT FOR UPDATE under
> the transaction's serializable snapshot.  That will give the same
> behavior as the existing code for cases 2 and 3, and will give what
> I claim is the right behavior (throw error) for case 4.

An error for 2-4 seems reasonable here. I wonder how easily the behavior
for 2 and 3 will carry over when we switch away from FOR UPDATE on the
constraint checks, but we can worry about that then.

> * PK->FK actions (cascading from a PK update or delete):
>     1. FK row existed before our xact, still exists: no problem,
>        just perform the action
>     2. FK row existed before our xact, updated: must throw serialization
>        error since we cannot update row in SERIALIZABLE mode
>     3. FK row existed before our xact, deleted: ideally should throw
>        serialization error, but would probably be okay to ignore row too
>     4. FK row inserted during our xact: must throw error, but could
>        call it either serialization or key-exists.
>
> 7.3 throws can't-serialize errors in cases 2 and 3, and a key-exists
> error in case 4.
>
> We cannot implement case 4 correctly if we do the scan under our
> serialization snapshot (we'll fail to see the row at all).  Seems what
> we must do is scan using a current snapshot (from a SetQuerySnapshot
> taken when the RI trigger is entered), and then throw serialization
> error if we find any rows to update or delete that would not be visible
> under the transaction's serializable snapshot.  If the rows are visible,
> it's okay to operate on them with the serializable snapshot as
> QuerySnapshot for rules or triggers.  Note that this implementation
> means that case 3 will not throw errors, because such rows will be
> ignored by the scan.  I think this is an okay tradeoff for getting the
> other cases right.

It's probably better than the current situation anyway. I think that it'd
be revisitable if someone wanted to bring it up later.

> The Executor API extension that we need to implement this behavior
> is to pass in a second "crosscheck" snapshot against which tuples to be
> updated/deleted are checked; if they don't pass the crosscheck then
> throw serialization error.  This only requires minimal mods in
> execMain.c.  The snapshot-creation code in tqual.c will also need an
> extra entry point so that the RI triggers can create a fresh snapshot
> even in SERIALIZABLE mode (SetQuerySnapshot won't do this).

I'd have figured that this would be a bigger deal to implement cleanly,
but if it isn't, then it sounds good.  I'm a little worried because
there's been talk of beta and this going in now for fear of breaking
something worse than it already is, but if you think that it's safe
enough.


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Tue, 30 Sep 2003, Tom Lane wrote:
>> I think I can implement it and it will act as stated in my proposal.
>> Whether people like the proposed behavior is the big question in my
>> mind.

> I think it's more reasonable than the current behavior or any of
> the others we've hit along the way, and we have to pretty much choose
> now if we want to change it for 7.4.

I've committed the attached patch.  One thing I wanted to double-check
with you is that the SELECT FOR UPDATES done in the noaction cases are
being correctly handled.  I think it is correct to do them with the
current snapshot rather than the start-of-transaction snap; do you
agree?  Also, I did not propagate the crosscheck support into
heap_mark4update, meaning that these SELECT FOR UPDATEs won't complain
if they find a row that was inserted later than the start of the
serializable transaction.  I'm not totally sure if they should or not;
what do you think?
        regards, tom lane

*** src/backend/access/heap/heapam.c.orig    Thu Sep 25 10:22:54 2003
--- src/backend/access/heap/heapam.c    Wed Oct  1 16:02:27 2003
***************
*** 1207,1220 ****  * NB: do not call this directly unless you are prepared to deal with  * concurrent-update
conditions. Use simple_heap_delete instead.  *  * Normal, successful return value is HeapTupleMayBeUpdated, which  *
actuallymeans we did delete it.  Failure return codes are  * HeapTupleSelfUpdated, HeapTupleUpdated, or
HeapTupleBeingUpdated
!  * (the last only possible if wait == false).  */ int heap_delete(Relation relation, ItemPointer tid,
!             ItemPointer ctid, CommandId cid, bool wait) {     ItemId        lp;     HeapTupleData tp;
--- 1207,1229 ----  * NB: do not call this directly unless you are prepared to deal with  * concurrent-update
conditions. Use simple_heap_delete instead.  *
 
+  *    relation - table to be modified
+  *    tid - TID of tuple to be deleted
+  *    ctid - output parameter, used only for failure case (see below)
+  *    cid - delete command ID to use in verifying tuple visibility
+  *    crosscheck - if not SnapshotAny, also check tuple against this
+  *    wait - true if should wait for any conflicting update to commit/abort
+  *  * Normal, successful return value is HeapTupleMayBeUpdated, which  * actually means we did delete it.  Failure
returncodes are  * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
 
!  * (the last only possible if wait == false).  On a failure return,
!  * *ctid is set to the ctid link of the target tuple (possibly a later
!  * version of the row).  */ int heap_delete(Relation relation, ItemPointer tid,
!             ItemPointer ctid, CommandId cid, Snapshot crosscheck, bool wait) {     ItemId        lp;
HeapTupleDatatp;
 
***************
*** 1240,1246 ****     tp.t_tableOid = relation->rd_id;  l1:
!     result = HeapTupleSatisfiesUpdate(&tp, cid);      if (result == HeapTupleInvisible)     {
--- 1249,1255 ----     tp.t_tableOid = relation->rd_id;  l1:
!     result = HeapTupleSatisfiesUpdate(tp.t_data, cid);      if (result == HeapTupleInvisible)     {
***************
*** 1278,1283 ****
--- 1287,1300 ----         else             result = HeapTupleUpdated;     }
+ 
+     if (crosscheck != SnapshotAny && result == HeapTupleMayBeUpdated)
+     {
+         /* Perform additional check for serializable RI updates */
+         if (!HeapTupleSatisfiesSnapshot(tp.t_data, crosscheck))
+             result = HeapTupleUpdated;
+     }
+      if (result != HeapTupleMayBeUpdated)     {         Assert(result == HeapTupleSelfUpdated ||
***************
*** 1378,1384 ****      result = heap_delete(relation, tid,                          &ctid,
!                          GetCurrentCommandId(),                          true /* wait for commit */);     switch
(result)    {
 
--- 1395,1401 ----      result = heap_delete(relation, tid,                          &ctid,
!                          GetCurrentCommandId(), SnapshotAny,                          true /* wait for commit */);
switch (result)     {
 
***************
*** 1407,1420 ****  * NB: do not call this directly unless you are prepared to deal with  * concurrent-update
conditions. Use simple_heap_update instead.  *  * Normal, successful return value is HeapTupleMayBeUpdated, which  *
actuallymeans we *did* update it.  Failure return codes are  * HeapTupleSelfUpdated, HeapTupleUpdated, or
HeapTupleBeingUpdated
!  * (the last only possible if wait == false).  */ int heap_update(Relation relation, ItemPointer otid, HeapTuple
newtup,
!             ItemPointer ctid, CommandId cid, bool wait) {     ItemId        lp;     HeapTupleData oldtup;
--- 1424,1449 ----  * NB: do not call this directly unless you are prepared to deal with  * concurrent-update
conditions. Use simple_heap_update instead.  *
 
+  *    relation - table to be modified
+  *    otid - TID of old tuple to be replaced
+  *    newtup - newly constructed tuple data to store
+  *    ctid - output parameter, used only for failure case (see below)
+  *    cid - update command ID to use in verifying old tuple visibility
+  *    crosscheck - if not SnapshotAny, also check old tuple against this
+  *    wait - true if should wait for any conflicting update to commit/abort
+  *  * Normal, successful return value is HeapTupleMayBeUpdated, which  * actually means we *did* update it.  Failure
returncodes are  * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
 
!  * (the last only possible if wait == false).  On a failure return,
!  * *ctid is set to the ctid link of the old tuple (possibly a later
!  * version of the row).
!  * On success, newtup->t_self is set to the TID where the new tuple
!  * was inserted.  */ int heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
!             ItemPointer ctid, CommandId cid, Snapshot crosscheck, bool wait) {     ItemId        lp;
HeapTupleDataoldtup;
 
***************
*** 1450,1456 ****      */  l2:
!     result = HeapTupleSatisfiesUpdate(&oldtup, cid);      if (result == HeapTupleInvisible)     {
--- 1479,1485 ----      */  l2:
!     result = HeapTupleSatisfiesUpdate(oldtup.t_data, cid);      if (result == HeapTupleInvisible)     {
***************
*** 1488,1493 ****
--- 1517,1530 ----         else             result = HeapTupleUpdated;     }
+ 
+     if (crosscheck != SnapshotAny && result == HeapTupleMayBeUpdated)
+     {
+         /* Perform additional check for serializable RI updates */
+         if (!HeapTupleSatisfiesSnapshot(oldtup.t_data, crosscheck))
+             result = HeapTupleUpdated;
+     }
+      if (result != HeapTupleMayBeUpdated)     {         Assert(result == HeapTupleSelfUpdated ||
***************
*** 1718,1724 ****      result = heap_update(relation, otid, tup,                          &ctid,
!                          GetCurrentCommandId(),                          true /* wait for commit */);     switch
(result)    {
 
--- 1755,1761 ----      result = heap_update(relation, otid, tup,                          &ctid,
!                          GetCurrentCommandId(), SnapshotAny,                          true /* wait for commit */);
switch (result)     {
 
***************
*** 1767,1773 ****     tuple->t_len = ItemIdGetLength(lp);  l3:
!     result = HeapTupleSatisfiesUpdate(tuple, cid);      if (result == HeapTupleInvisible)     {
--- 1804,1810 ----     tuple->t_len = ItemIdGetLength(lp);  l3:
!     result = HeapTupleSatisfiesUpdate(tuple->t_data, cid);      if (result == HeapTupleInvisible)     {
*** src/backend/commands/async.c.orig    Mon Sep 15 19:33:39 2003
--- src/backend/commands/async.c    Wed Oct  1 15:36:02 2003
***************
*** 537,543 ****                  */                 result = heap_update(lRel, &lTuple->t_self, rTuple,
                     &ctid,
 
!                                      GetCurrentCommandId(),                                      false /* no wait for
commit*/);                 switch (result)                 {
 
--- 537,543 ----                  */                 result = heap_update(lRel, &lTuple->t_self, rTuple,
                     &ctid,
 
!                                      GetCurrentCommandId(), SnapshotAny,                                      false
/*no wait for commit */);                 switch (result)                 {
 
*** src/backend/executor/execMain.c.orig    Thu Sep 25 14:58:35 2003
--- src/backend/executor/execMain.c    Wed Oct  1 17:22:30 2003
***************
*** 104,111 ****  * field of the QueryDesc is filled in to describe the tuples that will be  * returned, and the
internalfields (estate and planstate) are set up.  *
 
!  * If useSnapshotNow is true, run the query with SnapshotNow time qual rules
!  * instead of the normal use of QuerySnapshot.  *  * If explainOnly is true, we are not actually intending to run the
plan, * only to set up for EXPLAIN; so skip unwanted side-effects.
 
--- 104,117 ----  * field of the QueryDesc is filled in to describe the tuples that will be  * returned, and the
internalfields (estate and planstate) are set up.  *
 
!  * If useCurrentSnapshot is true, run the query with the latest available
!  * snapshot, instead of the normal QuerySnapshot.  Also, if it's an update
!  * or delete query, check that the rows to be updated or deleted would be
!  * visible to the normal QuerySnapshot.  (This is a special-case behavior
!  * needed for referential integrity updates in serializable transactions.
!  * We must check all currently-committed rows, but we want to throw a
!  * can't-serialize error if any rows that would need updates would not be
!  * visible under the normal serializable snapshot.)  *  * If explainOnly is true, we are not actually intending to
runthe plan,  * only to set up for EXPLAIN; so skip unwanted side-effects.
 
***************
*** 115,121 ****  * ----------------------------------------------------------------  */ void
! ExecutorStart(QueryDesc *queryDesc, bool useSnapshotNow, bool explainOnly) {     EState       *estate;
MemoryContextoldcontext;
 
--- 121,127 ----  * ----------------------------------------------------------------  */ void
! ExecutorStart(QueryDesc *queryDesc, bool useCurrentSnapshot, bool explainOnly) {     EState       *estate;
MemoryContextoldcontext;
 
***************
*** 157,171 ****      * the life of this query, even if it outlives the current command and      * current snapshot.
 */
 
!     if (useSnapshotNow)     {
!         estate->es_snapshot = SnapshotNow;
!         estate->es_snapshot_cid = GetCurrentCommandId();     }     else     {         estate->es_snapshot =
CopyQuerySnapshot();
!         estate->es_snapshot_cid = estate->es_snapshot->curcid;     }      /*
--- 163,180 ----      * the life of this query, even if it outlives the current command and      * current snapshot.
 */
 
!     if (useCurrentSnapshot)     {
!         /* RI update/delete query --- must use an up-to-date snapshot */
!         estate->es_snapshot = CopyCurrentSnapshot();
!         /* crosscheck updates/deletes against transaction snapshot */
!         estate->es_crosscheck_snapshot = CopyQuerySnapshot();     }     else     {
+         /* normal query --- use query snapshot, no crosscheck */         estate->es_snapshot = CopyQuerySnapshot();
!         estate->es_crosscheck_snapshot = SnapshotAny;     }      /*
***************
*** 1118,1124 ****                      tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
test= heap_mark4update(erm->relation, &tuple, &buffer,
 
!                                             estate->es_snapshot_cid);                     ReleaseBuffer(buffer);
              switch (test)                     {
 
--- 1127,1133 ----                      tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
test= heap_mark4update(erm->relation, &tuple, &buffer,
 
!                                             estate->es_snapshot->curcid);                     ReleaseBuffer(buffer);
                  switch (test)                     {
 
***************
*** 1278,1284 ****     if (estate->es_into_relation_descriptor != NULL)     {
heap_insert(estate->es_into_relation_descriptor,tuple,
 
!                     estate->es_snapshot_cid);         IncrAppended();     } 
--- 1287,1293 ----     if (estate->es_into_relation_descriptor != NULL)     {
heap_insert(estate->es_into_relation_descriptor,tuple,
 
!                     estate->es_snapshot->curcid);         IncrAppended();     } 
***************
*** 1354,1360 ****      * insert the tuple      */     newId = heap_insert(resultRelationDesc, tuple,
!                         estate->es_snapshot_cid);      IncrAppended();     (estate->es_processed)++;
--- 1363,1369 ----      * insert the tuple      */     newId = heap_insert(resultRelationDesc, tuple,
!                         estate->es_snapshot->curcid);      IncrAppended();     (estate->es_processed)++;
***************
*** 1406,1412 ****         bool        dodelete;          dodelete = ExecBRDeleteTriggers(estate, resultRelInfo,
tupleid,
!                                         estate->es_snapshot_cid);          if (!dodelete)            /* "do nothing"
*/            return;
 
--- 1415,1421 ----         bool        dodelete;          dodelete = ExecBRDeleteTriggers(estate, resultRelInfo,
tupleid,
!                                         estate->es_snapshot->curcid);          if (!dodelete)            /* "do
nothing"*/             return;
 
***************
*** 1418,1424 **** ldelete:;     result = heap_delete(resultRelationDesc, tupleid,                          &ctid,
!                          estate->es_snapshot_cid,                          true /* wait for commit */);     switch
(result)    {
 
--- 1427,1434 ---- ldelete:;     result = heap_delete(resultRelationDesc, tupleid,                          &ctid,
!                          estate->es_snapshot->curcid,
!                          estate->es_crosscheck_snapshot,                          true /* wait for commit */);
switch(result)     {
 
***************
*** 1517,1523 ****          newtuple = ExecBRUpdateTriggers(estate, resultRelInfo,
  tupleid, tuple,
 
!                                         estate->es_snapshot_cid);          if (newtuple == NULL)    /* "do nothing"
*/            return;
 
--- 1527,1533 ----          newtuple = ExecBRUpdateTriggers(estate, resultRelInfo,
  tupleid, tuple,
 
!                                         estate->es_snapshot->curcid);          if (newtuple == NULL)    /* "do
nothing"*/             return;
 
***************
*** 1553,1559 ****      */     result = heap_update(resultRelationDesc, tupleid, tuple,
&ctid,
!                          estate->es_snapshot_cid,                          true /* wait for commit */);     switch
(result)    {
 
--- 1563,1570 ----      */     result = heap_update(resultRelationDesc, tupleid, tuple,
&ctid,
!                          estate->es_snapshot->curcid,
!                          estate->es_crosscheck_snapshot,                          true /* wait for commit */);
switch(result)     {
 
***************
*** 2039,2045 ****      */     epqstate->es_direction = ForwardScanDirection;     epqstate->es_snapshot =
estate->es_snapshot;
!     epqstate->es_snapshot_cid = estate->es_snapshot_cid;     epqstate->es_range_table = estate->es_range_table;
epqstate->es_result_relations= estate->es_result_relations;     epqstate->es_num_result_relations =
estate->es_num_result_relations;
--- 2050,2056 ----      */     epqstate->es_direction = ForwardScanDirection;     epqstate->es_snapshot =
estate->es_snapshot;
!     epqstate->es_crosscheck_snapshot = estate->es_crosscheck_snapshot;     epqstate->es_range_table =
estate->es_range_table;    epqstate->es_result_relations = estate->es_result_relations;
epqstate->es_num_result_relations= estate->es_num_result_relations;
 
*** src/backend/executor/execUtils.c.orig    Thu Sep 25 14:58:35 2003
--- src/backend/executor/execUtils.c    Wed Oct  1 15:35:55 2003
***************
*** 178,184 ****      */     estate->es_direction = ForwardScanDirection;     estate->es_snapshot = SnapshotNow;
!     estate->es_snapshot_cid = FirstCommandId;     estate->es_range_table = NIL;      estate->es_result_relations =
NULL;
--- 178,184 ----      */     estate->es_direction = ForwardScanDirection;     estate->es_snapshot = SnapshotNow;
!     estate->es_crosscheck_snapshot = SnapshotAny; /* means no crosscheck */     estate->es_range_table = NIL;
estate->es_result_relations= NULL;
 
*** src/backend/executor/nodeSubplan.c.orig    Thu Sep 25 14:58:35 2003
--- src/backend/executor/nodeSubplan.c    Wed Oct  1 17:22:30 2003
***************
*** 709,715 ****     sp_estate->es_tupleTable =         ExecCreateTupleTable(ExecCountSlotsNode(subplan->plan) + 10);
 sp_estate->es_snapshot = estate->es_snapshot;
 
!     sp_estate->es_snapshot_cid = estate->es_snapshot_cid;     sp_estate->es_instrument = estate->es_instrument;
/*
--- 709,715 ----     sp_estate->es_tupleTable =         ExecCreateTupleTable(ExecCountSlotsNode(subplan->plan) + 10);
 sp_estate->es_snapshot = estate->es_snapshot;
 
!     sp_estate->es_crosscheck_snapshot = estate->es_crosscheck_snapshot;     sp_estate->es_instrument =
estate->es_instrument;     /*
 
*** src/backend/executor/nodeSubqueryscan.c.orig    Thu Sep 25 14:58:35 2003
--- src/backend/executor/nodeSubqueryscan.c    Wed Oct  1 17:22:31 2003
***************
*** 177,183 ****     sp_estate->es_tupleTable =         ExecCreateTupleTable(ExecCountSlotsNode(node->subplan) + 10);
 sp_estate->es_snapshot = estate->es_snapshot;
 
!     sp_estate->es_snapshot_cid = estate->es_snapshot_cid;     sp_estate->es_instrument = estate->es_instrument;
/*
--- 177,183 ----     sp_estate->es_tupleTable =         ExecCreateTupleTable(ExecCountSlotsNode(node->subplan) + 10);
 sp_estate->es_snapshot = estate->es_snapshot;
 
!     sp_estate->es_crosscheck_snapshot = estate->es_crosscheck_snapshot;     sp_estate->es_instrument =
estate->es_instrument;     /*
 
*** src/backend/executor/spi.c.orig    Thu Sep 25 14:58:35 2003
--- src/backend/executor/spi.c    Wed Oct  1 16:33:44 2003
***************
*** 33,43 ****  static int    _SPI_execute(const char *src, int tcount, _SPI_plan *plan); static int
_SPI_pquery(QueryDesc*queryDesc, bool runit,
 
!                         bool useSnapshotNow, int tcount);  static int _SPI_execute_plan(_SPI_plan *plan,
               Datum *Values, const char *Nulls,
 
!                              bool useSnapshotNow, int tcount);  static void _SPI_cursor_operation(Portal portal, bool
forward,int count,                       DestReceiver *dest);
 
--- 33,43 ----  static int    _SPI_execute(const char *src, int tcount, _SPI_plan *plan); static int
_SPI_pquery(QueryDesc*queryDesc, bool runit,
 
!                         bool useCurrentSnapshot, int tcount);  static int _SPI_execute_plan(_SPI_plan *plan,
                   Datum *Values, const char *Nulls,
 
!                              bool useCurrentSnapshot, int tcount);  static void _SPI_cursor_operation(Portal portal,
boolforward, int count,                       DestReceiver *dest);
 
***************
*** 245,256 **** }  /*
!  * SPI_execp_now -- identical to SPI_execp, except that we use SnapshotNow
!  * instead of the normal QuerySnapshot.  This is currently not documented
!  * in spi.sgml because it is only intended for use by RI triggers.  */ int
! SPI_execp_now(void *plan, Datum *Values, const char *Nulls, int tcount) {     int            res; 
--- 245,258 ---- }  /*
!  * SPI_execp_current -- identical to SPI_execp, except that we expose the
!  * Executor option to use a current snapshot instead of the normal
!  * QuerySnapshot.  This is currently not documented in spi.sgml because
!  * it is only intended for use by RI triggers.  */ int
! SPI_execp_current(void *plan, Datum *Values, const char *Nulls,
!                   bool useCurrentSnapshot, int tcount) {     int            res; 
***************
*** 264,270 ****     if (res < 0)         return res; 
!     res = _SPI_execute_plan((_SPI_plan *) plan, Values, Nulls, true, tcount);      _SPI_end_call(true);     return
res;
--- 266,273 ----     if (res < 0)         return res; 
!     res = _SPI_execute_plan((_SPI_plan *) plan, Values, Nulls,
!                             useCurrentSnapshot, tcount);      _SPI_end_call(true);     return res;
***************
*** 1124,1130 ****  static int _SPI_execute_plan(_SPI_plan *plan, Datum *Values, const char *Nulls,
!                   bool useSnapshotNow, int tcount) {     List       *query_list_list = plan->qtlist;     List
*plan_list= plan->ptlist;
 
--- 1127,1133 ----  static int _SPI_execute_plan(_SPI_plan *plan, Datum *Values, const char *Nulls,
!                   bool useCurrentSnapshot, int tcount) {     List       *query_list_list = plan->qtlist;     List
 *plan_list = plan->ptlist;
 
***************
*** 1195,1201 ****             {                 qdesc = CreateQueryDesc(queryTree, planTree, dest,
                   paramLI, false);
 
!                 res = _SPI_pquery(qdesc, true, useSnapshotNow,                                   queryTree->canSetTag
?tcount : 0);                 if (res < 0)                     return res;
 
--- 1198,1204 ----             {                 qdesc = CreateQueryDesc(queryTree, planTree, dest,
                   paramLI, false);
 
!                 res = _SPI_pquery(qdesc, true, useCurrentSnapshot,
queryTree->canSetTag? tcount : 0);                 if (res < 0)                     return res;
 
***************
*** 1208,1214 **** }  static int
! _SPI_pquery(QueryDesc *queryDesc, bool runit, bool useSnapshotNow, int tcount) {     int            operation =
queryDesc->operation;    int            res;
 
--- 1211,1218 ---- }  static int
! _SPI_pquery(QueryDesc *queryDesc, bool runit,
!             bool useCurrentSnapshot, int tcount) {     int            operation = queryDesc->operation;     int
    res;
 
***************
*** 1245,1251 ****         ResetUsage(); #endif 
!     ExecutorStart(queryDesc, useSnapshotNow, false);      ExecutorRun(queryDesc, ForwardScanDirection, (long)
tcount);
 
--- 1249,1255 ----         ResetUsage(); #endif 
!     ExecutorStart(queryDesc, useCurrentSnapshot, false);      ExecutorRun(queryDesc, ForwardScanDirection, (long)
tcount);
 
*** src/backend/storage/ipc/sinval.c.orig    Wed Sep 24 14:54:01 2003
--- src/backend/storage/ipc/sinval.c    Wed Oct  1 16:02:42 2003
***************
*** 330,339 ****      * lastBackend would be sufficient.  But it seems better to do the      * malloc while not holding
thelock, so we can't look at lastBackend.      *
 
!      * if (snapshot->xip != NULL) no need to free and reallocate xip;
!      *
!      * We can reuse the old xip array, because MaxBackends does not change at
!      * runtime.      */     if (snapshot->xip == NULL)     {
--- 330,339 ----      * lastBackend would be sufficient.  But it seems better to do the      * malloc while not holding
thelock, so we can't look at lastBackend.      *
 
!      * This does open a possibility for avoiding repeated malloc/free:
!      * since MaxBackends does not change at runtime, we can simply reuse
!      * the previous xip array if any.  (This relies on the fact that all
!      * calls pass static SnapshotData structs.)      */     if (snapshot->xip == NULL)     {
*** src/backend/utils/adt/ri_triggers.c.orig    Mon Sep 29 13:44:31 2003
--- src/backend/utils/adt/ri_triggers.c    Wed Oct  1 16:49:40 2003
***************
*** 157,162 ****
--- 157,163 ---- static bool ri_PerformCheck(RI_QueryKey *qkey, void *qplan,                 Relation fk_rel, Relation
pk_rel,                HeapTuple old_tuple, HeapTuple new_tuple,
 
+                 bool detectNewRows,                 int expect_OK, const char *constrname); static void
ri_ExtractValues(RI_QueryKey*qkey, int key_idx,                  Relation rel, HeapTuple tuple,
 
***************
*** 276,281 ****
--- 277,283 ----         ri_PerformCheck(&qkey, qplan,                         fk_rel, pk_rel,
NULL,NULL,
 
+                         false,                         SPI_OK_SELECT,
tgargs[RI_CONSTRAINT_NAME_ARGNO]);
 
***************
*** 433,438 ****
--- 435,441 ----     ri_PerformCheck(&qkey, qplan,                     fk_rel, pk_rel,                     NULL,
new_row,
+                     false,                     SPI_OK_SELECT,                     tgargs[RI_CONSTRAINT_NAME_ARGNO]);

***************
*** 594,599 ****
--- 597,603 ----     result = ri_PerformCheck(&qkey, qplan,                              fk_rel, pk_rel,
             old_row, NULL,
 
+                              true,        /* treat like update */                              SPI_OK_SELECT, NULL);
   if (SPI_finish() != SPI_OK_FINISH)
 
***************
*** 752,757 ****
--- 756,762 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
          old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_SELECT,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 942,947 ****
--- 947,953 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
          old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_SELECT,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 1102,1107 ****
--- 1108,1114 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_DELETE,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 1285,1290 ****
--- 1292,1298 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, new_row,
 
+                             true, /* must detect new rows */                             SPI_OK_UPDATE,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 1453,1458 ****
--- 1461,1467 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_SELECT,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 1633,1638 ****
--- 1642,1648 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_SELECT,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 1802,1807 ****
--- 1812,1818 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_UPDATE,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 2019,2024 ****
--- 2030,2036 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_UPDATE,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 2188,2193 ****
--- 2200,2206 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_UPDATE,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 2392,2397 ****
--- 2405,2411 ----             ri_PerformCheck(&qkey, qplan,                             fk_rel, pk_rel,
            old_row, NULL,
 
+                             true, /* must detect new rows */                             SPI_OK_UPDATE,
             tgargs[RI_CONSTRAINT_NAME_ARGNO]); 
 
***************
*** 2788,2798 ****
--- 2802,2814 ---- ri_PerformCheck(RI_QueryKey *qkey, void *qplan,                 Relation fk_rel, Relation pk_rel,
            HeapTuple old_tuple, HeapTuple new_tuple,
 
+                 bool detectNewRows,                 int expect_OK, const char *constrname) {     Relation
query_rel,                source_rel;     int            key_idx;
 
+     bool        useCurrentSnapshot;     int            limit;     int            spi_result;     AclId
save_uid;
***************
*** 2842,2850 ****                          vals, nulls);     } 
!     /* Switch to proper UID to perform check as */
!     save_uid = GetUserId();
!     SetUserId(RelationGetForm(query_rel)->relowner);      /*      * If this is a select query (e.g., for a 'no
action'or 'restrict'
 
--- 2858,2882 ----                          vals, nulls);     } 
!     /*
!      * In READ COMMITTED mode, we just need to make sure the regular query
!      * snapshot is up-to-date, and we will see all rows that could be
!      * interesting.  In SERIALIZABLE mode, we can't update the regular query
!      * snapshot.  If the caller passes detectNewRows == false then it's okay
!      * to do the query with the transaction snapshot; otherwise we tell the
!      * executor to force a current snapshot (and error out if it finds any
!      * rows under current snapshot that wouldn't be visible per the
!      * transaction snapshot).
!      */
!     if (XactIsoLevel == XACT_SERIALIZABLE)
!     {
!         useCurrentSnapshot = detectNewRows;
!     }
!     else
!     {
!         SetQuerySnapshot();
!         useCurrentSnapshot = false;
!     }      /*      * If this is a select query (e.g., for a 'no action' or 'restrict'
***************
*** 2854,2872 ****      */     limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0; 
!     /*
!      * Run the plan, using SnapshotNow time qual rules so that we can see
!      * all committed tuples, even those committed after our own transaction
!      * or query started.
!      */
!     spi_result = SPI_execp_now(qplan, vals, nulls, limit);      /* Restore UID */     SetUserId(save_uid);      /*
Checkresult */     if (spi_result < 0)
 
!         elog(ERROR, "SPI_execp_now returned %d", spi_result);      if (expect_OK >= 0 && spi_result != expect_OK)
   ri_ReportViolation(qkey, constrname ? constrname : "",
 
--- 2886,2905 ----      */     limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0; 
!     /* Switch to proper UID to perform check as */
!     save_uid = GetUserId();
!     SetUserId(RelationGetForm(query_rel)->relowner);
! 
!     /* Finally we can run the query. */
!     spi_result = SPI_execp_current(qplan, vals, nulls,
!                                    useCurrentSnapshot, limit);      /* Restore UID */     SetUserId(save_uid);
/*Check result */     if (spi_result < 0)
 
!         elog(ERROR, "SPI_execp_current returned %d", spi_result);      if (expect_OK >= 0 && spi_result != expect_OK)
       ri_ReportViolation(qkey, constrname ? constrname : "",
 
*** src/backend/utils/time/tqual.c.orig    Thu Sep 25 14:58:35 2003
--- src/backend/utils/time/tqual.c    Wed Oct  1 16:02:52 2003
***************
*** 26,39 **** #include "storage/sinval.h" #include "utils/tqual.h" 
! 
! static SnapshotData SnapshotDirtyData;
! Snapshot    SnapshotDirty = &SnapshotDirtyData;
!  static SnapshotData QuerySnapshotData; static SnapshotData SerializableSnapshotData; Snapshot    QuerySnapshot =
NULL;Snapshot    SerializableSnapshot = NULL;  /* These are updated by GetSnapshotData: */ TransactionId RecentXmin =
InvalidTransactionId;
--- 26,44 ---- #include "storage/sinval.h" #include "utils/tqual.h" 
! /*
!  * The SnapshotData structs are static to simplify memory allocation
!  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
!  */ static SnapshotData QuerySnapshotData; static SnapshotData SerializableSnapshotData;
+ static SnapshotData CurrentSnapshotData;
+ static SnapshotData SnapshotDirtyData;
+ 
+ /* Externally visible pointers to valid snapshots: */ Snapshot    QuerySnapshot = NULL; Snapshot
SerializableSnapshot= NULL;
 
+ Snapshot    SnapshotDirty = &SnapshotDirtyData;  /* These are updated by GetSnapshotData: */ TransactionId RecentXmin
=InvalidTransactionId;
 
***************
*** 387,396 ****  *    CurrentCommandId.  */ int
! HeapTupleSatisfiesUpdate(HeapTuple htuple, CommandId curcid) {
-     HeapTupleHeader tuple = htuple->t_data;
-      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))     {         if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 392,399 ----  *    CurrentCommandId.  */ int
! HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid) {     if (!(tuple->t_infomask &
HEAP_XMIN_COMMITTED))    {         if (tuple->t_infomask & HEAP_XMIN_INVALID)
 
***************
*** 1024,1029 ****
--- 1027,1068 ---- }  /*
+  * CopyCurrentSnapshot
+  *        Make a snapshot that is up-to-date as of the current instant,
+  *        and return a copy.
+  *
+  * The copy is palloc'd in the current memory context.
+  */
+ Snapshot
+ CopyCurrentSnapshot(void)
+ {
+     Snapshot    currentSnapshot;
+     Snapshot    snapshot;
+ 
+     if (QuerySnapshot == NULL)    /* should not be first call in xact */
+         elog(ERROR, "no snapshot has been set");
+ 
+     /* Update the static struct */
+     currentSnapshot = GetSnapshotData(&CurrentSnapshotData, false);
+     currentSnapshot->curcid = GetCurrentCommandId();
+ 
+     /* Make a copy */
+     snapshot = (Snapshot) palloc(sizeof(SnapshotData));
+     memcpy(snapshot, currentSnapshot, sizeof(SnapshotData));
+     if (snapshot->xcnt > 0)
+     {
+         snapshot->xip = (TransactionId *)
+             palloc(snapshot->xcnt * sizeof(TransactionId));
+         memcpy(snapshot->xip, currentSnapshot->xip,
+                snapshot->xcnt * sizeof(TransactionId));
+     }
+     else
+         snapshot->xip = NULL;
+ 
+     return snapshot;
+ }
+ 
+ /*  * FreeXactSnapshot  *        Free snapshot(s) at end of transaction.  */
***************
*** 1031,1038 **** FreeXactSnapshot(void) {     /*
!      * We do not free(QuerySnapshot->xip); or
!      * free(SerializableSnapshot->xip); they will be reused soon      */     QuerySnapshot = NULL;
SerializableSnapshot= NULL;
 
--- 1070,1078 ---- FreeXactSnapshot(void) {     /*
!      * We do not free the xip arrays for the snapshot structs;
!      * they will be reused soon.  So this is now just a state
!      * change to prevent outside callers from accessing the snapshots.      */     QuerySnapshot = NULL;
SerializableSnapshot= NULL;
 
*** src/include/access/heapam.h.orig    Mon Sep 15 19:33:43 2003
--- src/include/access/heapam.h    Wed Oct  1 15:35:11 2003
***************
*** 155,163 ****  extern Oid    heap_insert(Relation relation, HeapTuple tup, CommandId cid); extern int
heap_delete(Relationrelation, ItemPointer tid, ItemPointer ctid,
 
!             CommandId cid, bool wait); extern int heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
!             ItemPointer ctid, CommandId cid, bool wait); extern int heap_mark4update(Relation relation, HeapTuple
tup,                 Buffer *userbuf, CommandId cid); 
 
--- 155,163 ----  extern Oid    heap_insert(Relation relation, HeapTuple tup, CommandId cid); extern int
heap_delete(Relationrelation, ItemPointer tid, ItemPointer ctid,
 
!             CommandId cid, Snapshot crosscheck, bool wait); extern int heap_update(Relation relation, ItemPointer
otid,HeapTuple tup,
 
!             ItemPointer ctid, CommandId cid, Snapshot crosscheck, bool wait); extern int heap_mark4update(Relation
relation,HeapTuple tup,                  Buffer *userbuf, CommandId cid); 
 
*** src/include/executor/executor.h.orig    Thu Sep 25 14:58:35 2003
--- src/include/executor/executor.h    Wed Oct  1 15:35:05 2003
***************
*** 85,91 **** /*  * prototypes from functions in execMain.c  */
! extern void ExecutorStart(QueryDesc *queryDesc, bool useSnapshotNow,                           bool explainOnly);
externTupleTableSlot *ExecutorRun(QueryDesc *queryDesc,             ScanDirection direction, long count);
 
--- 85,91 ---- /*  * prototypes from functions in execMain.c  */
! extern void ExecutorStart(QueryDesc *queryDesc, bool useCurrentSnapshot,                           bool explainOnly);
externTupleTableSlot *ExecutorRun(QueryDesc *queryDesc,             ScanDirection direction, long count);
 
*** src/include/executor/spi.h.orig    Thu Sep 25 14:58:36 2003
--- src/include/executor/spi.h    Wed Oct  1 16:33:39 2003
***************
*** 84,91 **** extern int    SPI_exec(const char *src, int tcount); extern int SPI_execp(void *plan, Datum *values,
constchar *Nulls,           int tcount);
 
! extern int SPI_execp_now(void *plan, Datum *values, const char *Nulls,
!           int tcount); extern void *SPI_prepare(const char *src, int nargs, Oid *argtypes); extern void
*SPI_saveplan(void*plan); extern int    SPI_freeplan(void *plan);
 
--- 84,91 ---- extern int    SPI_exec(const char *src, int tcount); extern int SPI_execp(void *plan, Datum *values,
constchar *Nulls,           int tcount);
 
! extern int SPI_execp_current(void *plan, Datum *values, const char *Nulls,
!                              bool useCurrentSnapshot, int tcount); extern void *SPI_prepare(const char *src, int
nargs,Oid *argtypes); extern void *SPI_saveplan(void *plan); extern int    SPI_freeplan(void *plan);
 
*** src/include/nodes/execnodes.h.orig    Thu Sep 25 14:58:36 2003
--- src/include/nodes/execnodes.h    Wed Oct  1 15:35:00 2003
***************
*** 286,292 ****     /* Basic state for all query types: */     ScanDirection es_direction; /* current scan direction
*/    Snapshot    es_snapshot;    /* time qual to use */
 
!     CommandId    es_snapshot_cid;    /* CommandId component of time qual */     List       *es_range_table; /* List
ofRangeTableEntrys */      /* Info about target table for insert/update/delete queries: */
 
--- 286,292 ----     /* Basic state for all query types: */     ScanDirection es_direction; /* current scan direction
*/    Snapshot    es_snapshot;    /* time qual to use */
 
!     Snapshot    es_crosscheck_snapshot;    /* crosscheck time qual for RI */     List       *es_range_table; /* List
ofRangeTableEntrys */      /* Info about target table for insert/update/delete queries: */
 
*** src/include/utils/tqual.h.orig    Thu Sep 25 14:58:36 2003
--- src/include/utils/tqual.h    Wed Oct  1 16:02:21 2003
***************
*** 100,106 **** extern bool HeapTupleSatisfiesToast(HeapTupleHeader tuple); extern bool
HeapTupleSatisfiesSnapshot(HeapTupleHeadertuple,                            Snapshot snapshot);
 
! extern int HeapTupleSatisfiesUpdate(HeapTuple tuple,                          CommandId curcid); extern HTSV_Result
HeapTupleSatisfiesVacuum(HeapTupleHeadertuple,                          TransactionId OldestXmin);
 
--- 100,106 ---- extern bool HeapTupleSatisfiesToast(HeapTupleHeader tuple); extern bool
HeapTupleSatisfiesSnapshot(HeapTupleHeadertuple,                            Snapshot snapshot);
 
! extern int HeapTupleSatisfiesUpdate(HeapTupleHeader tuple,                          CommandId curcid); extern
HTSV_ResultHeapTupleSatisfiesVacuum(HeapTupleHeader tuple,                          TransactionId OldestXmin);
 
***************
*** 108,113 ****
--- 108,114 ---- extern Snapshot GetSnapshotData(Snapshot snapshot, bool serializable); extern void
SetQuerySnapshot(void);extern Snapshot CopyQuerySnapshot(void);
 
+ extern Snapshot CopyCurrentSnapshot(void); extern void FreeXactSnapshot(void);  #endif   /* TQUAL_H */


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Wed, 1 Oct 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Tue, 30 Sep 2003, Tom Lane wrote:
> >> I think I can implement it and it will act as stated in my proposal.
> >> Whether people like the proposed behavior is the big question in my
> >> mind.
>
> > I think it's more reasonable than the current behavior or any of
> > the others we've hit along the way, and we have to pretty much choose
> > now if we want to change it for 7.4.
>
> I've committed the attached patch.  One thing I wanted to double-check
> with you is that the SELECT FOR UPDATES done in the noaction cases are
> being correctly handled.  I think it is correct to do them with the
> current snapshot rather than the start-of-transaction snap; do you
> agree?  Also, I did not propagate the crosscheck support into

I think the ones in the main functions need to be current snapshot.  I
think the one in ri_Check_Pk_Match doesn't need to be. That's there to see
if this same transaction has inserted a new row with the old value of the
updated/deleted pk row and the serializable snapshot should be fine.
Any conflicting attempts from another transaction should be waiting on our
completion due to the unique index I think.

> heap_mark4update, meaning that these SELECT FOR UPDATEs won't complain
> if they find a row that was inserted later than the start of the
> serializable transaction.  I'm not totally sure if they should or not;
> what do you think?

Well, I think that not doing so would only change the error from a
serialization error to a matching row exists error.  It might be a bit
surprising if you've just done a select yourself and seen that there were
no matching rows, but I'm not sure that it's a big deal as long as it
errors as appropriate.


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Wed, 1 Oct 2003, Tom Lane wrote:
>> I've committed the attached patch.  One thing I wanted to double-check
>> with you is that the SELECT FOR UPDATES done in the noaction cases are
>> being correctly handled.

> I think the ones in the main functions need to be current snapshot.  I
> think the one in ri_Check_Pk_Match doesn't need to be. That's there to see
> if this same transaction has inserted a new row with the old value of the
> updated/deleted pk row and the serializable snapshot should be fine.
> Any conflicting attempts from another transaction should be waiting on our
> completion due to the unique index I think.

An exact match would be waiting on our commit, but I was confused about
whether ri_Check_Pk_Match could look for partial matches or not.  If it
can, then I'd think it might encounter rows inserted by concurrent
transactions.  OTOH maybe in serializable mode we should pretend those
aren't there.  The thing that really confused me was that it's doing a
SELECT FOR UPDATE --- does it really need to do any row locking, if it's
looking only for rows that our own xact inserted?

>> heap_mark4update, meaning that these SELECT FOR UPDATEs won't complain
>> if they find a row that was inserted later than the start of the
>> serializable transaction.  I'm not totally sure if they should or not;
>> what do you think?

> Well, I think that not doing so would only change the error from a
> serialization error to a matching row exists error.

That was my thought also, and that the row-exists error would be more
informative of the two, but I'm worried that I've overlooked something.
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Wed, 1 Oct 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Wed, 1 Oct 2003, Tom Lane wrote:
> >> I've committed the attached patch.  One thing I wanted to double-check
> >> with you is that the SELECT FOR UPDATES done in the noaction cases are
> >> being correctly handled.
>
> > I think the ones in the main functions need to be current snapshot.  I
> > think the one in ri_Check_Pk_Match doesn't need to be. That's there to see
> > if this same transaction has inserted a new row with the old value of the
> > updated/deleted pk row and the serializable snapshot should be fine.
> > Any conflicting attempts from another transaction should be waiting on our
> > completion due to the unique index I think.
>
> An exact match would be waiting on our commit, but I was confused about
> whether ri_Check_Pk_Match could look for partial matches or not.  If it

Not currently, it only checks for exact matches on fully non-NULL keys.
The only case where partial matches are needed are match partial, but the
rules are actually more complicated (because the which part needs to be
partially matched is dependant on the actual matched rows).
I guess it doesn't hurt for it to be using the current + cross check
excepting a small amount of time presumably since it shouldn't change the
results.

> can, then I'd think it might encounter rows inserted by concurrent
> transactions.  OTOH maybe in serializable mode we should pretend those
> aren't there.  The thing that really confused me was that it's doing a
> SELECT FOR UPDATE --- does it really need to do any row locking, if it's
> looking only for rows that our own xact inserted?

Actually, it probably doesn't. I just wasn't bright enough to think of it
to change it from the other queries at the time.

> >> heap_mark4update, meaning that these SELECT FOR UPDATEs won't complain
> >> if they find a row that was inserted later than the start of the
> >> serializable transaction.  I'm not totally sure if they should or not;
> >> what do you think?
>
> > Well, I think that not doing so would only change the error from a
> > serialization error to a matching row exists error.
>
> That was my thought also, and that the row-exists error would be more
> informative of the two, but I'm worried that I've overlooked something.

I don't see it either, but I'm also a little worried in general.  Do we
have any sort of easy multi-session regression test like tools already
built for sort of lock step testing of things like this?  I think I'm
going to try to run as many cases as I can come up with this week on cvs
tip.


Re: invalid tid errors in latest 7.3.4 stable.

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Tue, 30 Sep 2003, Tom Lane wrote:
>> ...  Note that this implementation
>> means that case 3 will not throw errors, because such rows will be
>> ignored by the scan.  I think this is an okay tradeoff for getting the
>> other cases right.

> It's probably better than the current situation anyway. I think that it'd
> be revisitable if someone wanted to bring it up later.

It might be possible to hack things so that the scans will return
tuples that are visible according to either of two snapshots.  However
this would be considerably less localized a change than what I had in
mind, and it still doesn't really make things watertight.  (For example,
imagine that a conflicting tuple is created by transaction B and then
deleted by transaction C, all while our own serializable transaction is
running.  Arguably we should throw an error, but there's no way to find
that tuple using either our transaction-start or our transaction-end
snapshot.)

> I'd have figured that this would be a bigger deal to implement cleanly,
> but if it isn't, then it sounds good.  I'm a little worried because
> there's been talk of beta and this going in now for fear of breaking
> something worse than it already is, but if you think that it's safe
> enough.

I think I can implement it and it will act as stated in my proposal.
Whether people like the proposed behavior is the big question in my
mind.  (Hope Marc gets the mail lists back online soon ...)
        regards, tom lane


Re: invalid tid errors in latest 7.3.4 stable.

From
Stephan Szabo
Date:
On Tue, 30 Sep 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Tue, 30 Sep 2003, Tom Lane wrote:
> >> ...  Note that this implementation
> >> means that case 3 will not throw errors, because such rows will be
> >> ignored by the scan.  I think this is an okay tradeoff for getting the
> >> other cases right.
>
> > It's probably better than the current situation anyway. I think that it'd
> > be revisitable if someone wanted to bring it up later.
>
> It might be possible to hack things so that the scans will return
> tuples that are visible according to either of two snapshots.  However
> this would be considerably less localized a change than what I had in
> mind, and it still doesn't really make things watertight.  (For example,
> imagine that a conflicting tuple is created by transaction B and then
> deleted by transaction C, all while our own serializable transaction is
> running.  Arguably we should throw an error, but there's no way to find
> that tuple using either our transaction-start or our transaction-end
> snapshot.)

That's what I was thinking, but that it could be done later if someone
wanted.  But you're right that it still wouldn't be a complete solution, I
hadn't thought about that case. :(

> > I'd have figured that this would be a bigger deal to implement cleanly,
> > but if it isn't, then it sounds good.  I'm a little worried because
> > there's been talk of beta and this going in now for fear of breaking
> > something worse than it already is, but if you think that it's safe
> > enough.
>
> I think I can implement it and it will act as stated in my proposal.
> Whether people like the proposed behavior is the big question in my
> mind.

I think it's more reasonable than the current behavior or any of
the others we've hit along the way, and we have to pretty much choose
now if we want to change it for 7.4.

> (Hope Marc gets the mail lists back online soon ...)

Yeah, this always seems to happen right around beta time... :)