Re: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables
Date
Msg-id 47C7D824.5000707@enterprisedb.com
Whole thread Raw
In response to "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables  ("John Smith" <sodgodofall@gmail.com>)
Responses Re: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Re: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables  ("Gurjeet Singh" <singh.gurjeet@gmail.com>)
Re: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
John Smith wrote:
> Architecture: Intel Core 2 Duo
> OS: linux-2.6.20-gentoo-r8
> Filesystem: ext3
> Postgres v8.2.3 compiled with gcc 4.1.1-r3
> RAM - 2GB
> Shared buffers - 24MB
> [All other Postgres configuration parameters are default values]
>
> Problem description:
> COPY into temp table fails using a specific combination of
> create/insert on temp tables, prepare/commit in subsequent
> transactions. The "could not open relation" error occurs reliably.
>
> Steps to reproduce:
>
> Existing schema (scripts to create and populate these tables are
> uploaded to http://upload2.net/page/download/rfsLfnuVlYjEcCJ/basetables.tgz.html
> ):

I can't get that link to work. Can you please email me the files
offlist? Or upload somewhere else if they're too big for email.

> Observations:
> 1. The size of the data seems to matters. If the amount of data being
> inserted is dropped to just one or two records per table, the error
> doesn't happen.
> 2. The order of columns for the select into temp2 matters. Changing
> the order can cause the error to go away.
> 3. If the prepare/commit is replaced with a "commit;" the error goes away.
> 4. Removing "temp3" or "temp4" from the transaction causes one run of
> the above statements to succeed, but if the sequence is issued in the
> same PSQL session, the second one will fail.
> 5. Given the current dataset, the error always occurs on line 926 of
> the COPY (even if the values at line 926 are changed).
> 6. <tablespace>/<database>/<oid> typically always corresponds to that
> of temp2 on my system.

I think I see what's happening here. We have restricted two-phase commit
so that you're not supposed to be able to PREPARE TRANSACTION if the
transaction has touched any temporary tables. That's because the 2nd
phase commit can be performed from another backend, and another backend
can't mess with another backend's temporary tables.

However in this case, where you CREATE and DROP the temporary table in
the same transaction, we don't detect that, and let the PREPARE
TRANSACTION to finish. The detection relies on the lock manager, but
we're not holding any locks on the dropped relation.

I think we could in fact allow CREATE+DROP in same transaction, and
remove the table immediately at PREPARE TRANSACTION, but what happens
right now is that we store the relfilenode of the temp table to the
two-phase state file in pg_twophase, for deletion at COMMIT/ROLLBACK
PREPARED. But we don't store the fact that it's a temporary table, and
therefore we try to unlink it like a normal table, and fail to purge the
temp buffers of that table which causes problems later.

Attached is a simple patch to fix that by disallowing
CREATE+DROP+PREPARE TRANSACTION more reliably. It'd still be nice to
debug the full test case of yours to verify that that's what's
happening, though.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.39
diff -c -r1.39 twophase.c
*** src/backend/access/transam/twophase.c    1 Jan 2008 19:45:48 -0000    1.39
--- src/backend/access/transam/twophase.c    29 Feb 2008 09:58:19 -0000
***************
*** 793,798 ****
--- 793,799 ----
      TransactionId *children;
      RelFileNode *commitrels;
      RelFileNode *abortrels;
+     bool haveTempCommit, haveTempAbort;

      /* Initialize linked list */
      records.head = palloc0(sizeof(XLogRecData));
***************
*** 815,824 ****
      hdr.prepared_at = gxact->prepared_at;
      hdr.owner = gxact->owner;
      hdr.nsubxacts = xactGetCommittedChildren(&children);
!     hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL);
!     hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL);
      StrNCpy(hdr.gid, gxact->gid, GIDSIZE);

      save_state_data(&hdr, sizeof(TwoPhaseFileHeader));

      /* Add the additional info about subxacts and deletable files */
--- 816,830 ----
      hdr.prepared_at = gxact->prepared_at;
      hdr.owner = gxact->owner;
      hdr.nsubxacts = xactGetCommittedChildren(&children);
!     hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL, &haveTempCommit);
!     hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL, &haveTempAbort);
      StrNCpy(hdr.gid, gxact->gid, GIDSIZE);

+     if (haveTempCommit || haveTempAbort)
+         ereport(ERROR,
+                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                  errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
+
      save_state_data(&hdr, sizeof(TwoPhaseFileHeader));

      /* Add the additional info about subxacts and deletable files */
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.257
diff -c -r1.257 xact.c
*** src/backend/access/transam/xact.c    15 Jan 2008 18:56:59 -0000    1.257
--- src/backend/access/transam/xact.c    29 Feb 2008 09:48:52 -0000
***************
*** 802,808 ****
      TransactionId *children;

      /* Get data needed for commit record */
!     nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp);
      nchildren = xactGetCommittedChildren(&children);

      /*
--- 802,808 ----
      TransactionId *children;

      /* Get data needed for commit record */
!     nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp, NULL);
      nchildren = xactGetCommittedChildren(&children);

      /*
***************
*** 1174,1180 ****
               xid);

      /* Fetch the data we need for the abort record */
!     nrels = smgrGetPendingDeletes(false, &rels, NULL);
      nchildren = xactGetCommittedChildren(&children);

      /* XXX do we really need a critical section here? */
--- 1174,1180 ----
               xid);

      /* Fetch the data we need for the abort record */
!     nrels = smgrGetPendingDeletes(false, &rels, NULL, NULL);
      nchildren = xactGetCommittedChildren(&children);

      /* XXX do we really need a critical section here? */
Index: src/backend/storage/smgr/smgr.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/smgr/smgr.c,v
retrieving revision 1.109
diff -c -r1.109 smgr.c
*** src/backend/storage/smgr/smgr.c    1 Jan 2008 19:45:52 -0000    1.109
--- src/backend/storage/smgr/smgr.c    29 Feb 2008 09:56:37 -0000
***************
*** 678,689 ****
   *
   * If haveNonTemp isn't NULL, the bool it points to gets set to true if
   * there is any non-temp table pending to be deleted; false if not.
   *
   * Note that the list does not include anything scheduled for termination
   * by upper-level transactions.
   */
  int
! smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
  {
      int            nestLevel = GetCurrentTransactionNestLevel();
      int            nrels;
--- 678,692 ----
   *
   * If haveNonTemp isn't NULL, the bool it points to gets set to true if
   * there is any non-temp table pending to be deleted; false if not.
+  * haveTemp is similar, but gets set if there is any temp table deletions
+  * pending.
   *
   * Note that the list does not include anything scheduled for termination
   * by upper-level transactions.
   */
  int
! smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr,
!                       bool *haveNonTemp, bool *haveTemp)
  {
      int            nestLevel = GetCurrentTransactionNestLevel();
      int            nrels;
***************
*** 693,698 ****
--- 696,703 ----
      nrels = 0;
      if (haveNonTemp)
          *haveNonTemp = false;
+     if (haveTemp)
+         *haveTemp = false;
      for (pending = pendingDeletes; pending != NULL; pending = pending->next)
      {
          if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
***************
*** 711,716 ****
--- 716,723 ----
              *rptr++ = pending->relnode;
          if (haveNonTemp && !pending->isTemp)
              *haveNonTemp = true;
+         if (haveTemp && pending->isTemp)
+             *haveTemp = true;
      }
      return nrels;
  }

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Batch update of indexes on data loading
Next
From: Simon Riggs
Date:
Subject: Re: CREATE TABLE, load and freezing