Thread: Strange error related to temporary tables

Strange error related to temporary tables

From
Csaba Nagy
Date:
Postgres version: 8.1.3

I have a code similar to this pseudo-code:

try {
  create temp table temp_report ...;
  ... do some processing using this table...
  drop table temp_report;
  commit;
} catch all errors {
  rollback;
}

This code occasionally triggered the following error (couldn't reliably
reproduce):

SQL state: 42710

ERROR: type "temp_report" already exists


The code is possible to be executed in parallel by multiple threads, on
different connections, or in sequence on the same connection. I would
expect it in both cases to work correctly... in manual tests I was able
to create in parallel temporary tables named identically in different
connections, and on the same connection after rolling back the
transaction which created the first table... so I don't know what to try
to trigger this.

Any ideas what's the problem ?

TIA,
Csaba.



Re: Strange error related to temporary tables

From
Alban Hertroys
Date:
Csaba Nagy wrote:
> The code is possible to be executed in parallel by multiple threads, on
> different connections, or in sequence on the same connection. I would
> expect it in both cases to work correctly... in manual tests I was able
> to create in parallel temporary tables named identically in different
> connections, and on the same connection after rolling back the
> transaction which created the first table... so I don't know what to try
> to trigger this.
>
> Any ideas what's the problem ?

Are you sure that you're not re-using an existing connection from a pool?

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Strange error related to temporary tables

From
Csaba Nagy
Date:
On Thu, 2006-08-31 at 17:25, Alban Hertroys wrote:
> Csaba Nagy wrote:
> > The code is possible to be executed in parallel by multiple threads, on
> > different connections, or in sequence on the same connection. I would
> > expect it in both cases to work correctly... in manual tests I was able
> > to create in parallel temporary tables named identically in different
> > connections, and on the same connection after rolling back the
> > transaction which created the first table... so I don't know what to try
> > to trigger this.
> >
> > Any ideas what's the problem ?
>
> Are you sure that you're not re-using an existing connection from a pool?

Of course I'm re-using a connection from a pool... but if you look at
the pseudo-code from the OP, the temporary table is either dropped or
the transaction which creates it is rolled back, before the connection
is returned to the pool. When another thread is reusing the connection,
the table should be dropped either way, and manual tests show that it
works both ways, so I guess it is some race condition somewhere else.

In any case, the error message is strange in itself, as if I try to
create the temporary table when it exists, the error I get in manual
trial is:

ERROR:  relation "test_temp_table" already exists
        ^^^^^^^^

compared to the error from the OP which I cite here for reference:

ERROR: type "temp_report" already exists
       ^^^^

So it was complaining about an already existing type, not relation.

Any ideas ? I'm still clueless.

Cheers,
Csaba.



Re: Strange error related to temporary tables

From
Alban Hertroys
Date:
Csaba Nagy wrote:
> In any case, the error message is strange in itself, as if I try to
> create the temporary table when it exists, the error I get in manual
> trial is:
>
> ERROR:  relation "test_temp_table" already exists
>         ^^^^^^^^
>
> compared to the error from the OP which I cite here for reference:
>
> ERROR: type "temp_report" already exists
>        ^^^^

That's strange indeed.

Some googling revealed this thread:
http://archives.postgresql.org/pgsql-general/2005-09/msg01100.php

Seems like around this time last year someone had the same problem.
Unfortunately, the cause and it's possible solutions aren't mentioned
(the OP found a workaround though). They may have never been found...

I suppose if you'd check pg_type there is a record containing 'temp_report'?

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Strange error related to temporary tables

From
Tom Lane
Date:
Csaba Nagy <nagy@ecircle-ag.com> writes:
> In any case, the error message is strange in itself, as if I try to
> create the temporary table when it exists, the error I get in manual
> trial is:
> ERROR:  relation "test_temp_table" already exists
>         ^^^^^^^^
> compared to the error from the OP which I cite here for reference:
> ERROR: type "temp_report" already exists
>        ^^^^

This is not too surprising given the way that heap_create_with_catalog
works --- it happens to be easier to insert the pg_type row before
the pg_class row, so if you have two sessions trying to create the same
table at about the same time, that's where the unique index constraint
will kick in.  The initial check for a duplicate pg_class row doesn't
catch the conflict because the guy who's just slightly ahead won't have
committed yet when the second guy looks.

The question is how is this scenario managing to occur, given that the
tables in question are temp tables?  It seems like this must indicate
two backends trying to use the same pg_temp_NNN schema; but that should
surely be impossible --- it's driven off MyBackendId, and if that's not
unique to a live session then we've got *major* problems.

IIRC we've seen prior reports of similar issues, so I believe there is
something there, but without a test case it's gonna be hard to track down.

            regards, tom lane

Re: Strange error related to temporary tables

From
Tom Lane
Date:
Csaba Nagy <nagy@ecircle-ag.com> writes:
>   create temp table temp_report ...;
>   ... do some processing using this table...
>   drop table temp_report;
>   commit;

> This code occasionally triggered the following error (couldn't reliably
> reproduce):
> ERROR: type "temp_report" already exists

BTW, when this happens, does the error persist?  If it's a race
condition you'd expect not (because the guy who successfully created the
temp table would soon drop it again).  When we saw this problem before,
we were speculating that a temp table's pg_type row had somehow not
gotten dropped during table drop, which'd lead to a persistent failure.

Note that a "persistent" failure could still only manifest occasionally,
if the unwanted pg_type row were in a high-numbered pg_temp_NNN schema
that doesn't get used often.  So i guess the correct thing to do is
"select oid, xmin from pg_type where typname = 'temp_report'" and see
if there are any long-lived entries (xmin far away from the others would
be a tipoff).

            regards, tom lane

Re: Strange error related to temporary tables

From
Csaba Nagy
Date:
I executed what you suggest below, see the results below. Short
conclusion: the type is there in pg_type, the relation is not there in
pg_class. Is there anything I should look for more ?

Cheers,
Csaba.

> BTW, when this happens, does the error persist?  If it's a race
> condition you'd expect not (because the guy who successfully created the
> temp table would soon drop it again).  When we saw this problem before,
> we were speculating that a temp table's pg_type row had somehow not
> gotten dropped during table drop, which'd lead to a persistent failure.
>
> Note that a "persistent" failure could still only manifest occasionally,
> if the unwanted pg_type row were in a high-numbered pg_temp_NNN schema
> that doesn't get used often.  So i guess the correct thing to do is
> "select oid, xmin from pg_type where typname = 'temp_report'" and see
> if there are any long-lived entries (xmin far away from the others would
> be a tipoff).

test03=> select oid, xmin from pg_type where typname = 'temp_report';
   oid    |   xmin
----------+-----------
 58293995 | 220215039
(1 row)

test03=> select * from pg_type where typname = 'temp_report';
   typname   | typnamespace | typowner | typlen | typbyval | typtype |
typisdefined | typdelim | typrelid | typelem | typinput  | typoutput  |
typreceive  |   typsend   | typanalyze | typalign | typstorage |
typnotnull | typbasetype | typtypmod | typndims | typdefaultbin |
typdefault

-------------+--------------+----------+--------+----------+---------+--------------+----------+----------+---------+-----------+------------+-------------+-------------+------------+----------+------------+------------+-------------+-----------+----------+---------------+------------
 temp_report |     58006858 |    16386 |     -1 | f        | c       |
t            | ,        | 58293994 |       0 | record_in | record_out |
record_recv | record_send | -          | d        | x          |
f          |           0 |        -1 |   0 |               |
(1 row)

test03=> select * from pg_namespace where oid=58006858;
  nspname  | nspowner | nspacl
-----------+----------+--------
 pg_temp_8 |       10 |
(1 row)

test03=> select count(*) from pg_class where relname='temp_report';
 count
-------
     0
(1 row)




Re: Strange error related to temporary tables

From
Csaba Nagy
Date:
On Fri, 2006-09-01 at 11:25, Csaba Nagy wrote:
> I executed what you suggest below [snip]

Ok... it looks like the postgres version is not what I expected and
reported in the first mail, but:

test03=> select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.0 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.3
20051023 (prerelease) (Debian 4.0.2-3)

I'm not sure how that got on that box... but I wonder if the version is
relevant ?

Cheers,
Csaba.



Re: Strange error related to temporary tables

From
Tom Lane
Date:
Csaba Nagy <nagy@ecircle-ag.com> writes:
> I executed what you suggest below, see the results below. Short
> conclusion: the type is there in pg_type, the relation is not there in
> pg_class. Is there anything I should look for more ?

> test03=> select oid, xmin from pg_type where typname = 'temp_report';
>    oid    |   xmin
> ----------+-----------
>  58293995 | 220215039
> (1 row)

Is this xmin quite a bit older than what you get for a freshly-created
temp table?  Can you find any entries in pg_depend that show the above
OID as either objid or refobjid?

Also, it's fairly likely (not certain) that the table associated with
this pg_type entry had OID one less, ie 58293994.  Can you find any
trace of that OID in pg_depend, or in pg_class for that matter?

            regards, tom lane

Re: Strange error related to temporary tables

From
Csaba Nagy
Date:
> Is this xmin quite a bit older than what you get for a freshly-created
> temp table?

I would say yes, this is a test system which is highly stressed from
time to time, but does not get continuous load.

test03=> select oid, xmin from pg_type where typname = 'temp_report';
   oid    |   xmin
----------+-----------
 58293995 | 220215039
(1 row)

test03=> create temp table test_new_temp_table (a text);
CREATE TABLE
test03=> select xmin from pg_type where typname = 'test_new_temp_table';
   xmin
-----------
 236080536
(1 row)


> Can you find any entries in pg_depend that show the above
> OID as either objid or refobjid?

No:

test03=> select * from pg_depend where objid = 58293995 or refobjid =
58293995;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid |
deptype
---------+-------+----------+------------+----------+-------------+---------
(0 rows)

> Also, it's fairly likely (not certain) that the table associated with
> this pg_type entry had OID one less, ie 58293994.  Can you find any
> trace of that OID in pg_depend,

No trace. I executed:

test03=> select count(*) from pg_depend where classid between 58293995 -
100 and 58293995 + 100;
 count
-------
     0
(1 row)

... and then with each of (objid objsubid refclassid refobjid
refobjsubid deptype) instead of classid, with the same result.

> ... or in pg_class for that matter?

test03=> select count(*) from pg_class where oid between 58293995 - 100
and 58293995 + 100;
 count
-------
     0
(1 row)


Any other place to check ?

Cheers,
Csaba.



Re: Strange error related to temporary tables

From
Tom Lane
Date:
Csaba Nagy <nagy@ecircle-ag.com> writes:
>> Is this xmin quite a bit older than what you get for a freshly-created
>> temp table?

> I would say yes, this is a test system which is highly stressed from
> time to time, but does not get continuous load.

OK, so it seems we have an old pg_type entry that for some reason was
not dropped when the owning table was dropped.  Weird.  As you know,
we've seen one report of this before.

There's probably not much more we can learn at this point --- given
that the entry is old, any other evidence is probably long gone.
I'd suggest just doing a manual DELETE of that pg_type row to get out
of the problem.

And yes, you ought to update that 8.1.0 installation ;-).  But I have
no idea if this problem is related to any solved bugs.

            regards, tom lane

Re: Strange error related to temporary tables

From
Csaba Nagy
Date:
> There's probably not much more we can learn at this point --- given
> that the entry is old, any other evidence is probably long gone.

I'm afraid I'll never be able to get soon enough to the evidence even if
it happens again, unless I manage to reproduce it myself, which I tried
but didn't succeed (100 threads competing for ~10 connections
creating/dropping the table 1000 times, with ~10 percent of the cases
rolling back before the drop).

> I'd suggest just doing a manual DELETE of that pg_type row to get out
> of the problem.

OK. This was not a huge problem, it's a test system, but enough ground
for the QA to scream ;-)

> And yes, you ought to update that 8.1.0 installation ;-).  But I have
> no idea if this problem is related to any solved bugs.

The QA installed the right version now.

Thanks for the assistance.

Cheers,
Csaba.