Thread: Error while altering an inheritance hierarchy in mid-query

Error while altering an inheritance hierarchy in mid-query

From
Bob Lunney
Date:
My database implements partitioning using inheritance and constraint exclusion, as described in the fine manual.  Each
partitionholds data for a single day, and there are well over 300 child partitions.   

The exception mentioned below notwithstanding, the technique works exceedingly well and is a real life saver when you
havehalf a billion rows and climbing.   

While a query was running on the partition for today a maintenance job was run that summarizes data in a different
partitionusing the technique described in section 5.9.3. -  "Managing Partitions" of the manual, with a twist.
Basically,it goes like this: 

1.  A select into query is run which summarizes the data from a partition into a table outside the inheritance
hierarchy,which is then indexed. 
2.  Then
  a.  a transaction is begun,
  b.  the original partition is dropped,
  c.  the new table renamed to the original partition's name,
  d.  the new table's unique index is renamed,
  e.  the appropriate check constraint is added,
  f.  select privilege is granted, and
  g.  the transaction is committed.

So far so good, the maintenance job works fine and is quite speedy.  The problem occurs when a select query is started
priorto the sequence of events above, those steps complete, and an error is thrown saying "PGRES_FATAL_ERROR - ERROR:
couldnot open relation with OID 64412".  Of course, seconds later the query runs fine, and there is no object with OID
64412in the database. 

It seems to me that removing and inserting partitions into the inheritance hierarchy may not be a consistent and/or
isolatedaction with respect to queries that may be using the inheritance hierarchy.  To be fair, this sequence of
eventsdoesn't normally happen in daily processing, but the behavior is not what I expected, nor is it mentioned in the
manual. Am I missing or misinterpreting something? 

Thanks in advance for your advice!

Bob Lunney

Re: Error while altering an inheritance hierarchy in mid-query

From
Tom Lane
Date:
Bob Lunney <bob_lunney@yahoo.com> writes:
> 1.  A select into query is run which summarizes the data from a partition into a table outside the inheritance
hierarchy,which is then indexed. 
> 2.  Then
>   a.  a transaction is begun,
>   b.  the original partition is dropped,
>   c.  the new table renamed to the original partition's name,
>   d.  the new table's unique index is renamed,
>   e.  the appropriate check constraint is added,
>   f.  select privilege is granted, and
>   g.  the transaction is committed.

I'd suggest taking an exclusive lock on the inheritance hierarchy's
parent table between 2a and 2b.  The "could not open relation with OID
nnn" error is to be expected when a table is dropped just as a query
is in the act of trying to open it, which is what could happen here if
a query on the parent table runs concurrently with the DROP.
You're also at risk that a concurrent query might see both or neither
of the old and new versions of the partition, leading to bogus answers.
Both of these things would be fixed if incoming queries are blocked
while trying to open the parent table, rather than while iterating
through the list of inheritance children for it.

            regards, tom lane

Re: Error while altering an inheritance hierarchy in mid-query

From
Bob Lunney
Date:
So I did miss something!  Thanks for the brilliant explanation and simple s=
olution, Tom.

Bob

--- On Thu, 3/18/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Subject: Re: [BUGS] Error while altering an inheritance hierarchy in mid-=
query
> To: "Bob Lunney" <bob_lunney@yahoo.com>
> Cc: pgsql-bugs@postgresql.org
> Date: Thursday, March 18, 2010, 3:26 PM
> Bob Lunney <bob_lunney@yahoo.com>
> writes:
> > 1.=A0 A select into query is run which summarizes
> the data from a partition into a table outside the
> inheritance hierarchy, which is then indexed.
> > 2.=A0 Then=20
> >=A0=A0=A0a.=A0 a transaction is begun,
> >=A0=A0=A0b.=A0 the original partition is
> dropped,=20
> >=A0=A0=A0c.=A0 the new table renamed to the
> original partition's name,=20
> >=A0=A0=A0d.=A0 the new table's unique index
> is renamed,
> >=A0=A0=A0e.=A0 the appropriate check
> constraint is added,=20
> >=A0=A0=A0f.=A0 select privilege is granted,
> and
> >=A0=A0=A0g.=A0 the transaction is
> committed.
>=20
> I'd suggest taking an exclusive lock on the inheritance
> hierarchy's
> parent table between 2a and 2b.=A0 The "could not open
> relation with OID
> nnn" error is to be expected when a table is dropped just
> as a query
> is in the act of trying to open it, which is what could
> happen here if
> a query on the parent table runs concurrently with the
> DROP.
> You're also at risk that a concurrent query might see both
> or neither
> of the old and new versions of the partition, leading to
> bogus answers.
> Both of these things would be fixed if incoming queries are
> blocked
> while trying to open the parent table, rather than while
> iterating
> through the list of inheritance children for it.
>=20
> =A0=A0=A0 =A0=A0=A0 =A0=A0=A0
> regards, tom lane
>=20