Thread: BUG #2830: Wrong results for prepared statements while clustering target table

The following bug has been logged online:

Bug reference:      2830
Logged by:
Email address:      martin.pihlak@gmail.com
PostgreSQL version: 8.2.0, older
Operating system:   Ubuntu 6.10 2.6.17-10-386, libc6 2.4-1ubuntu12
Description:        Wrong results for prepared statements while clustering
target table
Details:

Prepared SELECT/UPDATE/DELETE statements produce wrong results if executed
while target table is being clustered. Only seems to happens when CLUSTER
and statement are executed in different sessions. Tested on 8.0.6, 8.0.8,
8.1.4 and 8.2.0, reproducible with the following test case:

create table test(i integer);
-- generate large enough table, so that clustering takes more than 1 sec
insert into test select * from generate_series(1,1000000);
create index test_idx on test(i);

prepare c as select count(*) from test;

\! psql -c "cluster test_idx on test" &
-- sleep to make sure we stay behind cluster
\! sleep 1

-- should produce same values
execute c;
select count(1) from test;

Executing 'c' after CLUSTER returns 0 instead of 1000000.
"" <martin.pihlak@gmail.com> writes:
> Prepared SELECT/UPDATE/DELETE statements produce wrong results if executed
> while target table is being clustered.

The short answer is "don't CLUSTER while the table is in live use" ...

CLUSTER re-inserts all the rows in the table into a fresh table.  This
means that all the rows appear to have been inserted by the CLUSTER
transaction, and therefore that a transaction that scans the table
afterward with a snapshot taken before the CLUSTER committed will not
see those rows.

It'd be better if CLUSTER preserved the rows' MVCC state but don't hold
your breath for that; any such change is certainly not going to get
back-patched into stable releases.

The difference between EXECUTE and SELECT behavior here is just a chance
matter of exactly where the snap is taken during the parse/execute code
path --- your SELECT works because it blocks for AccessShareLock on the
table before it sets the snap.  But SELECT would fail just the same way
within a serializable transaction that had already set its snapshot.

            regards, tom lane

Re: BUG #2830: Wrong results for prepared statements while clustering target table

From
"Martin Pihlak"
Date:
> The short answer is "don't CLUSTER while the table is in live use" ...
>
This is kind of difficult on a busy database, more so if it's a 24x7
environment. And
unfortunately there aren't any good alternatives either.

> The difference between EXECUTE and SELECT behavior here is just a chance
> matter of exactly where the snap is taken during the parse/execute code
> path --- your SELECT works because it blocks for AccessShareLock on the
> table before it sets the snap.  But SELECT would fail just the same way
> within a serializable transaction that had already set its snapshot.
>

Ok, makes sense. The same reasoning probably applies to INSERT and UPDATE as
well. Still the problem remains - how to cluster a table on a busy
system without losing
data or getting wrong results. Perhaps the issue should be documented,
although a fix
would be preferrable ;)

Martin

Re: BUG #2830: Wrong results for prepared statements

From
"Kevin Grittner"
Date:
>>> On Sat, Dec 16, 2006 at  5:15 PM, in message
<8738.1166310917@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "" <martin.pihlak@gmail.com> writes:
>> Prepared SELECT/UPDATE/DELETE statements produce wrong results if
executed
>> while target table is being clustered.
>
> The short answer is "don't CLUSTER while the table is in live use"
...
>
> CLUSTER re- inserts all the rows in the table into a fresh table.
This
> means that all the rows appear to have been inserted by the CLUSTER
> transaction, and therefore that a transaction that scans the table
> afterward with a snapshot taken before the CLUSTER committed will
not
> see those rows.

This really should be documented in the CLUSTER command.  Having been
falsely reassured by the following statement in the documentation, I
have clustered tables in live use.

"When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired
on it. This prevents any other database operations (both reads and
writes) from operating on the table until the CLUSTER is finished."

We have crontab jobs to do daily clusters on some of our small tables
with high update rates.  The databases are 24/7 with fairly high usage
(query requests in the 10s of millions per day).  Should we be load
shifting off of a server before doing the sub-second CLUSTER on a small,
high update table?

> The difference between EXECUTE and SELECT behavior here is just a
chance
> matter of exactly where the snap is taken during the parse/execute
code
> path ---  your SELECT works because it blocks for AccessShareLock on
the
> table before it sets the snap.  But SELECT would fail just the same
way
> within a serializable transaction that had already set its snapshot.

Ouch!  I don't suppose there is any reasonable way to cover this with
locks?

-Kevin