Thread: BUG #2830: Wrong results for prepared statements while clustering target table
The following bug has been logged online: Bug reference: 2830 Logged by: Email address: martin.pihlak@gmail.com PostgreSQL version: 8.2.0, older Operating system: Ubuntu 6.10 2.6.17-10-386, libc6 2.4-1ubuntu12 Description: Wrong results for prepared statements while clustering target table Details: Prepared SELECT/UPDATE/DELETE statements produce wrong results if executed while target table is being clustered. Only seems to happens when CLUSTER and statement are executed in different sessions. Tested on 8.0.6, 8.0.8, 8.1.4 and 8.2.0, reproducible with the following test case: create table test(i integer); -- generate large enough table, so that clustering takes more than 1 sec insert into test select * from generate_series(1,1000000); create index test_idx on test(i); prepare c as select count(*) from test; \! psql -c "cluster test_idx on test" & -- sleep to make sure we stay behind cluster \! sleep 1 -- should produce same values execute c; select count(1) from test; Executing 'c' after CLUSTER returns 0 instead of 1000000.
Re: BUG #2830: Wrong results for prepared statements while clustering target table
From
Tom Lane
Date:
"" <martin.pihlak@gmail.com> writes: > Prepared SELECT/UPDATE/DELETE statements produce wrong results if executed > while target table is being clustered. The short answer is "don't CLUSTER while the table is in live use" ... CLUSTER re-inserts all the rows in the table into a fresh table. This means that all the rows appear to have been inserted by the CLUSTER transaction, and therefore that a transaction that scans the table afterward with a snapshot taken before the CLUSTER committed will not see those rows. It'd be better if CLUSTER preserved the rows' MVCC state but don't hold your breath for that; any such change is certainly not going to get back-patched into stable releases. The difference between EXECUTE and SELECT behavior here is just a chance matter of exactly where the snap is taken during the parse/execute code path --- your SELECT works because it blocks for AccessShareLock on the table before it sets the snap. But SELECT would fail just the same way within a serializable transaction that had already set its snapshot. regards, tom lane
Re: BUG #2830: Wrong results for prepared statements while clustering target table
From
"Martin Pihlak"
Date:
> The short answer is "don't CLUSTER while the table is in live use" ... > This is kind of difficult on a busy database, more so if it's a 24x7 environment. And unfortunately there aren't any good alternatives either. > The difference between EXECUTE and SELECT behavior here is just a chance > matter of exactly where the snap is taken during the parse/execute code > path --- your SELECT works because it blocks for AccessShareLock on the > table before it sets the snap. But SELECT would fail just the same way > within a serializable transaction that had already set its snapshot. > Ok, makes sense. The same reasoning probably applies to INSERT and UPDATE as well. Still the problem remains - how to cluster a table on a busy system without losing data or getting wrong results. Perhaps the issue should be documented, although a fix would be preferrable ;) Martin
>>> On Sat, Dec 16, 2006 at 5:15 PM, in message <8738.1166310917@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "" <martin.pihlak@gmail.com> writes: >> Prepared SELECT/UPDATE/DELETE statements produce wrong results if executed >> while target table is being clustered. > > The short answer is "don't CLUSTER while the table is in live use" ... > > CLUSTER re- inserts all the rows in the table into a fresh table. This > means that all the rows appear to have been inserted by the CLUSTER > transaction, and therefore that a transaction that scans the table > afterward with a snapshot taken before the CLUSTER committed will not > see those rows. This really should be documented in the CLUSTER command. Having been falsely reassured by the following statement in the documentation, I have clustered tables in live use. "When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished." We have crontab jobs to do daily clusters on some of our small tables with high update rates. The databases are 24/7 with fairly high usage (query requests in the 10s of millions per day). Should we be load shifting off of a server before doing the sub-second CLUSTER on a small, high update table? > The difference between EXECUTE and SELECT behavior here is just a chance > matter of exactly where the snap is taken during the parse/execute code > path --- your SELECT works because it blocks for AccessShareLock on the > table before it sets the snap. But SELECT would fail just the same way > within a serializable transaction that had already set its snapshot. Ouch! I don't suppose there is any reasonable way to cover this with locks? -Kevin