Thread: Hmmm ... isn't count_nondeletable_pages all wet?
While looking at the HOT patch I suddenly started to question the sanity of vacuumlazy.c's count_nondeletable_pages(). It sits there and does a HeapTupleSatisfiesVacuum on any tuples it finds, and is willing to truncate away a page that contains only DEAD tuples. The problem with this theory is that any index entries linking to those tuples won't have been cleaned up, and will therefore emerge as index corruption once the table grows again (because they will link to tuples that in all probability don't match the index entries). Now since this test is made using the same OldestXmin threshold that we used in the vacuuming pass, tuples that were RECENTLY_DEAD will still be that way. However, it seems to me that there's a race condition anyway: 1. VACUUM scans and cleans a page. 2. Some other transaction inserts a tuple into that page. 3. The inserting transaction aborts. 4. VACUUM returns to the page and sees the tuple as HEAPTUPLE_DEAD. In this scenario we could truncate the page away and not have cleaned up the index entries linking to it. I'm thinking that count_nondeletable_pages should not bother itself with visibility tests, but just forget truncation if it finds any items whatsoever on the page. Is this analysis accurate, or am I missing something? If it is accurate, do we need to postpone the upcoming releases to fix it? I am thinking that some previously unexplained reports of index corruption might now be explained ... regards, tom lane
I wrote: > Is this analysis accurate, or am I missing something? If it is > accurate, do we need to postpone the upcoming releases to fix it? > I am thinking that some previously unexplained reports of index > corruption might now be explained ... Yeah, it's broken. Reproducing the race condition is a bit tricky, but if you have a debug-enabled build at hand it's simple. Here's a test case (turn off autovacuum if it's on, to prevent it from getting in there ahead of the manual vacuum): In session 1: regression=# create table foo (f1 int primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo" CREATE TABLE regression=# insert into foo select g from generate_series(1,10) g; INSERT 0 10 regression=# delete from foo; DELETE 10 Now start session 2, and attach to it with gdb, and set a breakpoint at lazy_truncate_heap(). Continue, and in session 2 do regression=# vacuum verbose foo; INFO: vacuuming "public.foo" INFO: scanned index "foo_pkey" to remove 10 row versions DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "foo": removed 10 row versions in 1 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "foo_pkey" now contains 0 row versions in 2 pages DETAIL: 10 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "foo": found 10 removable, 0 nonremovable row versions in 1 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. [ hangs here at the breakpoint ] In session 1 do: regression=# begin; BEGIN regression=# insert into foo select g+100 from generate_series(1,10) g; INSERT 0 10 regression=# abort; ROLLBACK Now let gdb continue from the breakpoint, and notice session 2 thinks it can truncate away the whole table: INFO: "foo": truncated 1 to 0 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec. VACUUM regression=# Back to session 1, insert some conflicting data: regression=# insert into foo select g+1000 from generate_series(1,10) g; INSERT 0 10 and now we have a corrupt index with multiple pointers to these rows: regression=# select * from foo where f1 = 105; f1 ------1005 (1 row) regression=# select * from foo where f1 = 1005; f1 ------1005 (1 row) I am fairly sure that this bug explains problems previously reported by Merlin Moncure: http://archives.postgresql.org/pgsql-general/2006-10/msg01312.php and Florian Weimer: http://archives.postgresql.org/pgsql-general/2006-11/msg00305.php In both those cases, off-list investigation showed that the symptoms were caused by multiple index entries pointing to the same heap tuples, where one index entry matched the actual contents of the row and the other did not. In both cases this occurred for a fairly small number of rows that were clumped together into small ranges of blocks. It looks to me like this is perfectly explained by the theory that that range of blocks had been truncated away by a VACUUM at some point in the table's history, and that the non-matching index entries stemmed from an insert or update that occurred and then aborted after VACUUM had examined the blocks the first time but before it could return to check whether the blocks were still empty. It's a corner case, but I say it's a must-fix. Those bug reports have been bothering me for most of a year ... regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > It's a corner case, but I say it's a must-fix. Of course > Those bug reports have been bothering me for most of a year ... Are there any other outstanding reports like that? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> Those bug reports have been bothering me for most of a year ... > Are there any other outstanding reports like that? Couldn't say. Those two were on my mind because the reporters had allowed me to troll through their table and index files and verify that there were indeed multiple index pointers to the same table row, and yet the index itself did not show any indication of corruption. I had noticed the "clumping" pattern of the doubly-linked rows in both cases, but hadn't found an explanation. regards, tom lane
On 9/16/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Couldn't say. Those two were on my mind because the reporters had
allowed me to troll through their table and index files and verify
that there were indeed multiple index pointers to the same table row,
and yet the index itself did not show any indication of corruption.
I had noticed the "clumping" pattern of the doubly-linked rows in
both cases, but hadn't found an explanation.
This was a nice catch indeed! I remember we fixed a similar issue
with VACUUM FULL few months back. Such issues would become
prominent with HOT because of early vacuuming and reuse of
tuples. Also unlike current VACUUM which runs exclusively on a table
(i.e. no two vacuums are allowed to run on a table simultaneously),
HOT makes concurrent pruning and vacuuming of a table possible.
In fact, during HOT development and testing cycles, I faced
similar race conditions a few times. This is just to keep ourselves aware of
the changes introduced by HOT. Hopefully by now we have fixed most of
the outstanding bugs in the HOT code (and Tom's eyes would catch the
remaining, if any :-)). Fingers crossed.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
* Tom Lane: > I am fairly sure that this bug explains problems previously reported > by Merlin Moncure: > http://archives.postgresql.org/pgsql-general/2006-10/msg01312.php > and Florian Weimer: > http://archives.postgresql.org/pgsql-general/2006-11/msg00305.php > In both those cases, off-list investigation showed that the symptoms > were caused by multiple index entries pointing to the same heap tuples, > where one index entry matched the actual contents of the row and > the other did not. In both cases this occurred for a fairly small > number of rows that were clumped together into small ranges of blocks. > It looks to me like this is perfectly explained by the theory that > that range of blocks had been truncated away by a VACUUM at some point > in the table's history, and that the non-matching index entries stemmed > from an insert or update that occurred and then aborted after VACUUM had > examined the blocks the first time but before it could return to check > whether the blocks were still empty. We did have auto-vacuum running, and while the table in question was supposedly INSERT-only, some rollback might have occurred before the corruption hit us, resulting in the dead tuples. So your explanation makes sense to me (but I'm not really familiar with PostgreSQL internals). Regarding Scott's commment of other reports, I don't think we've experienced the issue again; we've switched servers since then, and the usage patterns have changed over time. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Hi All, I am giving the command cat config.log|grep -w 'PG_VERSION' Which gives the following Output: | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" | #define PG_VERSION "8.3beta2" #define PG_VERSION "8.3beta2" But the output that I would require is PG_VERSION "8.3beta2" that should occur only once, can anyone please tell me the command to get this output. Thanks in advance Regards Cinu Kuriakose
Leaving aside the question of why one might want to do this, Unix 101 should show you many ways to do it. For example, sed -n -e 's/.*PG_VERSION /PG_VERSION /p' -e /PG_VERSION/q config.log Please don't cross-post questions like this, especially when it's not really a PostgreSQL question at all. cheers andrew Kuriakose, Cinu Cheriyamoozhiyil wrote: > Hi All, > > I am giving the command > > cat config.log|grep -w 'PG_VERSION' > > Which gives the following Output: > > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > | #define PG_VERSION "8.3beta2" > #define PG_VERSION "8.3beta2" > > But the output that I would require is > > PG_VERSION "8.3beta2" that should occur only once, can anyone please > tell me the command to get this output. > > Thanks in advance > > Regards > Cinu Kuriakose > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > >
Or ... ask the application not the OS ....
psql> select version() ;
Cheers
Medi
psql> select version() ;
Cheers
Medi
On 10/29/07, Andrew Dunstan < andrew@dunslane.net> wrote:
Leaving aside the question of why one might want to do this, Unix 101
should show you many ways to do it. For example,
sed -n -e 's/.*PG_VERSION /PG_VERSION /p' -e /PG_VERSION/q config.log
Please don't cross-post questions like this, especially when it's not
really a PostgreSQL question at all.
cheers
andrew
Kuriakose, Cinu Cheriyamoozhiyil wrote:
> Hi All,
>
> I am giving the command
>
> cat config.log|grep -w 'PG_VERSION'
>
> Which gives the following Output:
>
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION " 8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> | #define PG_VERSION "8.3beta2"
> #define PG_VERSION "8.3beta2"
>
> But the output that I would require is
>
> PG_VERSION "8.3beta2" that should occur only once, can anyone please
> tell me the command to get this output.
>
> Thanks in advance
>
> Regards
> Cinu Kuriakose
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
>
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster