Thread: Multicolumn index corruption on 8.4 beta 2

Multicolumn index corruption on 8.4 beta 2

From
Floris Bos / Maxnet
Date:
Hi,

I pgdump'ed a 8.3.7 database and loaded the dump to a different server 
running PostgreSQL 8.4 beta 2 (compiled from source) under Opensolaris.

One of the tables has about 6 million records, and a Btree index that 
spans 3 columns.

I am having the problem that some queries are unable to find rows when 
using the index.
When I force a sequential scan, by doing "set enable_indexscan=false; 
set enable_bitmapscan=false;", the same queries work fine.


In addition, while running "vacuum full analyze" I got the following 
error a couple times:

==
ERROR:  failed to re-find parent key in index "pgb_idx" for deletion 
target page 25470
===

Doing "reindex" or dropping and creating the index, makes the error go 
away for a while.
However it does not solve the problem of the missing rows, making me 
believe the index Postgresql generates is still corrupt.

According to memtest the memory of the server is fine, and according to 
"zpool status" there are no disk or ZFS checksum errors.



Any idea how to solve or debug this issue?


Yours sincerely,

Floris Bos


Re: Multicolumn index corruption on 8.4 beta 2

From
Richard Huxton
Date:
Floris Bos / Maxnet wrote:
> I am having the problem that some queries are unable to find rows when 
> using the index.
> When I force a sequential scan, by doing "set enable_indexscan=false; 
> set enable_bitmapscan=false;", the same queries work fine.

Not a hacker myself, but I can tell you that the first question you'll 
be asked is "can you produce a test case"? If you can generate the 
problem from a test table+generated data that will let people figure out 
the problem for you.

If not, details of the table schema will be needed, and is there any 
pattern to the missed rows? Also - compile settings, character set and 
locale details might be relevant too.

--   Richard Huxton  Archonet Ltd


Re: Multicolumn index corruption on 8.4 beta 2

From
Greg Stark
Date:
And can you post an explain plan for the incorrect scan? In particular  
is it using a bitmap index scan or a regular index scan? Or does it  
happen with either?



-- 
Greg


On 9 Jun 2009, at 09:43, Richard Huxton <dev@archonet.com> wrote:

> Floris Bos / Maxnet wrote:
>> I am having the problem that some queries are unable to find rows  
>> when using the index.
>> When I force a sequential scan, by doing "set  
>> enable_indexscan=false; set enable_bitmapscan=false;", the same  
>> queries work fine.
>
> Not a hacker myself, but I can tell you that the first question  
> you'll be asked is "can you produce a test case"? If you can  
> generate the problem from a test table+generated data that will let  
> people figure out the problem for you.
>
> If not, details of the table schema will be needed, and is there any  
> pattern to the missed rows? Also - compile settings, character set  
> and locale details might be relevant too.
>
> -- 
>  Richard Huxton
>  Archonet Ltd
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: Multicolumn index corruption on 8.4 beta 2

From
Floris Bos / Maxnet
Date:
Hi,

Richard Huxton wrote:
> Not a hacker myself, but I can tell you that the first question you'll 
> be asked is "can you produce a test case"? If you can generate the 
> problem from a test table+generated data that will let people figure out 
> the problem for you.

Unfortunately, I have not been able to produce a test case (yet) on a 
small data set.
While the data in the database is public information, the whole database 
is about 100 GB, and therefore kinda hard to share.

> If not, details of the table schema will be needed, and is there any 
> pattern to the missed rows? Also - compile settings, character set and 
> locale details might be relevant too.

==
Compile settings
==

No fancy settings.
- Clean Opensolaris 2009.06 installation
- Installed gcc and gmake packages.
- Downloaded source and did a ./configure --disable-readline ; gmake ; 
gmake install

==
Postgresql settings
==

The following settings differ from the defaults:

--
shared_buffers=3500MB
maintenance_work_mem = 128MB
fsync = off
synchronous_commit = off
checkpoint_segments = 25
--

The locale used when creating the database is SQL_ASCII

==
Hardware
==

Tyan barebone
2x Opteron 2376 quadcore
32 GB reg ecc memory
1x Intel X25-E 32 GB SSD for OS and pg_xlog directory
2x Intel X25-E 64 GB SSD (ZFS striping) for the database

==
Table layout
==

--                                   Table "public.posts_index"   Column   |          Type          | 
Modifiers
------------+------------------------+----------------------------------------------------------- cid        | integer
             | not null default 
 
nextval('posts_index_cid
_seq'::regclass) groupid    | integer                | not null startdate  | integer                | not null poster
 | character varying(64)  | not null basefile   | character varying(64)  | not null subject    | character varying(255)
|not null size       | real                   | nfo        | boolean                | c          | boolean
 | parts      | integer                | totalparts | integer                | imdb       | integer                |
ng1       | boolean                | default false g2         | integer                | default 0 g3         | integer
              | default 0 data       | bytea                  |
 
Indexes:    "posts_index5_pkey" PRIMARY KEY, btree (cid) CLUSTER    "gr_idx" btree (groupid, (- cid))    "pgb_idx"
btree(poster, groupid, basefile)
 
--

Only noticed problems with the pgb_idx index so far.

The problem only occurs on a subset of the rows, at a time.
After adding/updating rows and doing a reindex, the rows that were 
missing before sometimes suddenly do work, but then different ones do not.


> And can you post an explain plan for the incorrect scan? In particular is it using a bitmap index scan or a regular
indexscan? Or does it happen with either?
 

Happens with both.

Index scan:

===
=> explain SELECT count(*) FROM posts_index WHERE 
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND 
basefile='NIB8124849';
QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.25..11.26 rows=1 width=0)   ->  Index Scan using pgb_idx on posts_index  (cost=0.00..11.25 
 
rows=1 width=0)         Index Cond: (((poster)::text = 'Yenc@power-post.org 
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text = 
'NIB8124849'::text))


=> SELECT count(*) FROM posts_index WHERE poster='Yenc@power-post.org 
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849'; count
-------     0
===

When I disable index scan, it uses bitmap without luck:

==
=> set enable_indexscan=false;
SET
=> explain SELECT count(*) FROM posts_index WHERE 
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND 
basefile='NIB8124849'; QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.26..11.27 rows=1 width=0)   ->  Bitmap Heap Scan on posts_index  (cost=7.24..11.26 rows=1 width=0)
     Recheck Cond: (((poster)::text = 'Yenc@power-post.org 
 
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text = 
'NIB8124849'::text))         ->  Bitmap Index Scan on pgb_idx  (cost=0.00..7.24 rows=1 width=0)               Index
Cond:(((poster)::text = 'Yenc@power-post.org 
 
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text = 
'NIB8124849'::text))


=> SELECT count(*) FROM posts_index WHERE poster='Yenc@power-post.org 
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849'; count
-------     0
==


Sequential scan does find the row:

==
=> set enable_indexscan=false;
SET
=> set enable_bitmapscan=false;
SET
=> explain SELECT count(*) FROM posts_index WHERE 
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND 
basefile='NIB8124849';
QUERY PLAN 


------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=288153.28..288153.29 rows=1 width=0)   ->  Seq Scan on posts_index  (cost=0.00..288153.28 rows=1
width=0)        Filter: (((poster)::text = 'Yenc@power-post.org 
 
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text = 
'NIB8124849'::text))
(3 rows)

=> SELECT count(*) FROM posts_index WHERE poster='Yenc@power-post.org 
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849'; count
-------     1
==


Yours sincerely,

Floris Bos


Re: Multicolumn index corruption on 8.4 beta 2

From
Tom Lane
Date:
Floris Bos / Maxnet <bos@je-eigen-domein.nl> writes:
> Richard Huxton wrote:
>> Not a hacker myself, but I can tell you that the first question you'll 
>> be asked is "can you produce a test case"? If you can generate the 
>> problem from a test table+generated data that will let people figure out 
>> the problem for you.

> Unfortunately, I have not been able to produce a test case (yet) on a 
> small data set.
> While the data in the database is public information, the whole database 
> is about 100 GB, and therefore kinda hard to share.

Seems like we'd only need a dump of the one problem table, not the
entire database.
        regards, tom lane


Re: Multicolumn index corruption on 8.4 beta 2

From
Simon Riggs
Date:
On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:

> fsync = off

That's a bad plan if you care about your database.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Multicolumn index corruption on 8.4 beta 2

From
Florian Weimer
Date:
* Simon Riggs:

> On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:
>
>> fsync = off
>
> That's a bad plan if you care about your database.

It shouldn't introduce this type of corruption, though.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99


Re: Multicolumn index corruption on 8.4 beta 2

From
Floris Bos / Maxnet
Date:
Hi,

Simon Riggs wrote:
> On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:
>> fsync = off
> 
> That's a bad plan if you care about your database.

I am aware of the risk of dataloss in case of power failure, etc.

However fsync=on is simply too slow for my purpose, and it concerns data 
that can be regenerated from its source.

The website this setup is for has been running various previous versions 
of PostgreSQL with fsync=off since 2005.
So I still expect it to work.


Yours sincerely,

Floris Bos


Re: Multicolumn index corruption on 8.4 beta 2

From
Greg Stark
Date:
Well sure it could -- once. It wouldn't be reproducible in a freshly
rebuilt index unless he's crashing his machine every time.



--
Greg


On 9 Jun 2009, at 17:12, Florian Weimer <fweimer@bfk.de> wrote:

> * Simon Riggs:
>
>> On Tue, 2009-06-09 at 13:40 +0200, Floris Bos / Maxnet wrote:
>>
>>> fsync = off
>>
>> That's a bad plan if you care about your database.
>
> It shouldn't introduce this type of corruption, though.
>
> --
> Florian Weimer                <fweimer@bfk.de>
> BFK edv-consulting GmbH       http://www.bfk.de/
> Kriegsstraße 100              tel: +49-721-96201-1
> D-76133 Karlsruhe             fax: +49-721-96201-99
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: Multicolumn index corruption on 8.4 beta 2

From
Josh Berkus
Date:
Floris,


> The website this setup is for has been running various previous versions
> of PostgreSQL with fsync=off since 2005.
> So I still expect it to work.

You've been lucky, that's all.

Our documentation has been clear, back to version 7.0, that turning 
fsync=off carries the risk that you will have to recreate your entire 
database in the event of unexpected shutdown.  That's not new.

So, the operative question is: was 8.4 shut down with -immediate or 
otherwise unexpectedly?  If so, then we don't have a bug.  If 8.4 was 
never shut down, then we have some strange behavior which bears looking 
into.  And you've found a wierd corner case, which is what we count on 
our users for.

Thanks for testing.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Multicolumn index corruption on 8.4 beta 2

From
Florian Weimer
Date:
* Josh Berkus:

> Our documentation has been clear, back to version 7.0, that turning
> fsync=off carries the risk that you will have to recreate your entire
> database in the event of unexpected shutdown.  That's not new.

The documentation does not say this.  Instead, there's the following
rather explicit explanation that only OS crashes matter:

| (Crashes of the database software itself are not a risk factor
| here. Only an operating-system-level crash creates a risk of
| corruption.)

If it really matters how PostgreSQL is shut down in "fsync = off" mode
(while the operating system keeps running), the documentation is
seriously wrong here.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99


Re: Multicolumn index corruption on 8.4 beta 2

From
Alvaro Herrera
Date:
Florian Weimer wrote:
> * Josh Berkus:
> 
> > Our documentation has been clear, back to version 7.0, that turning
> > fsync=off carries the risk that you will have to recreate your entire
> > database in the event of unexpected shutdown.  That's not new.
> 
> The documentation does not say this.  Instead, there's the following
> rather explicit explanation that only OS crashes matter:
> 
> | (Crashes of the database software itself are not a risk factor
> | here. Only an operating-system-level crash creates a risk of
> | corruption.)
> 
> If it really matters how PostgreSQL is shut down in "fsync = off" mode
> (while the operating system keeps running), the documentation is
> seriously wrong here.

Yeah, AFAICT the writes are handed off to the operating system (just not
synced), so if it flushes its caches sanely at all there shouldn't be a
problem.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Multicolumn index corruption on 8.4 beta 2

From
"Kevin Grittner"
Date:
Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Yeah, AFAICT the writes are handed off to the operating system (just
> not synced), so if it flushes its caches sanely at all there
> shouldn't be a problem.
I would certainly *hope* that's the case.  We sometimes use fsync=off
for conversions, where we plan to just start over if the conversion
crashes, and set it to on when the conversion is done.  It would be
disturbing to discover that fsync=off also means "don't bother to
write dirty buffers to the OS before shutdown."
-Kevin


Re: Multicolumn index corruption on 8.4 beta 2

From
Josh Berkus
Date:
Alvaro, Kevin,

>> Yeah, AFAICT the writes are handed off to the operating system (just
>> not synced), so if it flushes its caches sanely at all there
>> shouldn't be a problem.
>
> I would certainly *hope* that's the case.  We sometimes use fsync=off
> for conversions, where we plan to just start over if the conversion
> crashes, and set it to on when the conversion is done.  It would be
> disturbing to discover that fsync=off also means "don't bother to
> write dirty buffers to the OS before shutdown."

It doesn't.  But what I don't trust, and the *first* place I'd look for 
problems, is whether the OS flushes *all* dirty buffers to disk in the 
event the application gets killed.

That's why I want more information on Floris' case.  Was 8.4 killed or 
shut down with -m immediate?  Or the os rebooted with 8.4 running?

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Multicolumn index corruption on 8.4 beta 2

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> It doesn't.  But what I don't trust, and the *first* place I'd look for 
> problems, is whether the OS flushes *all* dirty buffers to disk in the 
> event the application gets killed.

Why wouldn't you trust it?  The sort of thing you seem to be thinking
about would require tracking which process(es) wrote each dirty buffer
and then going back and dropping selected dirty buffers when a process
exits abnormally.  I can hardly imagine any OS wishing to do that.
        regards, tom lane


Re: Multicolumn index corruption on 8.4 beta 2

From
Floris Bos / Maxnet
Date:
Hi,

Josh Berkus wrote:
> It doesn't.  But what I don't trust, and the *first* place I'd look for 
> problems, is whether the OS flushes *all* dirty buffers to disk in the 
> event the application gets killed.
> 
> That's why I want more information on Floris' case.  Was 8.4 killed or 
> shut down with -m immediate?  Or the os rebooted with 8.4 running?

The only reboots I have done on that server were with the "reboot"
system command, which should send a SIGTERM to all processes first
including PostgreSQL, before pulling the plug.

I do recall that during the execution of "vacuum full" the psql client
program once did report that it lost connection with the server, but was
able to reconnect. Maybe the server processes handling the connection
died then, but I am not sure of that, and it only happened once.


Anyway, the problem also occurs when there is no reboot or unexpected
event between the reindex and the query.

After a REINDEX it is able to find the row it was missing first, but
then other rows become missing.

All in the same psql session:

===
usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849'; count
-------     0
(1 row)

usenet=> reindex index pgb_idx;
REINDEX
usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849'; count
-------     1
(1 row)

usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND basefile='frx-fffe' AND
groupid=757; count
-------     0
(1 row)

usenet=> set enable_indexscan=false;
SET
usenet=> set enable_bitmapscan=false;
SET
usenet=> SELECT count(*) FROM posts_index WHERE
poster='Yenc@power-post.org (Yenc-PP-A&A)' AND basefile='frx-fffe' AND
groupid=757; count
-------     1
(1 row)

===


Yours sincerely,

Floris Bos



Re: Multicolumn index corruption on 8.4 beta 2

From
Andy Colson
Date:
Floris Bos / Maxnet wrote:
> The following settings differ from the defaults:
> 
> -- 
> shared_buffers=3500MB
> maintenance_work_mem = 128MB
> fsync = off
> synchronous_commit = off
> checkpoint_segments = 25
> -- 
> 
> 
> ==
> Table layout
> ==
> 
> -- 
>                                    Table "public.posts_index"
>    Column   |          Type          | Modifiers
> ------------+------------------------+----------------------------------------------------------- 
> 
>  cid        | integer                | not null default 
> nextval('posts_index_cid
> _seq'::regclass)
>  groupid    | integer                | not null
>  startdate  | integer                | not null
>  poster     | character varying(64)  | not null
>  basefile   | character varying(64)  | not null
>  subject    | character varying(255) | not null
>  size       | real                   |
>  nfo        | boolean                |
>  c          | boolean                |
>  parts      | integer                |
>  totalparts | integer                |
>  imdb       | integer                |
>  ng1        | boolean                | default false
>  g2         | integer                | default 0
>  g3         | integer                | default 0
>  data       | bytea                  |
> Indexes:
>     "posts_index5_pkey" PRIMARY KEY, btree (cid) CLUSTER
>     "gr_idx" btree (groupid, (- cid))
>     "pgb_idx" btree (poster, groupid, basefile)
> -- 
> 
> Only noticed problems with the pgb_idx index so far.


I have been trying to reproduce the problem but no success so far.  I 
made myself a table that matches yours, then I wrote a little perl 
script to fill it with random data.  (The script also writes out a text 
file I can use to re-query things).

I fill the db, then add the indexes.  Then I test lookup every record I 
added, and find them all.

So, a few questions:

1) did you dump/restore into 8.4beta1 first and then upgrade the 
program?  Or did you dump/restore into 8.4beta2?

2) did you use any of the concurrent restore options?

3) do you do any updates or deletes to the table after you restore it?

4) do you do any other operations on the table (vacuum, cluster, etc..)?

5) got any triggers or stored procs?

6) To the -hackers: I write the records and then refind them in the 
exact same order, would it be a better test to search for records in a 
more random order?  would it make a difference?  Would searching for 
some but not all make a difference?


-Andy


Re: Multicolumn index corruption on 8.4 beta 2

From
Josh Berkus
Date:
Andy,

> 6) To the -hackers: I write the records and then refind them in the
> exact same order, would it be a better test to search for records in a
> more random order? would it make a difference? Would searching for some
> but not all make a difference?

Are you on OpenSolaris?  Can you give your script to Zdenek & Jignesh to 
test in their enviroments?

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Multicolumn index corruption on 8.4 beta 2

From
Andy Colson
Date:
Josh Berkus wrote:
> Andy,
>
>> 6) To the -hackers: I write the records and then refind them in the
>> exact same order, would it be a better test to search for records in a
>> more random order? would it make a difference? Would searching for some
>> but not all make a difference?
>
> Are you on OpenSolaris?  Can you give your script to Zdenek & Jignesh to
> test in their enviroments?
>

I am not, and yes I can.

Hopefully I can attach a .tar.bz2

The bigtest.sh is the one to run (it runs all the parts).  You'll need
to edit fill.pl and test.pl and set the dbname and maybe give a
username/password.

In the fill.pl there is a $max variable that's used to set the number of
records to insert.  (its set to 10 million right now)

Oh, this .tar wont create a subdirectory


-Andy



Attachment

Re: Multicolumn index corruption on 8.4 beta 2

From
Josh Berkus
Date:
Floris,

One more question set: what version of OpenSolaris, and what filesystem 
are you using?  Does the OS have any non-default tuning settings?  How 
did you install or compile PostgreSQL?


-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Multicolumn index corruption on 8.4 beta 2

From
Tom Lane
Date:
Floris Bos / Maxnet <bos@je-eigen-domein.nl> writes:
> Hi,
> Tom Lane wrote:
>> Floris Bos / Maxnet <bos@je-eigen-domein.nl> writes:
>>> postgres@db:/data$ /opt/postgres/8.4-beta/bin/64/initdb -E SQL_ASCII -X 
>>> /data/pg_xlog /data/db
>>> The database cluster will be initialized with locale en_US.UTF-8.
>> 
>> Oooh, that doesn't look real good.  You're going to be using strcoll()
>> comparisons that assume the data is in UTF8, but the database is not
>> enforcing valid UTF8 encoding.  I have not checked the dump to see if
>> it's all valid data, but this could be the root of the issue.
>> 
>> If you want to use SQL_ASCII because the data isn't uniformly encoded,
>> it'd be better to use C locale.

> Darn.
> Looks like you are right!
> Works a lot better with "--locale=C"

> My 8.3 PostgreSQL installation ran under FreeBSD, and there the locale 
> is C by default:
> So I was not used to have to add a "--locale=C" option.
> Under Opensolaris it's indeed UTF-8 by default.

Yeah, this is kind of unfortunate.  I'm not sure there is much we could
do about it, unless we want to insist that C locale be used if the
database encoding is SQL_ASCII.  That cure seems worse than the disease
though.  We have locked down encoding/locale combinations pretty
strictly for 8.4, but SQL_ASCII is generally thought to be a "let the
user beware" setting.
        regards, tom lane