Re: 7.0.2 issues / Geocrawler - Mailing list pgsql-hackers

From Mike Mascari
Subject Re: 7.0.2 issues / Geocrawler
Date
Msg-id 396C1B50.365FA789@mascari.com
Whole thread Raw
In response to 7.0.2 issues / Geocrawler  (Tim Perdue <tperdue@valinux.com>)
List pgsql-hackers
Tim Perdue wrote:
> ...
> After that, I get this error from psql:
> 
> ERROR:  btree: index item size 2820 exceeds maximum 2717
> 
> Any way to tell where that item is at?

I've been wondering at the state of the problems you've been
having with PostgreSQL and wondering why I haven't experienced
the same. I think this may very well be it. Earlier versions of
PostgreSQL allowed for the creation of indexes on fields whose
length would not permit at least 2 entries per index page. 95% of
the time, things would work fine. But 5% you would get corrupted
data.

Before creating the index:

SELECT * FROM tbl_main_archive WHERE Length(fld_mail_subject) >
2700;

will get you the list of records which cannot be indexed. You're
attempting to create a multi-key index so I would truncate (or
delete) any record whose fld_mail_subject is > 2700:

UPDATE tbl_main_archive SET fld_mail_subject =
SubStr(fld_mail_subject, 1, 2700);

At this point, your index creation should be relatively quick
(and successful) depending upon how many rows you have. I have a
few tables with ~2 million rows that take about 5 - 10 minutes
(with fsync off, naturally) to index. I would also recommend
letting PostgreSQL determine the correct "ops":

CREATE INDEX idx_mail_archive_list_subject 
ON tbl_mail_archive (fld_mail_list, fld_mail_subject);

Without following the lists every day, most people wouldn't know
about this issue. I'm surprised it took so long for PostgreSQL
7.0.2 to bail on the index creation though. Is this a
particularly large table? At any rate, this is an example of a
bug which *would* allow for the kinds of corruption you've seen
in the past that has been addressed in 7.0.2, as Tom Lane crushed
them by the hundreds. If you can:

psql db_geocrawler < 6_4dump.txt

and it never bails, then you know all your data is "clean". Until
that point, any index you have on a "text" datatype is subject to
similar problems. 

Hope that helps,

Mike Mascari


pgsql-hackers by date:

Previous
From: Alfred Perlstein
Date:
Subject: Re: Connection pooling.
Next
From: Tom Lane
Date:
Subject: Re: Connection pooling.