Re: 7.0.2 issues / Geocrawler - Mailing list pgsql-hackers

From Tim Perdue
Subject Re: 7.0.2 issues / Geocrawler
Msg-id 396C5637.B11DE9AE@valinux.com
In response to 7.0.2 issues / Geocrawler  (Tim Perdue <tperdue@valinux.com>)
List pgsql-hackers
This is a *big* help.

Yes, the table is approximately 10-12 GB in size, and running your
length() and update queries is going to take a lifetime, since they
will require a calculation across 4 million rows.

This doesn't address the serious performance problem I'm finding in
7.0.2 for a multi-key select/order by/limit/offset query, which I sent
in a separate email.

Tim




Mike Mascari wrote:
> 
> Tim Perdue wrote:
> > ...
> > After that, I get this error from psql:
> >
> > ERROR:  btree: index item size 2820 exceeds maximum 2717
> >
> > Any way to tell where that item is at?
> 
> I've been wondering about the problems you've been having with
> PostgreSQL, and why I haven't experienced the same. I think this
> may very well be it. Earlier versions of PostgreSQL allowed the
> creation of indexes on fields whose length would not permit at
> least 2 entries per index page. 95% of the time, things would
> work fine; the other 5%, you would get corrupted data.
> 
> Before creating the index:
> 
> SELECT * FROM tbl_mail_archive WHERE Length(fld_mail_subject) > 2700;
> 
> will get you the list of records which cannot be indexed. Since
> you're attempting to create a multi-key index, I would truncate
> (or delete) any record whose fld_mail_subject is longer than 2700
> characters:
> 
> UPDATE tbl_mail_archive SET fld_mail_subject =
> SubStr(fld_mail_subject, 1, 2700)
> WHERE Length(fld_mail_subject) > 2700;
> 
> At this point, your index creation should be relatively quick
> (and successful) depending upon how many rows you have. I have a
> few tables with ~2 million rows that take about 5 - 10 minutes
> (with fsync off, naturally) to index. I would also recommend
> letting PostgreSQL determine the correct "ops":
> 
> CREATE INDEX idx_mail_archive_list_subject
> ON tbl_mail_archive (fld_mail_list, fld_mail_subject);
> 
> Without following the lists every day, most people wouldn't know
> about this issue. I'm surprised it took so long for PostgreSQL
> 7.0.2 to bail on the index creation though. Is this a
> particularly large table? At any rate, this is an example of a
> bug which *would* have allowed the kinds of corruption you've seen
> in the past; it has been addressed in 7.0.2, where Tom Lane crushed
> such bugs by the hundreds. If you can run:
> 
> psql db_geocrawler < 6_4dump.txt
> 
> and it never bails, then you know all your data is "clean". Until
> that point, any index you have on a "text" datatype is subject to
> similar problems.
> 
> Hope that helps,
> 
> Mike Mascari
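
The 2717-byte ceiling in the error message is consistent with the constraint Mike describes: a btree index entry must be small enough that multiple entries, plus page-header and per-item overhead, fit on one index page. As a rough back-of-envelope sketch (assuming PostgreSQL's default 8 KB block size; the exact overhead figures vary by version), the reported maximum sits just under one third of the page:

```python
# Back-of-envelope check of the btree item-size limit, assuming the
# default 8 KB PostgreSQL block size. The hard ceiling (2717 bytes in
# the error message) sits just under one third of the page, leaving
# room for the page header and per-item overhead.
PAGE_SIZE = 8192                 # default BLCKSZ, an assumption here
rough_limit = PAGE_SIZE // 3

print(rough_limit)               # 2730, just above the reported 2717
print(2820 > 2717)               # True: the 2820-byte item exceeds the limit
```

This is why truncating the offending column values to 2700 bytes, as suggested above, brings every index entry safely under the ceiling.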

-- 
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723

