Re: Advice for optimizing queries using Large Tables - Mailing list pgsql-general

From Tom Lane
Subject Re: Advice for optimizing queries using Large Tables
Date
Msg-id 27024.1015776894@sss.pgh.pa.us
Whole thread Raw
In response to Advice for optimizing queries using Large Tables  ("Shaun Grannis" <shaun_grannis@hotmail.com>)
List pgsql-general
"Shaun Grannis" <shaun_grannis@hotmail.com> writes:
> I'm working with a table containing over 65 million records in Postgres v
> 7.1.3.

> This query:
>     SELECT count(*) FROM table WHERE value=1999;
> takes approximately 45 minutes to execute, and returns a count of approx 2.2
> million records.

A query scanning 1/30th of the table almost certainly should use a
seqscan not an indexscan.  Does it get faster if you do "set
enable_seqscan to off"?

>     Aggregate (cost=477861.60..477861.60 rows=1 width=0)
>         -> Index Scan using value_idx on table (cost=0.00..477553.70
> rows=123157 width=0)

Hmm.  The reason that the planner is making the wrong plan choice is the
drastic underestimation of the number of matched rows.  With so few
distinct values in the column I'd have expected 7.1 to get a more
accurate estimate, but it's probably not worth worrying about at this
point.  The short answer is to update to 7.2 --- it has much better
statistics-gathering code and should pick the right plan.

            regards, tom lane

PS: this is a refreshing change from the usual "I want an indexscan,
why aren't I getting one?" type of planner mistake ;-)

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Referential Integrity Triggers
Next
From: Francisco Reyes
Date:
Subject: Wal_buffers memory utilization