Re: limit clause breaks query planner? - Mailing list pgsql-performance

From Gregory Stark
Subject Re: limit clause breaks query planner?
Date
Msg-id 87od363mqx.fsf@oxford.xeocode.com
Whole thread Raw
In response to Re: limit clause breaks query planner?  (Guillaume Cottenceau <gc@mnc.ch>)
Responses Re: limit clause breaks query planner?
List pgsql-performance

>>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
>>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>>> "Total runtime: 85896.214 ms"

Postgres does collect and use statistics about what fraction of the "A" column
is null. It also collects and uses statistics about what fraction of the "B"
column is 21 (using a histogram). And it does take the LIMIT into account.

I think the other poster might well be right about this table being extremely
bloated. You could test by running and posting the results of:

VACUUM VERBOSE my_table

What it doesn't collect is where in the table those records are -- so if there
are a lot of them then it might use a sequential scan regardless of whether
they're at the beginning or end of the table. That seems unlikely to be the
problem though.

The other thing it doesn't collect is how many of the B=21 records have null
As. So if a large percentage of the table has A as null then it will assume
that's true for the B=21 records and if there are a lot of B=21 records then
it will assume a sequential scan will find matches quickly. If in fact the two
columns are highly correlated and B=21 records almost never have A null
whereas records with other values of B have lots of null values then Postgres
might make a bad decision here.

Also, it only has the statitics for B=21 via a histogram. If the distribution
of B is highly skewed so that, for example values between 20 and 25 are very
common but B=21 happens to be quite rare then Postgres might get a bad
estimate here. You could improve this by raising the statistics target for the
B column and re-analyzing.

That brings up another question -- when was the last time this table was
analyzed?

What estimates and actual results does postgres get for simple queries like:

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21;

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

pgsql-performance by date:

Previous
From: "David West"
Date:
Subject: Re: limit clause breaks query planner?
Next
From: Guillaume Cottenceau
Date:
Subject: Re: limit clause breaks query planner?