Re: limit clause breaks query planner? - Mailing list pgsql-performance

From David West
Subject Re: limit clause breaks query planner?
Date
Msg-id 006901c90ce1$00fa5330$02eef990$@west@cusppoint.com
Whole thread Raw
In response to Re: limit clause breaks query planner?  (Guillaume Cottenceau <gc@mnc.ch>)
Responses Re: limit clause breaks query planner?  (Guillaume Cottenceau <gc@mnc.ch>)
List pgsql-performance
Yes I inserted values in big batches according to a single value of 'B', so
indeed a sequence scan may have to scan forward many millions of rows before
finding the required value.

I have been doing regular analyse commands on my table.  I don't think my
table is bloated, I haven't been performing updates.  However I'm doing a
vacuum analyse now and I'll see if that makes any difference.

I am using postgres 8.3.1 with a default install on windows - no tweaks to
the configuration at all.

There are many other columns in my table, but none of them are used in this
query.

Guillaume in your example you didn't add the limit clause?  Postgres chooses
the correct index in my case without the limit clause, the problem is with
the limit clause.  One other difference with your example is both my columns
are varchar columns, not integer and text, I don't know if that would make a
difference.

From looking at the plans, it seems to be postgres is assuming it will only
have to sequentially scan 15 rows, which is not true in my case because
column B is not distributed randomly (nor will it be in production).  Would
postgres not be best to ignore the limit when deciding the best index to use
-  in this simple query wouldn't the best plan to use always be the same
with or without a limit?

Thanks to all of you for your interest in my problem
David

-----Original Message-----
From: Guillaume Cottenceau [mailto:gc@mnc.ch]
Sent: 02 September 2008 10:07
To: David West; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?

Wouldn't this be e.g. normal if the distribution of values would
be uneven, e.g. A IS NULL AND B = '21' not near the beginning of
the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).

However, I cannot reproduce a similar situation to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000, case
when random() < 0.05 then 'Today Alcatel-Lucent has announced that P*******
C**** is appointed non-executive Chairman and B** V******** is appointed
Chief Executive Officer.' else null end );
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                                   QUERY PLAN

----------------------------------------------------------------------------
-----
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=50000
width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=50000
width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                  QUERY PLAN

----------------------------------------------------------------------------
--
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089
width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgot to tell:

- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36


pgsql-performance by date:

Previous
From: Guillaume Cottenceau
Date:
Subject: Re: limit clause breaks query planner?
Next
From: Gregory Stark
Date:
Subject: Re: limit clause breaks query planner?