Re: Query planner is using wrong index. - Mailing list pgsql-performance

From Brian Herlihy
Subject Re: Query planner is using wrong index.
Date
Msg-id 20060406092711.8725.qmail@web52315.mail.yahoo.com
Whole thread Raw
In response to Re: Query planner is using wrong index.  (Ragnar <gnari@hive.is>)
Responses Re: Query planner is using wrong index.
Re: Query planner is using wrong index.
List pgsql-performance
--- Ragnar <gnari@hive.is> wrote:

> On fim, 2006-04-06 at 12:35 +1000, Brian Herlihy wrote:
>
> > I have a problem with the choice of index made by the query planner.
> >
> > My table looks like this:
> >
> > CREATE TABLE t
> > (
> >   p1 varchar not null,
> >   p2 varchar not null,
> >   p3 varchar not null,
> >   i1 integer,
> >   i2 integer,
> >   i3 integer,
> >   i4 integer,
> >   i5 integer,
> >   d1 date,
> >   d2 date,
> >   d3 date,
> >   PRIMARY KEY (p1, p2, p3)
> > );
> >
> > I have also created an index on (p2, p3), as some of my lookups are on
> these
> > only.
>
> > All the integers and dates are data values.
> > The table has around 9 million rows.
> > I am using postgresl 7.4.7
> >
> > I have set statistics to 1000 on the p1, p2 and p3 columns, and run vacuum
> full
> > analyse.  However, I still see
> > query plans like this:
> >
> ...
> > db=# explain select * from t where p2 = 'fairly_common' and p3 =
> > 'fairly_common';
>
> > QUERY PLAN
> >
>

-----------------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using p2p3 on t  (cost=0.00..6.01 rows=1 width=102)
> >    Index Cond: (((p2)::text = 'fairly_common'::text) AND ((p3)::text =
> > 'fairly_common'::text))
> > (3 rows)
>
> please show us an actual EXPLAIN ANALYZE
> this will show us more.
>
> > I would like the query planner to use the primary key for all of these
> lookups.
> >  How can I enforce this?
>
> How would that help? have you tested to see if it would
> actualy be better?
>
> gnari
>

Yes, the primary key is far better.  I gave it the ultimate test - I dropped
the (p2, p3) index.  It's blindingly fast when using the PK, which is what I
expect from Postgresql :)  This query is part of an import process, which has
been getting increasingly slow as the table has grown.

I first discovered the problem when I noticed queries which should be simple PK
lookups taking up to 2.5 seconds on an idle system.  I discussed this problem
in the Postgres IRC channel, and it turns out to be due to an inaccurate
selectivity estimate.

The columns p2 and p3 are highly correlated, which is why I often get hundreds
of rows even after specifying values for both these columns.  However, the
query optimizer assumes the columns are not correlated.  It calculates the
selectivity for each column seperately, then multiplies them to get the
combined selectivity for specifying both p2 and p3.  This results in an
estimate of 1 row, which makes the (p2,p3) index look as good as the (p1,p2,p3)
index.

I'm aware now that there is no way to force use of a particular index in
Postgres.  I've also been told that there is no way to have the optimizer take
into account correlation between column values.

My options seem to be
  - Fudge the analysis results so that the selectivity estimate changes.  I
have tested reducing n_distinct, but this doesn't seem to help.
  - Combine the columns into one column, allowing postgres to calculate the
combined selectivity.
  - Drop the (p2, p3) index.  But I need this for other queries.

None of these are good solutions.  So I am hoping that there is a better way to
go about this!

Thanks,
Brian

pgsql-performance by date:

Previous
From: Markus Schaber
Date:
Subject: Re: Sun Fire T2000 and PostgreSQL 8.1.3
Next
From: "Juan Casero \(FL FLC\)"
Date:
Subject: Re: Sun Fire T2000 and PostgreSQL 8.1.3