Thread: Query planner is using wrong index.

Query planner is using wrong index.

From
Brian Herlihy
Date:
Hi,

I have a problem with the choice of index made by the query planner.

My table looks like this:

CREATE TABLE t
(
  p1 varchar not null,
  p2 varchar not null,
  p3 varchar not null,
  i1 integer,
  i2 integer,
  i3 integer,
  i4 integer,
  i5 integer,
  d1 date,
  d2 date,
  d3 date,
  PRIMARY KEY (p1, p2, p3)
);

I have also created an index on (p2, p3), as some of my lookups are on these
only.
All the integers and dates are data values.
The table has around 9 million rows.
I am using postgresl 7.4.7

I have set statistics to 1000 on the p1, p2 and p3 columns, and run vacuum full
analyse.  However, I still see
query plans like this:

db=# explain select * from t where p1 = 'something' and p2 = 'fairly_common'
and p3 = 'fairly_common';
        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2p3 on t  (cost=0.00..6.01 rows=1 width=102)
   Index Cond: (((p2)::text = 'fairly_common'::text) AND ((p3)::text =
'fairly_common'::text))
   Filter: ((p1)::text = 'something'::text)
(3 rows)

The problem appears to be this:

db=# explain select * from t where p2 = 'fairly_common' and p3 =
'fairly_common';
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2p3 on t  (cost=0.00..6.01 rows=1 width=102)
   Index Cond: (((p2)::text = 'fairly_common'::text) AND ((p3)::text =
'fairly_common'::text))
(3 rows)

The query planner thinks that this will return only 1 row.
In fact, these index lookups sometimes return up to 500 rows, which then must
be filtered by p1.
This can take 2 or 3 seconds to execute for what should be a simple primary key
lookup.

For VERY common values of p2 and p3, the query planner chooses the primary key,
because these values are stored
explicitly in the analyse results.  For rare values there is no problem,
because the query runs quickly.
But for "fairly common" values, there is a problem.

I would like the query planner to use the primary key for all of these lookups.
 How can I enforce this?

Thanks,
Brian

Re: Query planner is using wrong index.

From
Ragnar
Date:
On fim, 2006-04-06 at 12:35 +1000, Brian Herlihy wrote:

> I have a problem with the choice of index made by the query planner.
>
> My table looks like this:
>
> CREATE TABLE t
> (
>   p1 varchar not null,
>   p2 varchar not null,
>   p3 varchar not null,
>   i1 integer,
>   i2 integer,
>   i3 integer,
>   i4 integer,
>   i5 integer,
>   d1 date,
>   d2 date,
>   d3 date,
>   PRIMARY KEY (p1, p2, p3)
> );
>
> I have also created an index on (p2, p3), as some of my lookups are on these
> only.

> All the integers and dates are data values.
> The table has around 9 million rows.
> I am using postgresl 7.4.7
>
> I have set statistics to 1000 on the p1, p2 and p3 columns, and run vacuum full
> analyse.  However, I still see
> query plans like this:
>
...
> db=# explain select * from t where p2 = 'fairly_common' and p3 =
> 'fairly_common';
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using p2p3 on t  (cost=0.00..6.01 rows=1 width=102)
>    Index Cond: (((p2)::text = 'fairly_common'::text) AND ((p3)::text =
> 'fairly_common'::text))
> (3 rows)

please show us an actual EXPLAIN ANALYZE
this will show us more.

> I would like the query planner to use the primary key for all of these lookups.
>  How can I enforce this?

How would that help? have you tested to see if it would
actualy be better?

gnari



Re: Query planner is using wrong index.

From
Brian Herlihy
Date:
--- Ragnar <gnari@hive.is> wrote:

> On fim, 2006-04-06 at 12:35 +1000, Brian Herlihy wrote:
>
> > I have a problem with the choice of index made by the query planner.
> >
> > My table looks like this:
> >
> > CREATE TABLE t
> > (
> >   p1 varchar not null,
> >   p2 varchar not null,
> >   p3 varchar not null,
> >   i1 integer,
> >   i2 integer,
> >   i3 integer,
> >   i4 integer,
> >   i5 integer,
> >   d1 date,
> >   d2 date,
> >   d3 date,
> >   PRIMARY KEY (p1, p2, p3)
> > );
> >
> > I have also created an index on (p2, p3), as some of my lookups are on
> these
> > only.
>
> > All the integers and dates are data values.
> > The table has around 9 million rows.
> > I am using postgresl 7.4.7
> >
> > I have set statistics to 1000 on the p1, p2 and p3 columns, and run vacuum
> full
> > analyse.  However, I still see
> > query plans like this:
> >
> ...
> > db=# explain select * from t where p2 = 'fairly_common' and p3 =
> > 'fairly_common';
>
> > QUERY PLAN
> >
>

-----------------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using p2p3 on t  (cost=0.00..6.01 rows=1 width=102)
> >    Index Cond: (((p2)::text = 'fairly_common'::text) AND ((p3)::text =
> > 'fairly_common'::text))
> > (3 rows)
>
> please show us an actual EXPLAIN ANALYZE
> this will show us more.
>
> > I would like the query planner to use the primary key for all of these
> lookups.
> >  How can I enforce this?
>
> How would that help? have you tested to see if it would
> actualy be better?
>
> gnari
>

Yes, the primary key is far better.  I gave it the ultimate test - I dropped
the (p2, p3) index.  It's blindingly fast when using the PK, which is what I
expect from Postgresql :)  This query is part of an import process, which has
been getting increasingly slow as the table has grown.

I first discovered the problem when I noticed queries which should be simple PK
lookups taking up to 2.5 seconds on an idle system.  I discussed this problem
in the Postgres IRC channel, and it turns out to be due to an inaccurate
selectivity estimate.

The columns p2 and p3 are highly correlated, which is why I often get hundreds
of rows even after specifying values for both these columns.  However, the
query optimizer assumes the columns are not correlated.  It calculates the
selectivity for each column seperately, then multiplies them to get the
combined selectivity for specifying both p2 and p3.  This results in an
estimate of 1 row, which makes the (p2,p3) index look as good as the (p1,p2,p3)
index.

I'm aware now that there is no way to force use of a particular index in
Postgres.  I've also been told that there is no way to have the optimizer take
into account correlation between column values.

My options seem to be
  - Fudge the analysis results so that the selectivity estimate changes.  I
have tested reducing n_distinct, but this doesn't seem to help.
  - Combine the columns into one column, allowing postgres to calculate the
combined selectivity.
  - Drop the (p2, p3) index.  But I need this for other queries.

None of these are good solutions.  So I am hoping that there is a better way to
go about this!

Thanks,
Brian

Re: Query planner is using wrong index.

From
Ragnar
Date:
On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote:
> --- Ragnar <gnari@hive.is> wrote:
>
> > On fim, 2006-04-06 at 12:35 +1000, Brian Herlihy wrote:
> >
...
> > >   PRIMARY KEY (p1, p2, p3)
...
> > >
> > > I have also created an index on (p2, p3), as some of my lookups are on
> > > these only.
...
> > > db=# explain select * from t where p2 = 'fairly_common' and p3 =
> > > 'fairly_common';

> > please show us an actual EXPLAIN ANALYZE

> > > I would like the query planner to use the primary key for all of these
> > lookups.
> >
> > have you tested to see if it would  actualy be better?
> >

> Yes, the primary key is far better.  I gave it the ultimate test - I dropped
> the (p2, p3) index.  It's blindingly fast when using the PK,

I have problems understanding exactly how an index on
(p1,p2,p3) can be faster than and index on (p2,p3) for
a query not involving p1.
can you demonstrate this with actual EXPLAIN ANALYZES ?
something like:
EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
BEGIN;
DROP INDEX p2p3;
EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
ROLLBACK;

maybe your p2p3 index needs REINDEX ?


> My options seem to be
>   - Fudge the analysis results so that the selectivity estimate changes.  I
> have tested reducing n_distinct, but this doesn't seem to help.
>   - Combine the columns into one column, allowing postgres to calculate the
> combined selectivity.
>   - Drop the (p2, p3) index.  But I need this for other queries.
>
> None of these are good solutions.  So I am hoping that there is a better way to
> go about this!

I think we must detemine exactly what the problem is
before  devising complex solutions

gnari



Re: Query planner is using wrong index.

From
Brian Herlihy
Date:
--- Ragnar <gnari@hive.is> wrote:

> On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote:
>
> > Yes, the primary key is far better.  I gave it the ultimate test - I
> dropped
> > the (p2, p3) index.  It's blindingly fast when using the PK,
>
> I have problems understanding exactly how an index on
> (p1,p2,p3) can be faster than and index on (p2,p3) for
> a query not involving p1.
> can you demonstrate this with actual EXPLAIN ANALYZES ?
> something like:
> EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
> BEGIN;
> DROP INDEX p2p3;
> EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
> ROLLBACK;
>
> maybe your p2p3 index needs REINDEX ?
>

Here's the output.  The timings after caching are repeatable (varying only by
10% or so).

Query before caching:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';
                                                              QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=2793.247..2793.247 rows=0 loops=1)
   Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
   Filter: ((p1)::text = 'a'::text)
 Total runtime: 2793.303 ms
(4 rows)

Query after caching:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';
                                                              QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=0.617..0.617 rows=0 loops=1)
   Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
   Filter: ((p1)::text = 'a'::text)
 Total runtime: 0.665 ms
(4 rows)

=== At this point I did "DROP INDEX p2_p3_idx"

Query after dropping index:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';

QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=95.188..95.188 rows=0 loops=1)
   Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text =
'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
 Total runtime: 95.239 ms
(3 rows)

Query after dropping index, fully cached:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';

QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=0.030..0.030 rows=0 loops=1)
   Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text =
'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
 Total runtime: 0.077 ms
(3 rows)



And one where the query planner chooses the primary key instead.  Both p2 and
p3 are present as Most Common Values in pg_statistics:

Query before fully cached:

db# explain analyze SELECT * FROM t WHERE p1 = 'b' AND p2 = 'www.google.com'
AND p3 = 'search?hl=&lr=&q=';


QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=212.092..212.100 rows=1 loops=1)
   Index Cond: (((p1)::text = 'b'::text) AND ((p2)::text =
'www.google.com'::text) AND ((p3)::text = 'search?hl=&lr=&q='::text))
 Total runtime: 212.159 ms
(3 rows)

Query after fully cached:

db# explain analyze SELECT * FROM t WHERE p1 = 'b' AND p2 = 'www.google.com'
AND p3 = 'search?hl=&lr=&q=';

QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=0.034..0.039 rows=1 loops=1)
   Index Cond: (((p1)::text = 'b'::text) AND ((p2)::text =
'www.google.com'::text) AND ((p3)::text = 'search?hl=&lr=&q='::text))
 Total runtime: 0.094 ms
(3 rows)


I have set statistics to 1000 on all of p1, p2 and p3.  The table was recently
vacuumed and analyzed, and the index was recreated (after being dropped) before
these tests were run.  The tests are 100% reproducible, both in postgresql
7.4.7 and 8.1.3.

The indexes are:

t_pkey (p1, p2, p3)  -- UNIQUE, PRIMARY KEY
p2_p3_idx (p2, p3)   -- NOT UNIQUE

The problem is that a lookup which specifies p2 and p3 can return as many as
500 rows.  The optimizer assumes that such a lookup will return 1 row, and so
it chooses a bad plan.  That sums it up.

What I need is a way to make it choose the primary key.

Thanks in advance,
Brian

Re: Query planner is using wrong index.

From
Ragnar
Date:
On fös, 2006-04-07 at 00:01 +1000, Brian Herlihy wrote:
> --- Ragnar <gnari@hive.is> wrote:
>
> > On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote:
> >
> > > Yes, the primary key is far better.  I gave it the ultimate test - I
> > dropped
> > > the (p2, p3) index.  It's blindingly fast when using the PK,
> >
> > I have problems understanding exactly how an index on
> > (p1,p2,p3) can be faster than and index on (p2,p3) for
> > a query not involving p1.

> db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
> AND p3 = 'web/results?itag=&q=&kgs=&kls=';

this is different from what you said earlier. in your
original post you showed a problem query without any
reference to p1 in the WHERE clause. this confused me.

>  Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102) (actual
> time=2793.247..2793.247 rows=0 loops=1)
>    Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
> 'web/results?itag=&q=&kgs=&kls='::text))
>    Filter: ((p1)::text = 'a'::text)
>  Total runtime: 2793.303 ms
> (4 rows)

try to add an ORDER BY clause:

explain analyze
  select * from t
  WHERE p1 = 'a'
    and p2 = 'uk.altavista.com'
    AND p3 = 'web/results?itag=&q=&kgs=&kls='
  ORDER BY p1,p2,p3;

this might push the planner into using the primary key

gnari





Re: Query planner is using wrong index.

From
Brian Herlihy
Date:
--- Ragnar <gnari@hive.is> wrote:

> On fös, 2006-04-07 at 00:01 +1000, Brian Herlihy wrote:
> >  Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102)
> (actual
> > time=2793.247..2793.247 rows=0 loops=1)
> >    Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
> > 'web/results?itag=&q=&kgs=&kls='::text))
> >    Filter: ((p1)::text = 'a'::text)
> >  Total runtime: 2793.303 ms
> > (4 rows)
>
> try to add an ORDER BY clause:
>
> explain analyze
>   select * from t
>   WHERE p1 = 'a'
>     and p2 = 'uk.altavista.com'
>     AND p3 = 'web/results?itag=&q=&kgs=&kls='
>   ORDER BY p1,p2,p3;
>
> this might push the planner into using the primary key
>
> gnari
>

Thankyou very much, that works very well for select.  However, I need it to
work for update as well.  Is there an equivalent way to force use of an index
for updates?

Here are the results for select:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=' order by p1,p2,p3;

QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=32.519..32.519 rows=0 loops=1)
   Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text =
'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
 Total runtime: 32.569 ms
(3 rows)

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';
                                                              QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=2790.364..2790.364 rows=0 loops=1)
   Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
   Filter: ((p1)::text = 'a'::text)
 Total runtime: 2790.420 ms
(4 rows)


But I cannot add an "order by" to an update.

The other idea I came up with last night was to change p2_p3_idx so it indexes
a value derived from p2 and p3, rather than p2 and p3 themselves.  This would
"hide" this index from the optimizer, forcing it to use the primary key.

I am really surprised that I have to go through such contortions just to use
the primary key!  This area of Postgres needs improvement.

Thanks,
Brian

Re: Query planner is using wrong index.

From
"Dave Dutcher"
Date:

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Brian Herlihy
> Sent: Thursday, April 06, 2006 6:56 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Query planner is using wrong index.
[Snip]
> I am really surprised that I have to go through such contortions just
to
> use
> the primary key!  This area of Postgres needs improvement.
>


Of course you mentioned that you are using 7.4.7.  You might want to try
upgrading to 8.1.3.  There have been a lot of improvements to the
performance since 7.4. I don't know if your specific problem was fixed,
but it's worth a try.

Also you might want to at least upgrade to 7.4.12 for the bug fixes.



Re: Query planner is using wrong index.

From
Brian Herlihy
Date:
--- Dave Dutcher <dave@tridecap.com> wrote:
> > -----Original Message-----
> > To: pgsql-performance@postgresql.org
> > Subject: Re: [PERFORM] Query planner is using wrong index.
> [Snip]
> > I am really surprised that I have to go through such contortions just
> to
> > use
> > the primary key!  This area of Postgres needs improvement.
> >
>
>
> Of course you mentioned that you are using 7.4.7.  You might want to try
> upgrading to 8.1.3.  There have been a lot of improvements to the
> performance since 7.4. I don't know if your specific problem was fixed,
> but it's worth a try.
>
> Also you might want to at least upgrade to 7.4.12 for the bug fixes.

Thanks for the suggestions.  I've verified the same problem in 8.1.3 as well,
after my initial post.  It was actually in 8.1.3 that I first discovered the
problem.

I noticed this item in the TODO list:

- Allow accurate statistics to be collected on indexes with more than one
column or expression indexes, perhaps using per-index statistics

This is what I need!  But until that is added, I need a way to use the primary
key with the current version :)

Thanks,
Brian

Re: Query planner is using wrong index.

From
Tom Lane
Date:
Brian Herlihy <btherl@yahoo.com.au> writes:
> My options seem to be
>   - Fudge the analysis results so that the selectivity estimate changes.  I
> have tested reducing n_distinct, but this doesn't seem to help.
>   - Combine the columns into one column, allowing postgres to calculate the
> combined selectivity.
>   - Drop the (p2, p3) index.  But I need this for other queries.

Have you considered reordering the pkey to be (p2,p3,p1) and then
dropping the (p2,p3) index?

            regards, tom lane

Re: Query planner is using wrong index.

From
Brian Herlihy
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Brian Herlihy <btherl@yahoo.com.au> writes:
> > My options seem to be
> >   - Fudge the analysis results so that the selectivity estimate changes.  I
> > have tested reducing n_distinct, but this doesn't seem to help.
> >   - Combine the columns into one column, allowing postgres to calculate the
> > combined selectivity.
> >   - Drop the (p2, p3) index.  But I need this for other queries.
>
> Have you considered reordering the pkey to be (p2,p3,p1) and then
> dropping the (p2,p3) index?
>
>             regards, tom lane

Hi Tom,

I've considered it.  Unfortunately I need to do lookups on (p1) and (p1,p2) as
well as (p1, p2, p3).

The solution I've gone with is to create an index on (p2 || '/' || p3).  This
is unique for each p2/p3 combination, because p2 cannot contain the '/'
character.  I'm assuming that this index will be no slower to generate than one
on (p2, p3), as concatenation is very cheap.  Having the index on an expression
"hides" it from the optimizer, which is then forced to use the primary key
instead.

It works perfectly now!  There were only 2 queries in the system which need
this index, so it was no problem to change them.

Thankyou very much for all your time and patience!

Before I go, I have a question - From discussions on the Postgresql irc
channel, and from reading the TODO list on the website, I am under the
impression that there are no plans to allow optimizer hints, such as "use index
table_pkey".  Is this really true?  Such a feature would make life inestimably
easier for your end-users, particularly me :)

Thanks,
Brian

Re: Query planner is using wrong index.

From
Tom Lane
Date:
Brian Herlihy <btherl@yahoo.com.au> writes:
> Before I go, I have a question - From discussions on the Postgresql irc
> channel, and from reading the TODO list on the website, I am under the
> impression that there are no plans to allow optimizer hints, such as "use index
> table_pkey".  Is this really true?

I personally don't think it's a good idea: the time spent in designing,
implementing, and maintaining a usable hint system would be significant,
and IMHO the effort is better spent on *fixing* the optimizer problems
than working around them.  There are also good arguments about how hints
wouldn't be future-proof --- for instance, the recent addition of bitmap
indexscan capability would've obsoleted an awful lot of hints, had
anyone had any on their queries.  We'd then be faced with either turning
off the hints or being forced by them to adopt inferior plans.

The direction I'd like to see us go to solve your problem is maintaining
cross-column statistics.  It's not practical to store joint stats for
every set of columns, but the existence of an index on (p2,p3) ought to
cue us that p2/p3 stats would be worth having.

            regards, tom lane

Spotting planner errors (was Re: Query planner is using wrong index.)

From
Richard Huxton
Date:
Tom Lane wrote:
> Brian Herlihy <btherl@yahoo.com.au> writes:
>> Before I go, I have a question - From discussions on the Postgresql irc
>> channel, and from reading the TODO list on the website, I am under the
>> impression that there are no plans to allow optimizer hints, such as "use index
>> table_pkey".  Is this really true?
>
> I personally don't think it's a good idea: the time spent in designing,
> implementing, and maintaining a usable hint system would be significant,
> and IMHO the effort is better spent on *fixing* the optimizer problems
> than working around them.

Tom - does the planner/executor know it's got row estimates wrong? That
is, if I'm not running an EXPLAIN ANALYSE is there a point at which we
could log "planner estimate for X out by factor of Y"?

--
   Richard Huxton
   Archonet Ltd

Richard Huxton <dev@archonet.com> writes:
> Tom - does the planner/executor know it's got row estimates wrong? That
> is, if I'm not running an EXPLAIN ANALYSE is there a point at which we
> could log "planner estimate for X out by factor of Y"?

Not at the moment, but you could certainly imagine changing the executor
to count rows even without EXPLAIN ANALYZE, and then complain during
plan shutdown.

Not sure how helpful that would be; there would be a lot of noise from
common cases such as executing underneath a LIMIT node.

            regards, tom lane

Re: Spotting planner errors (was Re: Query planner is using

From
Richard Huxton
Date:
Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> Tom - does the planner/executor know it's got row estimates wrong? That
>> is, if I'm not running an EXPLAIN ANALYSE is there a point at which we
>> could log "planner estimate for X out by factor of Y"?
>
> Not at the moment, but you could certainly imagine changing the executor
> to count rows even without EXPLAIN ANALYZE, and then complain during
> plan shutdown.
>
> Not sure how helpful that would be; there would be a lot of noise from
> common cases such as executing underneath a LIMIT node.

Hmm - thinking about it you'd probably want to record it similarly to
stats too. It's the fact that the planner *repeatedly* gets an estimate
wrong that's of interest.

Would it be prohibitive to total actions taken - to act as raw data for
random_page_cost / cpu_xxx_cost? If you could get a ratio of estimated
vs actual time vs the various page-fetches/index-fetches etc. we could
actually plug some meaningful numbers in.

--
   Richard Huxton
   Archonet Ltd