Thread: Re: Forcing a query plan

Re: Forcing a query plan

From
"Nick Barr"
Date:
Hi,

I am assuming that you already have indexes on the appropriate columns, and
do a vacuum analyze and not just a vacuum. Columns I would normally put
indexes on are

image.imageid            -- primary key
image.containerid
image.state
ancestry.containerid
ancestry.ancestorid

Have you tried doing a

ALTER TABLE image ALTER COLUMN imageid SET STATISTICS 100;
ALTER TABLE image ALTER COLUMN containerid SET STATISTICS 100;

followed by an analyze? It may be that the data contained within the image
table in skewed a little, thereby messing up the planners statistics.

Cant seem to access techdocs at the moment but the page you want to be
looking at is "10.2. Statistics Used by the Planner", and the page name is
"planner-stats.html".

Hope that helps

Nick


----- Original Message -----
From: "Robert Wille" <rwille@iarchives.com>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, July 02, 2003 8:19 PM
Subject: [GENERAL] Forcing a query plan


I have a number of very common queries that the optimizer plans a very
inefficient plan for. I am using postgres 7.2.3. I vacuum hourly. I'm
wondering what I can do to make the queries faster.

Here are the relevant tables:

create table image(
    imageid integer not null,     /* The image's ID */
    containerid integer not null,     /* The container that owns it */
    name varchar(120) not null,     /* Its name */
    state bigint not null default 0,    /* Its state */
    primary key (imageid),
    unique (containerid, name)     /* All images in a container must be
uniquely named */
);

create table ancestry(
    containerid integer not null,     /* The container that has an ancestor
*/
    ancestorid integer not null,     /* The ancestor of the container */
    unique (containerid, ancestorid),
    unique (ancestorid, containerid)
);

I have somewhere around 3M rows in the image table, and 37K rows in the
ancestry table. The following is representative of some of the common
queries I issue:

select * from image natural join ancestry where ancestorid=1000000 and
(state & 7::bigint) = 0::bigint;

When I ask postgres to EXPLAIN it, I get the following:

Merge Join  (cost=81858.22..81900.60 rows=124 width=49)
  ->  Sort  (cost=81693.15..81693.15 rows=16288 width=41)
        ->  Seq Scan on image  (cost=0.00..80279.17 rows=16288 width=41)
  ->  Sort  (cost=165.06..165.06 rows=45 width=8)
        ->  Index Scan using ancestry_ancestorid_key on ancestry
(cost=0.00..163.83 rows=45 width=8)

It appears to me that the query executes as follows:

1. Scan every row in the image table to find those where (state & 7::bigint)
= 0::bigint
2. Sort the results
3. Use an index on ancestry to find rows where ancestorid=1000000
4. Sort the results
5. Join the two

It seems to me that if this query is going to return a small percentage of
the rows (which is the common case), it could be done much faster by first
joining (all columns involved in the join are indexed), and then by applying
the (state & 7::bigint) = 0::bigint constraint to the results. I realize
that the query planner is going to have a difficult time estimating the
number of rows returned by the bit operator. However, I'd be happy forcing
it to always perform the join first, and then apply the state constraint to
the results.

Similarly, when I update, I get the following:

explain update image set state=0 from ancestry where ancestorid=1000000 and
ancestry.containerid=image.containerid and (state & 7::bigint) = 0::bigint;

NOTICE:  QUERY PLAN:

Merge Join  (cost=81841.92..81884.30 rows=124 width=43)
  ->  Sort  (cost=81676.74..81676.74 rows=16288 width=39)
        ->  Seq Scan on image  (cost=0.00..80279.17 rows=16288 width=39)
  ->  Sort  (cost=165.19..165.19 rows=45 width=4)
        ->  Index Scan using ancestry_ancestorid_key on ancestry
(cost=0.00..163.95 rows=45 width=4)

Is there any way to give the planner a hint, or reword the query and update
so that it executes the way I want?

Thanks in advance.

Robert Wille