Controlling the generated plan better - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Controlling the generated plan better
Date
Msg-id 39C74517.119D45CF@cupid.suninternet.com
Whole thread Raw
List pgsql-general
Hi,

We have a database that is growing and has gotten to the stage where
sequential scans are real killers. There are some cases where the query
planner chooses to do sequential scans over extremly larges tables for
no
good reason.

For example, we have two tables, a User table and a Usage table. The
tables
are joined the Username. Very few users actually have usage and we only
want
usage that is flagged as unbilled, which is also a very small portion of
the
total table. What we want to do is summerise the usage for some
customers.

Doing the query directly gives the following plan:

select id, sum(seconds) from usage, user where user.agent = 'AAAA' and
user.username is not null and usage.username = user.username and
usage.billed = 'f' group by id;

Aggregate  (cost=43565.86 rows=23168 width=40)
  ->  Group  (cost=43565.86 rows=23168 width=40)
        ->  Sort  (cost=43565.86 rows=23168 width=40)
              ->  Nested Loop  (cost=43565.86 rows=23168 width=40)
                    ->  Seq Scan on user  (cost=333.06 rows=34 width=24)
                    ->  Seq Scan on usage (cost=1271.55 rows=6814
width=16)

It seems to have drastically overestimated the number of rows involved.
From
the restrictions in the query there are only 6 (out of 2000) users in
the
output and only 1977 (out of 20000) usage records relevent. What I think
would be the ideal plan would be (I don't know this is even a valid
plan):

Nested Loop  (cost=300 rows=60 width=24)
  ->  Seq Scan on user  (cost=60 rows=6 width=24)
  ->  Aggregate  (cost=51.26 rows=189 width=4)
        ->  Index Scan using usage_username on usage (cost=51.26
rows=189 width=4)

(I hope you can userstand what I mean by that. Basically to find the
customers we want output for, that do an index scan for each one of
those).

So, two questions. How can I force/encourage the generation of the
second
plan. And is there anything else I can use to improve the results.

I have indicies on all the relevent fields and everything has been
vacuum
analyzed.
--
Martijn van Oosterhout <kleptog@cupid.suninternet.com>
http://cupid.suninternet.com/~kleptog/

pgsql-general by date:

Previous
From: Steve Heaven
Date:
Subject: Re: Is it possible to search for sub-strings...
Next
From: mikeo
Date:
Subject: Re: Oracle Conversions Tools