I'm surprised at how much work pgsql needs to run one of my queries, and
I'm not sure how to tune it.
   Column    |     Type     | Modifiers
-------------+--------------+-----------
 stock_id    | integer      | not null
 price_date  | date         | not null
 num1        | numeric(9,4) |
 num2        | numeric(9,4) |
 num3        | numeric(9,4) |
 num4        | numeric(9,4) |
 int1        | integer      |
Indexes:
    "day_ends_pkey" primary key, btree (stock_id, price_date)
---------------------------------------
The query I want to run is
select stock_id, min(price_date) from day_ends group by stock_id;
---------------------------------------
Here's the plan that I get.
 GroupAggregate  (cost=3711244.30..3838308.31 rows=6732 width=8)
   ->  Sort  (cost=3711244.30..3753593.36 rows=16939624 width=8)
         Sort Key: stock_id
         ->  Seq Scan on day_ends  (cost=0.00..361892.24 rows=16939624 width=8)
If I set enable_seqscan = false, the plan changes to
 GroupAggregate  (cost=0.00..67716299.91 rows=6732 width=8)
   ->  Index Scan using day_ends_pkey on day_ends
         (cost=0.00..67631584.96 rows=16939624 width=8)
---------------------------------------
Now... the first plan uses up tons of temporary space for sorting, and the
second one just runs and runs and runs. I've tried raising the statistics
target to 1000 (roughly as shown below) with little effect.
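For reference, what I ran was roughly the following; I set the target on
stock_id, though that particular column choice is just an example:

  ALTER TABLE day_ends ALTER COLUMN stock_id SET STATISTICS 1000;
  ANALYZE day_ends;  -- rebuild the statistics with the higher target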
So the query can get everything it needs from the index, and a full scan
of the index should be faster (the index file is less than half the size
of the data file). Why does the optimizer estimate the index scan so high?
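For what it's worth, one way to compare those sizes from inside the
database (rather than by looking at the files on disk) is to check
relpages in pg_class; a sketch, assuming the table and index names above:

  SELECT relname, relpages  -- relpages counts 8 kB blocks by default
    FROM pg_class
   WHERE relname IN ('day_ends', 'day_ends_pkey');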
Also, to get the MIN for a given group, not all values in the index need
to be seen. Must pgsql do a full scan because it treats all aggregates
the same way? Are MIN and MAX used often enough to justify special
treatment, and could that be implemented cleanly? Perhaps an aggregate
function could request its input in a certain order, be told that it is
receiving data in that order, and return before seeing the entire set of
data.
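In the meantime, the best workaround I can think of is to spell out the
"first row per group" logic by hand so the planner at least has a chance
to walk the (stock_id, price_date) index in order. A sketch (I haven't
verified that it actually avoids the sort):

  -- first price_date per stock_id, i.e. the same result as min(price_date)
  SELECT DISTINCT ON (stock_id) stock_id, price_date
    FROM day_ends
   ORDER BY stock_id, price_date;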
Food for thought...
Thanks,
Ken Geis