Thread: sequential scan on select distinct
Hi,

I'm using Postgres 7.4.5, and my tables are analyzed & vacuumed. I am wondering why PostgreSQL never uses an index on queries of the type 'SELECT DISTINCT ...', while e.g. MySQL uses the index on the same query. See the following explains:

postgresql:

explain analyze select distinct "land" from "customer_dim";

 Unique  (cost=417261.85..430263.66 rows=18 width=15) (actual time=45875.235..67204.694 rows=103 loops=1)
   ->  Sort  (cost=417261.85..423762.75 rows=2600362 width=15) (actual time=45875.226..54114.473 rows=2600362 loops=1)
         Sort Key: land
         ->  Seq Scan on customer_dim  (cost=0.00..84699.62 rows=2600362 width=15) (actual time=0.048..10733.227 rows=2600362 loops=1)
 Total runtime: 67246.465 ms

mysql:

explain select DISTINCT `customer_dim`.`land` from `customer_dim`;

 table        | type  | possible_keys | key           | key_len | ref    | rows    | Extra
--------------+-------+---------------+---------------+---------+--------+---------+-------------
 customer_dim | index | [NULL]        | IDX_cstd_land | 81      | [NULL] | 2600362 | Using index

1 row in result (first row: 8 msec; total: 9 msec)

The result set contains 103 rows (but I get this behavior with every query of this kind). My tables contain at least a million rows. The index on the column 'land' is a standard index, so in PostgreSQL's case a btree index. I've tried changing the index type, but to no avail.

So, why doesn't PostgreSQL use the index, and (how) could I persuade PostgreSQL to use an index for this type of query?

TiA
--
Ole Langbehn

freiheit.com technologies gmbh
Theodorstr. 42-90 / 22761 Hamburg, Germany
fon +49 (0)40 / 890584-0
fax +49 (0)40 / 890584-20

Support free software by buying books | http://bookzilla.de/
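(For anyone reproducing this: the PostgreSQL index would have been created with a plain CREATE INDEX, something like the statement below. The PostgreSQL-side index name is not given in the thread, so the name here is borrowed from the MySQL plan above.)

    create index "IDX_cstd_land" on customer_dim (land);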
You could try:

    explain analyze select "land" from "customer_dim" group by "land";

It will be a lot faster, but I can't make it use the index on my machine...

Example:

    create table dummy as select id, id % 255 as number from ...;
    -- (the source is any large table with 1M rows; only the column list matters)

so we have a table with 255 distinct "number" values (0-254; the plans below indeed show rows=255).

--------------------------------------------------------------------------------

=> explain analyze select distinct number from dummy;

 Unique  (cost=69.83..74.83 rows=200 width=4) (actual time=13160.490..14414.004 rows=255 loops=1)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=4) (actual time=13160.483..13955.792 rows=1000000 loops=1)
         Sort Key: number
         ->  Seq Scan on dummy  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.052..1759.145 rows=1000000 loops=1)
 Total runtime: 14442.872 ms

=> Horribly slow, because it has to sort 1M rows for the Unique.

--------------------------------------------------------------------------------

=> explain analyze select number from dummy group by number;

 HashAggregate  (cost=22.50..22.50 rows=200 width=4) (actual time=1875.214..1875.459 rows=255 loops=1)
   ->  Seq Scan on dummy  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.107..1021.014 rows=1000000 loops=1)
 Total runtime: 1875.646 ms

=> A lot faster, because it HashAggregates instead of sorting (but still a seq scan).

--------------------------------------------------------------------------------

Now:

    create index dummy_idx on dummy(number);

Let's try again.

--------------------------------------------------------------------------------

explain analyze select distinct number from dummy;

 Unique  (cost=0.00..35301.00 rows=200 width=4) (actual time=0.165..21781.732 rows=255 loops=1)
   ->  Index Scan using dummy_idx on dummy  (cost=0.00..32801.00 rows=1000000 width=4) (actual time=0.162..21154.752 rows=1000000 loops=1)
 Total runtime: 21782.270 ms

=> Index scan of the whole table. Argh. I should have ANALYZEd.

--------------------------------------------------------------------------------

explain analyze select number from dummy group by number;

 HashAggregate  (cost=17402.00..17402.00 rows=200 width=4) (actual time=1788.425..1788.668 rows=255 loops=1)
   ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.048..960.063 rows=1000000 loops=1)
 Total runtime: 1788.855 ms

=> Still the same...

--------------------------------------------------------------------------------

Let's make a function. It starts at the lowest number and advances to the next number in the index until all values are exhausted:

CREATE OR REPLACE FUNCTION sel_distinct()
    RETURNS SETOF INTEGER LANGUAGE plpgsql AS '
DECLARE
    pos INTEGER;
BEGIN
    -- seed with the smallest value in the index
    SELECT INTO pos number FROM dummy ORDER BY number ASC LIMIT 1;
    IF NOT FOUND THEN
        RAISE NOTICE ''no records.'';
        RETURN;
    END IF;
    LOOP
        RETURN NEXT pos;
        -- jump to the next distinct value
        SELECT INTO pos number FROM dummy WHERE number > pos
            ORDER BY number ASC LIMIT 1;
        IF NOT FOUND THEN
            RETURN;
        END IF;
    END LOOP;
END;
';

explain analyze select * from sel_distinct();

 Function Scan on sel_distinct  (cost=0.00..12.50 rows=1000 width=4) (actual time=215.472..215.696 rows=255 loops=1)
 Total runtime: 215.839 ms

That's better!

--------------------------------------------------------------------------------

Why not use DESC instead of ASC?
CREATE OR REPLACE FUNCTION sel_distinct()
    RETURNS SETOF INTEGER LANGUAGE plpgsql AS '
DECLARE
    pos INTEGER;
BEGIN
    -- seed with the largest value in the index this time
    SELECT INTO pos number FROM dummy ORDER BY number DESC LIMIT 1;
    IF NOT FOUND THEN
        RAISE NOTICE ''no records.'';
        RETURN;
    END IF;
    LOOP
        RETURN NEXT pos;
        -- jump to the next distinct value, downwards
        SELECT INTO pos number FROM dummy WHERE number < pos
            ORDER BY number DESC LIMIT 1;
        IF NOT FOUND THEN
            RETURN;
        END IF;
    END LOOP;
END;
';

explain analyze select * from sel_distinct();

 Function Scan on sel_distinct  (cost=0.00..12.50 rows=1000 width=4) (actual time=13.500..13.713 rows=255 loops=1)
 Total runtime: 13.857 ms

Hum hum! Again, a lot better! Index scan backwards seems a lot faster than index scan forwards. Why, I don't know, but here you go from 15 seconds to 14 milliseconds...

I don't know WHY (oh why) postgres does not use this kind of strategy when distinct'ing an indexed field... Anybody got an idea?
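To see why each step of these functions is cheap: every probe they issue has the shape below (42 is an arbitrary example value). With dummy_idx in place, such a query should be satisfied by a single descent into the btree, so ~255 probes beat sorting or scanning a million rows:

    select number from dummy where number > 42 order by number asc limit 1;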
On Wednesday, 6 October 2004 12:19, Pierre-Frédéric Caillaud wrote:
> You could try:
>
>     explain analyze select "land" from "customer_dim" group by "land";
>
> It will be a lot faster, but I can't make it use the index on my machine...

This already speeds up my queries to about 1/4th of the time, which is about the range of MySQL and Oracle.

> Example:
> [..]
> Hum hum! Again, a lot better! Index scan backwards seems a lot faster
> than index scan forwards. Why, I don't know, but here you go from 15
> seconds to 14 milliseconds...

Thanks for this very extensive answer, it helped me a lot.

> I don't know WHY (oh why) postgres does not use this kind of strategy
> when distinct'ing an indexed field... Anybody got an idea?

That's the big question I still would like to see answered too. Can anyone tell us?

TiA
--
Ole Langbehn
Pierre-Frédéric Caillaud <lists@boutiquenumerique.com> writes:
> I don't know WHY (oh why) postgres does not use this kind of strategy
> when distinct'ing an indexed field... Anybody got an idea?

Well, there are two questions here: why, given the currently available plans, does postgres choose a sequential scan instead of an index scan; and why isn't this kind of "skip index scan" available.

Postgres chooses a sequential scan with a sort (or hash aggregate) over an index scan because it expects it to be faster. Sequential scans are much faster than random-access scans of indexes, plus index scans need to read many more blocks. If you're finding the index scan to be just as fast as sequential scans, you might consider lowering random_page_cost closer to 1.0. But note that you may be getting fooled by a testing methodology where more things are cached than would be in production.

Why isn't a "skip index scan" plan available? Well, nobody's written the code yet. It would be part of the same code needed to get an index scan used for:

    select y, min(x) from bar group by y

And possibly also related to the TODO item:

    Use index to restrict rows returned by multi-key index when used with
    non-consecutive keys to reduce heap accesses

    For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and
    col3 = 9, spin through the index checking for col1 and col3 matches,
    rather than just col1

Note that the optimizer would have to make a judgement call based on the expected number of distinct values. If you had many more than 256 distinct values then your plpgsql function wouldn't have performed well at all.

--
greg
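That knob is easy to experiment with per session (2.0 below is an arbitrary trial value, not a recommendation; the 7.4 default is 4):

    -- lower values make the planner treat index page reads as cheaper
    SET random_page_cost = 2.0;
    EXPLAIN ANALYZE SELECT DISTINCT land FROM customer_dim;
    -- put it in postgresql.conf only once it proves out on production-like data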
There are even three questions here:

- Given that 'SELECT DISTINCT field FROM table' is exactly the same as 'SELECT field FROM table GROUP BY field', postgres could transform the first into the second and avoid itself a (potentially killer) sort. On my example the table was not too large, but on a very large table, sorting all the values and then distinct'ing them does not look too appealing.

Currently postgres does Sort+Unique, but there could be a "DistinctSort" instead of a Sort, that is, a thing that sorts and removes the duplicates at the same time. Not much more complicated to code than a sort, and much faster in this case. Or there could be a "DistinctHash", which would be similar or rather identical to a HashAggregate and would again skip the sort. It would (as a bonus) speed up queries like UNION (not ALL), that kind of thing. For example:

explain (select number from dummy) union (select number from dummy);

 Unique  (cost=287087.62..297087.62 rows=2000000 width=4)
   ->  Sort  (cost=287087.62..292087.62 rows=2000000 width=4)
         Sort Key: number
         ->  Append  (cost=0.00..49804.00 rows=2000000 width=4)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=4)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=4)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4)

This is scary! I can rewrite it as such (and the planner could, too):

explain select * from ((select number from dummy) union all (select number from dummy)) as foo group by number;

 HashAggregate  (cost=74804.00..74804.00 rows=200 width=4)
   ->  Subquery Scan foo  (cost=0.00..69804.00 rows=2000000 width=4)
         ->  Append  (cost=0.00..49804.00 rows=2000000 width=4)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=4)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=4)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4)

which avoids a large sort... However, there must be cases in which performing a sort is faster, like when there are a lot of distinct values and the HashAggregate becomes huge too.

> Well there are two questions here. Why given the current plans available
> does postgres choose a sequential scan instead of an index scan.

Well, because it needs to get all the rows in the table in order; in this case seq scan+sort is about twice as fast as index scan. Interestingly, once I ANALYZEd the table, postgres chooses to index-scan, which is slower.

> And why isn't there this kind of "skip index scan" available.

It would be really nice to have a skip index scan available. I have another idea; let's call it the "indexed sequential scan". When pg knows there are a lot of rows to access, it will ignore the index and seqscan, because index access is very random, thus slow. However, postgres could implement an "indexed sequential scan" where:

- the page numbers for the matching rows are looked up in the index (this is fast, as an index has good locality),
- the page numbers are deduplicated, so we have a list with one and only one instance of each page number,
- the list is sorted, so we have the page numbers in order,
- the pages are loaded in sorted order (doing a kind of partial sequential scan), which would be faster than reading them randomly.

Other ideas later.
Greg Stark <gsstark@mit.edu> writes:
> why isn't a "skip index scan" plan available? Well, nobody's written
> the code yet.

I don't really think it would be a useful plan anyway. What *would* be useful is to support HashAggregate as an implementation alternative for DISTINCT --- currently I believe we only consider that for GROUP BY. The DISTINCT planning code is fairly old and crufty and hasn't been redesigned lately.

			regards, tom lane
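Until that happens, the workaround the thread keeps circling back to is the manual rewrite; both statements below return the same rows, but in 7.4 only the second is eligible for HashAggregate:

    select distinct land from customer_dim;        -- planned as Sort + Unique
    select land from customer_dim group by land;   -- can use HashAggregate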
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Greg Stark <gsstark@mit.edu> writes:
>> why isn't a "skip index scan" plan available? Well, nobody's written
>> the code yet.
>
> I don't really think it would be a useful plan anyway.

Well, it would clearly be useful in this test case, which has a small number of distinct values in a large table, and an index on the column. His plpgsql function that emulates such a plan is an order of magnitude faster than the hash aggregate plan, even though it has to do an entirely separate index scan for each key value.

I'm not sure where the break-even point would be, but it would probably be pretty low. Probably somewhere around the order of 1% distinct values in the table. That might be uncommon, but certainly not impossible.

But regardless of how uncommon it is, it could be considered important in another sense: when you need it, there really isn't any alternative. It's an algorithmic improvement with no bound on the performance difference. Nothing short of using a manually maintained materialized view would bring the performance into the same ballpark. So even if it's only useful occasionally, not having the plan available can leave postgres with no effective plan for what should be an easy query.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> But regardless of how uncommon it is, it could be considered important in
> another sense: when you need it, there really isn't any alternative. It's
> an algorithmic improvement with no bound on the performance difference.

[ shrug... ] There are an infinite number of special cases for which that claim could be made. The more we load down the planner with seldom-useful special cases, the *lower* the overall performance will be, because we'll waste cycles checking for the special cases in every case...

In this particular case, it's not merely a matter of the planner, either. You'd need some new type of plan node in the executor, so there's a pretty fair amount of added code bulk that will have to be written and then maintained.

I'm open to being persuaded that this is worth doing, but the bar is going to be high; I think there are a lot of other more-profitable ways to invest our coding effort and planning cycles.

			regards, tom lane
> I don't really think it would be a useful plan anyway. What *would* be
> useful is to support HashAggregate as an implementation alternative for
> DISTINCT --- currently I believe we only consider that for GROUP BY.
> The DISTINCT planning code is fairly old and crufty and hasn't been
> redesigned lately.

I see this as a minor annoyance only because I can write GROUP BY instead of DISTINCT and get the speed boost. It probably annoys people trying to port applications to postgres, though, forcing them to rewrite their queries. On my test table:

 * SELECT DISTINCT                              : 21442.296 ms (by default, uses an index scan)
   (with enable_indexscan off => Sort + Unique) : 14512.105 ms
 * GROUP BY, using HashAggregate                :  1793.651 ms
 * skip index scan by function                  :    13.833 ms

The HashAggregate speed boost is good, but rather pathetic compared to a "skip index scan"; still, it's worth having if updating the DISTINCT code is easy.

Note that it would also benefit UNION queries, which apparently use DISTINCT internally and currently produce this:

------------------------------------------------------------------------------

explain analyze select number from ((select number from dummy) union (select number from dummy)) as foo;

 Subquery Scan foo  (cost=287087.62..317087.62 rows=2000000 width=4) (actual time=33068.776..35575.330 rows=255 loops=1)
   ->  Unique  (cost=287087.62..297087.62 rows=2000000 width=4) (actual time=33068.763..35574.126 rows=255 loops=1)
         ->  Sort  (cost=287087.62..292087.62 rows=2000000 width=4) (actual time=33068.757..34639.180 rows=2000000 loops=1)
               Sort Key: number
               ->  Append  (cost=0.00..49804.00 rows=2000000 width=4) (actual time=0.055..7412.551 rows=2000000 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.054..3104.165 rows=1000000 loops=1)
                           ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.051..1792.348 rows=1000000 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.048..3034.462 rows=1000000 loops=1)
                           ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.044..1718.682 rows=1000000 loops=1)
 Total runtime: 36265.662 ms

------------------------------------------------------------------------------

But it could instead do this:

explain analyze select number from ((select number from dummy) union all (select number from dummy)) as foo group by number;

 HashAggregate  (cost=74804.00..74804.00 rows=200 width=4) (actual time=10753.648..10753.890 rows=255 loops=1)
   ->  Subquery Scan foo  (cost=0.00..69804.00 rows=2000000 width=4) (actual time=0.059..8992.084 rows=2000000 loops=1)
         ->  Append  (cost=0.00..49804.00 rows=2000000 width=4) (actual time=0.055..6688.639 rows=2000000 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.054..2749.708 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.052..1640.427 rows=1000000 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.038..2751.916 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.034..1637.818 rows=1000000 loops=1)
 Total runtime: 10754.120 ms

------------------------------------------------------------------------------

A 3x speedup, and still a good thing to have. When I LIMIT the two subqueries to 100k rows instead of a million, the times are about equal.
When I LIMIT one of the subqueries to 100k and leave the other at 1M:

 UNION               : 17949.609 ms
 UNION ALL + GROUP BY:  6130.417 ms

Still some performance to be gained...

------------------------------------------------------------------------------

Of course it can't use a skip index scan on a subquery, but consider this instead. I know it's pretty stupid to use the same table twice, but it's just an example. However, if you think about table partitions and views, a "select distinct number" from a view over multiple partitions would yield exactly this type of query, and table partitioning seems like a hot subject lately.

Let's create a dummy example view:

    create view dummy_view as (select * from dummy) union all (select * from dummy);

explain analyze select number from dummy_view group by number;

 HashAggregate  (cost=74804.00..74804.00 rows=200 width=4) (actual time=10206.456..10206.713 rows=255 loops=1)
   ->  Subquery Scan dummy_view  (cost=0.00..69804.00 rows=2000000 width=4) (actual time=0.060..8431.776 rows=2000000 loops=1)
         ->  Append  (cost=0.00..49804.00 rows=2000000 width=8) (actual time=0.055..6122.125 rows=2000000 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=8) (actual time=0.054..2456.566 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=8) (actual time=0.048..1107.151 rows=1000000 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=8) (actual time=0.036..2471.748 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=8) (actual time=0.031..1104.482 rows=1000000 loops=1)
 Total runtime: 10206.945 ms

A smarter planner could rewrite it into this:

    select number from ((select distinct number from dummy) union (select distinct number from dummy)) as foo;

and notice it could index-skip-scan the two partitions (here, an example with my function):

explain analyze select number from ((select sel_distinct as number from sel_distinct()) union all (select sel_distinct as number from sel_distinct())) as foo group by number;

 HashAggregate  (cost=70.00..70.00 rows=200 width=4) (actual time=29.078..29.332 rows=255 loops=1)
   ->  Subquery Scan foo  (cost=0.00..65.00 rows=2000 width=4) (actual time=13.378..28.587 rows=510 loops=1)
         ->  Append  (cost=0.00..45.00 rows=2000 width=4) (actual time=13.373..28.003 rows=510 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..22.50 rows=1000 width=4) (actual time=13.373..13.902 rows=255 loops=1)
                     ->  Function Scan on sel_distinct  (cost=0.00..12.50 rows=1000 width=4) (actual time=13.367..13.619 rows=255 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..22.50 rows=1000 width=4) (actual time=13.269..13.800 rows=255 loops=1)
                     ->  Function Scan on sel_distinct  (cost=0.00..12.50 rows=1000 width=4) (actual time=13.263..13.512 rows=255 loops=1)
 Total runtime: 29.569 ms

So, if a query with UNION, or UNION ALL + DISTINCT, pushes the DISTINCT inside the subqueries and yields an index skip scan, there is a massive speedup.

You will tell me, "but if the UNION ALL has 10 subqueries, planning is going to take forever!" Well, not necessarily. The above query with 10 subqueries UNION ALLed and then GROUPed takes:

 UNION                         : 320509.522 ms (the Sort + Unique truly becomes humongous)
 UNION ALL + GROUP BY          :  54586.759 ms (so there is already interest in rewriting DISTINCT/UNION)
 skip scan + UNION             :     147.941 ms
 skip scan + UNION ALL + GROUP :     147.313 ms

> Well, it would clearly be useful in this test case, which has a small
> number of distinct values in a large table, and an index on the column.
> His plpgsql function that emulates such a plan is an order of magnitude
> faster than the hash aggregate plan, even though it has to do an entirely
> separate index scan for each key value.

Actually, it is more like two orders of magnitude (100x faster). The time for a seq scan is O(N rows), whereas the time for the skip index scan should be, if I'm not mistaken, something like O((N distinct values) * log(N rows)); for 255 distinct values and 1M rows, that is roughly 255 * 20 ≈ 5,000 index probes versus 1,000,000 rows read. At 10M rows the gap should widen by nearly another order of magnitude, since the seq-scan cost grows 10x while the skip-scan cost grows only by log(10M)/log(1M) ≈ 1.16x. And if the skip index scan were implemented in postgres instead of in a function, it could be much, much faster...

> [ shrug... ] There are an infinite number of special cases for which
> that claim could be made. The more we load down the planner with
> seldom-useful special cases, the *lower* the overall performance will
> be, because we'll waste cycles checking for the special cases in every
> case ...

In a general way, you are absolutely right... special-casing for a speedup of 2x, for instance, would be worthless... but we are considering a HUGE speedup here. And if this mode is only considered for DISTINCT and GROUP BY queries, no planning cycles are wasted on queries which use neither.

Present state is that DISTINCT and UNION are slow with or without the GROUP BY trick. Including the index skip scan in the planning options would only happen when appropriate cases are detected. This detection would be very fast. The index skip scan would then speed up the query so much that the additional planning cost would not matter. If there are many distinct values, so that a seq scan is faster than a skip scan, the query will be slow enough anyway that the additional planning cost does not matter. The only problem cases are queries on small tables where startup time is important, but there the planner has stats about the number of rows in the table, and excluding skip scan from the start would again be fast.

Lateral thought: create a new index type which indexes only one row for each distinct value. This index would use very little space and would be very fast to update (on my table it would index only 255 values). Keep the Index Scan code and all, but use this index type when you can. This solution is less general and also has a few drawbacks.

Another thought:

\d dummy
        Table "public.dummy"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | integer |
 number  | integer |
Indexes:
    "dummy_idx" btree (number)
    "dummy_idx_2" btree (number, id)

explain analyze select * from dummy where id=1;

 Seq Scan on dummy  (cost=0.00..17402.00 rows=1 width=8) (actual time=274.480..1076.092 rows=1 loops=1)
   Filter: (id = 1)
 Total runtime: 1076.168 ms

explain analyze select * from dummy where number between 0 and 256 and id=1;

 Index Scan using dummy_idx_2 on dummy  (cost=0.00..6.02 rows=1 width=8) (actual time=1.449..332.020 rows=1 loops=1)
   Index Cond: ((number >= 0) AND (number <= 256) AND (id = 1))
 Total runtime: 332.112 ms

In this case we have no index on id, but a skip index scan, emulated here by the BETWEEN that forces use of the (number, id) index, still speeds it up by 3x, even though it must look through all the possible values of number. Interestingly, with only 16 distinct values the time is about the same. Thus, the "skip index scan" could also be used in cases where there is a multicolumn index but the WHERE misses a leading column.
This would not waste planning cycles, because:

- If the needed index exists and there is no DISTINCT, and no GROUP BY without aggregates, the planner does not even consider the skip index scan.
- If the needed index does not exist, the planner only loses the cycles needed to check whether there is a usable multicolumn index. In that case, either there is no such index, and a seq scan is chosen, which will be slow, so the time wasted on the check is negligible; or an index is found and can be used, and the time gained by the skip index scan easily amortizes the check.

Currently one has to carefully consider which queries will be used frequently and need indexes, and which ones are infrequent and don't justify an index (but those queries will be very slow). With the skip index scan, these less frequent queries don't always mean a seq scan. Thus people will need to create fewer infrequently-used indexes, and will get higher INSERT/UPDATE speed.

------------------------------------------------------------------------------------------------

The skip scan would also be a winner on this type of killer query, a variant of the famous "top 10" query:

    EXPLAIN SELECT max(id), number FROM dummy GROUP BY number;
    -> 2229.141 ms

Postgres uses a Seq Scan + HashAggregate. Come on, we have a btree index on (number, id), use it! A simple customization of my skip scan emulation function takes 13.683 ms...

I know that postgres does not translate max() on an indexed column into ORDER BY column DESC LIMIT 1, because that would be extremely hard to implement given the general nature of aggregates (and that general nature is a very good thing). It does not bother me, because I can still write ORDER BY column DESC LIMIT 1.

Where it does bother me is when I want the highest id for each number, which can only be expressed by

    SELECT max(id), number FROM dummy GROUP BY number;

and not with LIMITs.

Suppose I want the 10 highest ids for each number, which is another variant of the "killer top 10" query. Now I'm stuck: I cannot even use max(). I'd have to write a custom aggregate that keeps the 10 highest values, which would be very slow, so instead I use my function with a LIMIT 10 in place of the LIMIT 1, plus a FOR loop and some extra conditions to handle numbers with fewer than 10 ids, etc., which more or less amounts to "select the next number, then select the associated ids". It's still a lot faster than a seq scan, but it gets more and more complicated.

However, I'd like to write:

    select number, id from dummy ORDER BY number DESC, id DESC MULTILIMIT 50,10;

The MULTILIMIT means "I want 50 numbers and 10 ids for each number." MULTILIMIT NULL,10 would mean "I want all numbers and 10 ids for each number." NULL is not mandatory; it could also be -1, a keyword, or something else. MULTILIMIT could simply be LIMIT, too, because LIMIT currently takes one parameter. The OFFSET clause could evolve accordingly.

And this would naturally use a skip index scan, and benefit a whole class of queries which have traditionally been difficult to get right...
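For concreteness, here is a sketch of that LIMIT-10-per-number emulation, in the style of sel_distinct() above (top_n_per_number is a made-up name; this assumes the dummy table and the (number, id) index dummy_idx_2, and 7.4-era plpgsql, hence the ALIAS instead of a named parameter):

CREATE OR REPLACE FUNCTION top_n_per_number(integer)
    RETURNS SETOF dummy LANGUAGE plpgsql AS '
DECLARE
    n   ALIAS FOR $1;    -- how many ids to keep per number
    r   dummy%ROWTYPE;
    pos INTEGER;
BEGIN
    -- start at the highest number (backward scans were faster above)
    SELECT INTO pos number FROM dummy ORDER BY number DESC LIMIT 1;
    IF NOT FOUND THEN
        RETURN;
    END IF;
    LOOP
        -- emit up to n ids for the current number; the ORDER BY matches
        -- dummy_idx_2 (number, id) scanned backwards
        FOR r IN SELECT * FROM dummy WHERE number = pos
                 ORDER BY number DESC, id DESC LIMIT n LOOP
            RETURN NEXT r;
        END LOOP;
        -- skip to the next lower number
        SELECT INTO pos number FROM dummy WHERE number < pos
            ORDER BY number DESC LIMIT 1;
        IF NOT FOUND THEN
            RETURN;
        END IF;
    END LOOP;
END;
';

-- usage: the 10 highest ids for each number
select * from top_n_per_number(10);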
Conclusion:

Smarting up the DISTINCT planner has the following benefits:
- speedup on DISTINCT
- speedup on UNION, which seems to use DISTINCT internally

The index skip scan has the following benefits:
- massive speedup (x100) on queries involving DISTINCT or its GROUP BY variant
- same thing (x300) on UNION queries, if the parser rewrites the query and puts the DISTINCT inside the subqueries
- paves the way for a MULTILIMIT, which gives an elegant and very efficient way of expressing traditionally difficult queries like the "top 10 by category", which are used quite often and give headaches to DBAs
- the possibility of using a multicolumn index when the WHERE does not include all the leading columns

The index skip scan has the following drawbacks:
- more complexity
- additional planning time

This last drawback is in fact limited, because:
- it is easy and fast to know when the index skip scan can never be used, so in most queries which won't need it the possibility can be eliminated without wasting planning cycles;
- when it is used, the performance gains are so massive that it is justified;
- people who run many queries where planning time is significant compared to execution time are probably using SQL functions or prepared queries.

Enough arguments; maybe not enough to convince you, but enough to warrant a second thought?

---------------------------------------------------------------

Side note: what do you think about the idea of a "UniqueSort", which would do sort+unique in one pass? This could also be simple to code, and would likewise benefit all queries using UNION. The sort would be faster and consume less storage space, because the data size would diminish as duplicates are eliminated along the way.
Pierre-Frédéric Caillaud <lists@boutiquenumerique.com> writes:
> Present state is that DISTINCT and UNION are slow with or without the
> GROUP BY trick. Including the index skip scan in the planning options
> would only happen when appropriate cases are detected. This detection
> would be very fast.

You have no basis whatever for making that last assertion; and since it's the critical point, I don't intend to let you slide by without backing it up. I think that looking for relevant indexes would be nontrivial; the more so in cases like you've been armwaving about just above, where you have to find a relevant index for each of several subqueries. The fact that the optimization wins a lot when it wins is agreed, but the time spent trying to apply it when it doesn't work is a cost that has to be set against that. I don't accept your premise that every query for which skip-index isn't relevant is so slow that planning time does not matter.

			regards, tom lane
Pierre-Frédéric Caillaud <lists@boutiquenumerique.com> writes:
> I see this as a minor annoyance only because I can write GROUP BY
> instead of DISTINCT and get the speed boost. It probably annoys people
> trying to port applications to postgres, though, forcing them to rewrite
> their queries.

Yeah, really DISTINCT and DISTINCT ON are just special cases of GROUP BY. It seems to make more sense to put the effort into GROUP BY and just have DISTINCT and DISTINCT ON go through the same code path, effectively rewriting them internally as a GROUP BY.

The really tricky part is that a DISTINCT ON needs to know about a first() aggregate. And to make optimal use of indexes, a last() aggregate as well. And ideally the planner/executor needs to know something is magic about first()/last() (and potentially min()/max() at some point) and that they don't need the complete set of tuples to calculate their results.

--
greg
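A first() of the kind he describes can already be hacked together as a user-defined aggregate (a sketch using the polymorphic types available since 7.4; the STRICT transition function means the first non-null input seeds the state, and the input order is only meaningful when the feeding subquery is sorted, which SQL does not strictly guarantee):

CREATE OR REPLACE FUNCTION first_accum(anyelement, anyelement)
    RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT
    AS 'SELECT $1';   -- always keep the existing state; STRICT seeds it

CREATE AGGREGATE first (
    basetype = anyelement,
    sfunc    = first_accum,
    stype    = anyelement
);

-- e.g. the highest id per number, leaning on the ordered subquery:
SELECT number, first(id)
FROM (SELECT number, id FROM dummy ORDER BY number, id DESC) AS ordered
GROUP BY number;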
On Thursday, 7 October 2004 14:01, Pierre-Frédéric Caillaud wrote:
> Side note: what do you think about the idea of a "UniqueSort", which
> would do sort+unique in one pass?

This is what Oracle does, and it is quite fast with it...

--
Ole Langbehn

freiheit.com technologies gmbh
Theodorstr. 42-90 / 22761 Hamburg, Germany
fon +49 (0)40 / 890584-0
fax +49 (0)40 / 890584-20

Support free software by buying books | http://bookzilla.de/
Ole Langbehn <ole@freiheit.com> writes:
>> Side note: what do you think about the idea of a "UniqueSort", which
>> would do sort+unique in one pass?
>
> This is what Oracle does, and it is quite fast with it...

Hashing is at least as fast, if not faster.

			regards, tom lane
> The really tricky part is that a DISTINCT ON needs to know about a first()
> aggregate. And to make optimal use of indexes, a last() aggregate as well.
> And ideally the planner/executor needs to know something is magic about
> first()/last() (and potentially min()/max() at some point) and that they
> don't need the complete set of tuples to calculate their results.

I'm going to be accused of hand-waving again, but please pardon me, I'm enthusiastic, and I like to propose new ideas; you can kick me if you don't like them or if I put out too much uninformed bull!

Idea: the aggregate accumulation function could have a way to say, "Stop! I've had enough of these values! Get on with the next item in the GROUP BY clause!" I don't know how, or if, the planner could use this (guess: no) or the index scan could use this (guess: no), but it would at least save the function calls. I'd guess this idea is quite useless.

Aggregates could instead have an additional attribute saying how many values they will need ('max_rows', maybe). This would avoid hard-coding "magic" aggregates for max() (which is a kind of special-casing) and keep the mechanism generic, so users could create magic aggregates like this themselves. Aggregates already consist of a bunch of functions (start, accumulate, return results), so this could be just another element in that set. The information would be known ahead of time and could influence the query plans too. I'm going to wave my hand and say "not too much planning cost", because I guess the aggregate details are fetched during planning anyway, so fetching one more attribute would not take long...

For instance, first() would have max_rows=1, and users could code a "first N accumulated into an array" aggregate which would have max_rows=N...

This does not solve the problem of min() and max(), which need max_rows=1 only if the input is sorted... hum... maybe another attribute like max_rows_sorted = 1 for max() and -1 for min(), meaning 'first 1' or 'last 1' (or first N or last N)... and from the ORDER BY clause it would be known that the 'first N' of an 'order by ... asc' is the same as the 'last N' of an 'order by ... desc'?
> Hashing is at least as fast, if not faster.
>			regards, tom lane

Probably quite a bit faster if the dataset is not huge... but a UniqueSort would still be useful for GROUP BY x ORDER BY x.
Tom Lane wrote:
> Ole Langbehn <ole@freiheit.com> writes:
>>> Side note: what do you think about the idea of a "UniqueSort", which
>>> would do sort+unique in one pass?
>>
>> This is what Oracle does, and it is quite fast with it...
>
> Hashing is at least as fast, if not faster.
>
> 			regards, tom lane

I got good mileage in a different SQL engine by combining the hash-aggregate and sort nodes into a single operator: the hash table was just an index into the equivalent of the heap used for generating runs. That gave me partially aggregated data, or eliminated duplicate keys, without the extra memory overhead of a hash-aggregation node below the sort. Memory was scarce then... :-)

BTW, I'm really puzzled that Oracle is pushing 'index skip scan' as a new feature. Wasn't this in the original Oracle Rdb --- one of Gennady Antoshenkov's tweaks?
Hi,

If anyone can help, please: I have a question about the execution of cursor CREATE/FETCH/MOVE, in particular about the disk cost. When a cursor is created, is the whole table (with the required columns) put into memory? If not, how does it work (in terms of disk reads and transfers)? And after the user issues a MOVE/FETCH command, how does Postgres speed up the query compared with a normal SELECT?

Thanks a lot,
regards,
MT Ho
I just discovered this:
http://www.postgresql.org/docs/7.4/static/jdbc-query.html#AEN24298

On Tue, 12 Oct 2004 04:43:43 -0700 (PDT), my ho <mthoatbanjo@yahoo.com> wrote:
> If anyone can help, please: I have a question about the execution of
> cursor CREATE/FETCH/MOVE, in particular about the disk cost. [...]
Pierre-Frédéric Caillaud mentioned:
=> http://www.postgresql.org/docs/7.4/static/jdbc-query.html#AEN24298

My question is: is this only true for postgres versions >= 7.4?

I see that the same section about "Setting fetch size to turn cursors on and off" is not in the postgres 7.3.7 docs. Does this mean the JDBC driver for postgres < 7.4 doesn't support this?

Kind regards
Stefan
my ho <mthoatbanjo@yahoo.com> writes:
> If anyone can help, please: I have a question about the execution of
> cursor CREATE/FETCH/MOVE, in particular about the disk cost. When a
> cursor is created, is the whole table (with the required columns) put
> into memory?

No. The plan is set up and then incrementally executed each time you say FETCH.

> And after the user issues a MOVE/FETCH command, how does Postgres speed
> up the query compared with a normal SELECT?

The only difference from a SELECT is that the planner will prefer "fast-start" plans, on the theory that you may not be intending to retrieve the whole result. For instance, it might prefer an indexscan to a seqscan + sort, when it otherwise wouldn't.

			regards, tom lane
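At the SQL level that looks like this (a sketch; the cursor name is arbitrary, and cursors must live inside a transaction):

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM customer_dim ORDER BY land;
    FETCH 100 FROM c;   -- executes the plan just far enough to return 100 rows
    MOVE 1000 IN c;     -- advances the plan 1000 rows without returning them
    FETCH 100 FROM c;
    CLOSE c;
    COMMIT;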
On Tue, 12 Oct 2004, Stef wrote:
> Pierre-Frédéric Caillaud mentioned:
> => http://www.postgresql.org/docs/7.4/static/jdbc-query.html#AEN24298
>
> My question is: is this only true for postgres versions >= 7.4?
>
> I see that the same section about "Setting fetch size to turn cursors on
> and off" is not in the postgres 7.3.7 docs. Does this mean the JDBC
> driver for postgres < 7.4 doesn't support this?

You need the 7.4 JDBC driver, but it can run against a 7.3 (or 7.2) database. Also note the 8.0 JDBC driver can only do this against a 7.4 or 8.0 database, not older versions.

Kris Jurka