Re: sequential scan on select distinct - Mailing list pgsql-performance
From: Pierre-Frédéric Caillaud
Subject: Re: sequential scan on select distinct
Msg-id: opsfh00bfrcq72hf@musicbox
In response to: Re: sequential scan on select distinct (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
> I don't really think it would be a useful plan anyway. What *would* be
> useful is to support HashAggregate as an implementation alternative for
> DISTINCT --- currently I believe we only consider that for GROUP BY.
> The DISTINCT planning code is fairly old and crufty and hasn't been
> redesigned lately.
>
> regards, tom lane

I see this as a minor annoyance only because I can write GROUP BY
instead of DISTINCT and get the speed boost. It probably annoys people
trying to port applications to Postgres, though, forcing them to
rewrite their queries.

* SELECT DISTINCT                        : 21442.296 ms (by default, uses an index scan)
  (disabling index_scan => Sort + Unique : 14512.105 ms)
* GROUP BY using HashAggregate           : 1793.651 ms
* skip index scan by function            : 13.833 ms

The HashAggregate speed boost is good, but rather pathetic compared to
a "skip index scan"; still, it's worth having if updating the DISTINCT
code is easy. Note that it would also benefit UNION queries, which
apparently use DISTINCT internally and currently produce this:

------------------------------------------------------------------------------

explain analyze select number from ((select number from dummy) union
(select number from dummy)) as foo;

 Subquery Scan foo  (cost=287087.62..317087.62 rows=2000000 width=4) (actual time=33068.776..35575.330 rows=255 loops=1)
   ->  Unique  (cost=287087.62..297087.62 rows=2000000 width=4) (actual time=33068.763..35574.126 rows=255 loops=1)
         ->  Sort  (cost=287087.62..292087.62 rows=2000000 width=4) (actual time=33068.757..34639.180 rows=2000000 loops=1)
               Sort Key: number
               ->  Append  (cost=0.00..49804.00 rows=2000000 width=4) (actual time=0.055..7412.551 rows=2000000 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.054..3104.165 rows=1000000 loops=1)
                           ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.051..1792.348 rows=1000000 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.048..3034.462 rows=1000000 loops=1)
                           ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.044..1718.682 rows=1000000 loops=1)
 Total runtime: 36265.662 ms

------------------------------------------------------------------------------

But could instead do this:

explain analyze select number from ((select number from dummy) union all
(select number from dummy)) as foo group by number;

 HashAggregate  (cost=74804.00..74804.00 rows=200 width=4) (actual time=10753.648..10753.890 rows=255 loops=1)
   ->  Subquery Scan foo  (cost=0.00..69804.00 rows=2000000 width=4) (actual time=0.059..8992.084 rows=2000000 loops=1)
         ->  Append  (cost=0.00..49804.00 rows=2000000 width=4) (actual time=0.055..6688.639 rows=2000000 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.054..2749.708 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.052..1640.427 rows=1000000 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=4) (actual time=0.038..2751.916 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=4) (actual time=0.034..1637.818 rows=1000000 loops=1)
 Total runtime: 10754.120 ms

------------------------------------------------------------------------------

A 3x speedup, but still a good thing to have. When I LIMIT the two
subqueries to 100k rows instead of a million, the times are about equal.
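By the way, since I keep referring to the skip index scan "by function":
what I use is a small plpgsql loop that fetches each distinct value with
a single index probe. A minimal sketch of it (using 8.0's dollar
quoting for readability; my actual function may differ in details):

-- test table, roughly: dummy(id serial, number integer),
-- 1M rows, ~256 distinct values of number, btree index on number
create or replace function sel_distinct() returns setof integer as $$
declare
    val integer;
begin
    -- first distinct value: one probe into the btree index on number
    select number into val from dummy order by number limit 1;
    while val is not null loop
        return next val;
        -- next distinct value: another single index probe
        select number into val from dummy
         where number > val
         order by number limit 1;
    end loop;
    return;
end;
$$ language plpgsql;

Calling it as "select * from sel_distinct();" is what gives the 13.833 ms
figure above: one index probe per distinct value instead of a full scan.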
When I LIMIT one of the subqueries to 100k rows and leave the other at 1M:

 UNION                : 17949.609 ms
 UNION ALL + GROUP BY : 6130.417 ms

Still some performance to be gained...

------------------------------------------------------------------------------

Of course the planner can't use a skip index scan on a subquery as
things stand, but here is what I could do instead. I know it's pretty
stupid to use the same table twice, but it's just an example. However,
if you think about table partitions and views, a "select distinct
number" from a view having multiple partitions would yield exactly this
type of query, and table partitioning seems like a hot subject lately.

Let's create a dummy example view:

create view dummy_view as (select * from dummy) union all (select * from dummy);

explain analyze select number from dummy_view group by number;

 HashAggregate  (cost=74804.00..74804.00 rows=200 width=4) (actual time=10206.456..10206.713 rows=255 loops=1)
   ->  Subquery Scan dummy_view  (cost=0.00..69804.00 rows=2000000 width=4) (actual time=0.060..8431.776 rows=2000000 loops=1)
         ->  Append  (cost=0.00..49804.00 rows=2000000 width=8) (actual time=0.055..6122.125 rows=2000000 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..24902.00 rows=1000000 width=8) (actual time=0.054..2456.566 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=8) (actual time=0.048..1107.151 rows=1000000 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..24902.00 rows=1000000 width=8) (actual time=0.036..2471.748 rows=1000000 loops=1)
                     ->  Seq Scan on dummy  (cost=0.00..14902.00 rows=1000000 width=8) (actual time=0.031..1104.482 rows=1000000 loops=1)
 Total runtime: 10206.945 ms

A smarter planner could rewrite it into this:

select number from ((select distinct number from dummy) union
(select distinct number from dummy)) as foo;

and notice that it could index-skip-scan each of the two partitions
(here, an example with my function):

explain analyze select number from ((select sel_distinct as number from
sel_distinct()) union all (select sel_distinct as number from
sel_distinct())) as foo group by number;

 HashAggregate  (cost=70.00..70.00 rows=200 width=4) (actual time=29.078..29.332 rows=255 loops=1)
   ->  Subquery Scan foo  (cost=0.00..65.00 rows=2000 width=4) (actual time=13.378..28.587 rows=510 loops=1)
         ->  Append  (cost=0.00..45.00 rows=2000 width=4) (actual time=13.373..28.003 rows=510 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..22.50 rows=1000 width=4) (actual time=13.373..13.902 rows=255 loops=1)
                     ->  Function Scan on sel_distinct  (cost=0.00..12.50 rows=1000 width=4) (actual time=13.367..13.619 rows=255 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..22.50 rows=1000 width=4) (actual time=13.269..13.800 rows=255 loops=1)
                     ->  Function Scan on sel_distinct  (cost=0.00..12.50 rows=1000 width=4) (actual time=13.263..13.512 rows=255 loops=1)
 Total runtime: 29.569 ms

So, if a query with UNION, or UNION ALL + DISTINCT, pushes the DISTINCT
inside the subqueries and that yields an index skip scan, there is a
massive speedup. You will tell me "but if the UNION ALL has 10
subqueries, planning is going to take forever!" Well, not necessarily.
The above query with 10 subqueries UNION ALLed then GROUPed takes:

 UNION                            : 320509.522 ms (the Sort + Unique truly becomes humongous)
 UNION ALL + GROUP BY             : 54586.759 ms (you see there is already interest in rewriting DISTINCT/UNION)
 skip scan + UNION                : 147.941 ms
 skip scan + UNION ALL + GROUP BY : 147.313 ms
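In the real partitioned-view scenario each partition is a different
table, so a hand-rolled workaround needs a generic version of the
emulation function. A sketch using dynamic SQL (the name
sel_distinct_any is mine, and it assumes an integer column; FOR ... IN
EXECUTE is used so it does not depend on newer EXECUTE ... INTO syntax):

create or replace function sel_distinct_any(tbl text, col text) returns setof integer as $$
declare
    r record;
    val integer := null;
    q text;
begin
    loop
        -- build the single-probe query for the next distinct value
        q := 'select ' || col || ' as v from ' || tbl;
        if val is not null then
            q := q || ' where ' || col || ' > ' || val::text;
        end if;
        q := q || ' order by ' || col || ' limit 1';
        val := null;
        for r in execute q loop
            val := r.v;
        end loop;
        exit when val is null;
        return next val;
    end loop;
    return;
end;
$$ language plpgsql;

One call per partition inside the UNION, e.g.
select * from sel_distinct_any('dummy', 'number'); would then give the
same per-partition skip scan without writing one function per table.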
> Well it would clearly be useful in this test case, where one has a small
> number of distinct values in a large table, and an index on the column.
> His plpgsql function that emulates such a plan is an order of magnitude
> faster than the hash aggregate plan even though it has to do entirely
> separate index scans for each key value.

Actually, it is more like two orders of magnitude (100x faster). The
time for a seq scan is O(N rows), whereas the time for the skip index
scan should be, if I'm not mistaken, something like
O((N distinct values) * (log N rows)). In my case there are 256 distinct
values for 1M rows and a speedup of 100x; since that ratio grows as
N / log N (the base of the log only changes a constant factor), with 10M
rows the speedup should be closer to 900x. And if the skip index scan
were implemented inside Postgres instead of in a plpgsql function, it
could be much, much faster still...

> [ shrug... ] There are an infinite number of special cases for which
> that claim could be made. The more we load down the planner with
> seldom-useful special cases, the *lower* the overall performance will
> be, because we'll waste cycles checking for the special cases in every
> case ...

In a general way, you are absolutely right: special-casing for, say, a
2x speedup would be worthless. But we are considering a HUGE speedup
here. And if this mode is only considered for DISTINCT and GROUP BY
queries, no planning cycles at all are wasted on queries which use
neither.

The present state is that DISTINCT and UNION are slow, even with the
GROUP BY trick. Including the index skip scan among the planning options
would only happen when appropriate cases are detected, and that
detection would be very fast. When the skip scan applies, it speeds up
the query so much that the additional planning cost does not matter;
when there are many distinct values, so that a seq scan beats a skip
scan, the query will be slow enough anyway that the additional planning
cost does not matter either. The only problem cases are queries on small
tables where startup time is important, but there the planner has
statistics about the number of rows, and excluding the skip scan from
the start would again be fast.

Lateral thought: create a new index type which only indexes one row for
each distinct value. Such an index would use very little space and would
be very fast to update (on my table it would index only 256 values).
Keep the Index Scan code and all, but use this index type when possible.
This solution is less general and also has a few drawbacks.

Another thought:

\d dummy
        Table "public.dummy"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | integer |
 number  | integer |
Indexes:
    "dummy_idx" btree (number)
    "dummy_idx_2" btree (number, id)

explain analyze select * from dummy where id=1;

 Seq Scan on dummy  (cost=0.00..17402.00 rows=1 width=8) (actual time=274.480..1076.092 rows=1 loops=1)
   Filter: (id = 1)
 Total runtime: 1076.168 ms

explain analyze select * from dummy where number between 0 and 256 and id=1;

 Index Scan using dummy_idx_2 on dummy  (cost=0.00..6.02 rows=1 width=8) (actual time=1.449..332.020 rows=1 loops=1)
   Index Cond: ((number >= 0) AND (number <= 256) AND (id = 1))
 Total runtime: 332.112 ms

Here we have no index on id alone, but by emulating a skip index scan
with the BETWEEN clause to force use of the (number, id) index, the
query is sped up 3x even though it must look through all 256 possible
values of number. Interestingly, with only 16 distinct values the time
is about the same. Thus, the skip index scan could also be used in cases
where there is a multicolumn index but the WHERE clause misses the
leading column.
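A true skip scan would probe each distinct value of number exactly once
instead of scanning the whole range. Emulated in plpgsql, in the same
style as sel_distinct (find_by_id is just my name for this sketch):

create or replace function find_by_id(wanted integer) returns setof dummy as $$
declare
    n integer;
    r dummy%rowtype;
begin
    -- walk the distinct values of the leading column, one index probe each
    select number into n from dummy order by number limit 1;
    while n is not null loop
        -- with number pinned, the (number, id) index can check id directly
        for r in select * from dummy where number = n and id = wanted loop
            return next r;
        end loop;
        select number into n from dummy where number > n order by number limit 1;
    end loop;
    return;
end;
$$ language plpgsql;

select * from find_by_id(1); then does roughly 256 + matches index
probes, which is the behaviour the planner could produce by itself.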
This would not waste planning cycles, because:

- If the index we need exists, and there is no DISTINCT and no GROUP BY
without aggregates, the planner does not even consider the skip index
scan.
- If the index we need does not exist, the planner only loses the cycles
needed to check whether there is a multicolumn index that could be used.
Either there is no such index, and a seq scan is chosen, which will be
slow enough that the time spent on the check is negligible; or an index
is found and can be used, and the time gained by the skip index scan
amply amortizes the check.

Currently one has to consider carefully which queries will be frequent
and need indexes, and which are infrequent and don't justify an index
(but will be very slow). With the skip index scan, these less frequent
queries don't always mean a seq scan, so people would need to create
fewer rarely-used indexes and would get higher INSERT/UPDATE speed.

------------------------------------------------------------------------------------------------

The skip scan would also be a winner on this type of query, a variant of
the famous "top 10" query, which is a killer:

explain analyze select max(id), number from dummy group by number;
-> 2229.141 ms

Postgres uses a Seq Scan + HashAggregate. Come on, we have a btree index
on (number, id), use it! A simple adaptation of my skip scan emulation
function takes 13.683 ms...

I know that Postgres does not translate max() on an indexed column into
ORDER BY column DESC LIMIT 1, because that would be extremely hard to
implement due to the general nature of aggregates, which is a very good
thing. It does not bother me, because I can still write ORDER BY column
DESC LIMIT 1 myself. Where it does bother me is when I want the highest
id for each number, which can only be expressed as SELECT max(id),
number FROM dummy GROUP BY number, and not with LIMITs.

Suppose now I want the 10 highest ids for each number, which is another
variant on the killer "top 10" query. I'm stuck: I cannot even use
max(). I would have to write a custom aggregate keeping the 10 highest
values, which would be very slow. So instead I use my function with a
LIMIT 10 in place of the LIMIT 1 in each query, along with a FOR loop
and some conditions to handle numbers having fewer than 10 ids, etc.,
which more or less amounts to "select the next number, then select its
associated ids". It is still a lot faster than a seq scan, but it gets
more and more complicated.

However, I'd like to be able to write:

select number, id from dummy order by number desc, id desc MULTILIMIT 50,10;

The MULTILIMIT would mean "I want 50 numbers and 10 ids for each
number". MULTILIMIT NULL,10 would mean "I want all numbers and 10 ids
for each number". The NULL is not mandatory; it could be -1, a keyword,
or something else. MULTILIMIT could even simply be spelled LIMIT, since
today's LIMIT takes a single parameter. The OFFSET clause could evolve
accordingly. This would naturally use a skip index scan, and would
benefit a whole class of queries which have traditionally been difficult
to get right...
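Until something like MULTILIMIT exists, the LIMIT-10 variant of the
emulation function I describe above looks roughly like this (a sketch;
top_k_per_number is my name for it, and the real thing needs whatever
extra conditions your data requires):

create or replace function top_k_per_number(k integer) returns setof dummy as $$
declare
    n integer;
    r dummy%rowtype;
begin
    -- start from the highest number, as ORDER BY number DESC would
    select number into n from dummy order by number desc limit 1;
    while n is not null loop
        -- k highest ids for this number; ordering on both columns lets
        -- the (number, id) index return the rows already sorted
        for r in select * from dummy where number = n
                  order by number desc, id desc limit k loop
            return next r;
        end loop;
        -- skip to the next lower number with one more index probe
        select number into n from dummy where number < n
         order by number desc limit 1;
    end loop;
    return;
end;
$$ language plpgsql;

select * from top_k_per_number(10); corresponds to MULTILIMIT NULL,10;
bolting a counter onto the outer loop gives the 50,10 form.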
Conclusion: smartening up the DISTINCT planner has the following
benefits:
- speedup on DISTINCT
- speedup on UNION, which seems to use DISTINCT internally

The index skip scan has the following benefits:
- massive speedup (100x) on queries involving DISTINCT or its GROUP BY
variant
- same thing (300x) on UNION queries, if the planner rewrites the query
to push the DISTINCT inside the subqueries
- paves the way for a MULTILIMIT, which gives an elegant and very
efficient way of expressing traditionally difficult queries like the
"top 10 by category", which are used quite often and give headaches to
DBAs
- possibility to use a multicolumn index when the WHERE clause does not
constrain the leading columns

The index skip scan has the following drawbacks:
- more complexity
- additional planning time

This last drawback is in fact limited, because:
- it is easy and fast to know when the index skip scan can never be
used, so for most queries the possibility can be eliminated without
wasting planning cycles
- when it is used, the performance gains are so massive that the
planning cost is justified
- people who run many queries where planning time is significant
compared to execution time are probably using SQL functions or prepared
queries anyway

Enough arguments, maybe not to convince you, but at least to warrant a
second thought?

---------------------------------------------------------------

Side note: what do you think about the idea of a "UniqueSort" which
would do sort + unique in one pass? It could be simple to code, and it
would also benefit all queries using UNION. The sort would be faster and
consume less storage, because the data size would shrink as duplicates
are eliminated along the way.