Thread: Querying distinct values from a large table
Greetings!

I have a rather large table with about 5 million rows and a dozen columns. Let's suppose the columns are named 'a', 'b', 'c', etc. I need to query the distinct pairs of ('a';'b') from this table. I use the following query:

SELECT DISTINCT a, b FROM tbl;

but unfortunately, it takes forever to complete. EXPLAIN tells me that the bottleneck is a seqscan on 'tbl', which eats most of the time. Creating a compound index with the following statement:

CREATE INDEX tbl_a_b_idx ON tbl( a, b );

has no effect; postgres simply ignores it, at least according to the EXPLAIN output.

Is there any way to improve the performance of this operation? The table cannot be changed.

--
Igor Lobanov
Internal Development Engineer
SWsoft, Inc.
Igor Lobanov wrote:
> Greetings!
>
> I have rather large table with about 5 millions of rows and a dozen of
> columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need
> to query distinct pairs of ('a';'b') from this table.
>
> Creating compound index on this table using following statement:
>
> CREATE INDEX tbl_a_b_idx ON tbl( a, b );
>
> gives no effect, postgres simply ignores it, at least according to the
> EXPLAIN output.

What version of PostgreSQL is it?

How many distinct values are you getting back from your 5 million rows? If there are too many, an index isn't going to help.

Can you share the EXPLAIN ANALYSE output? You might want to try increasing work_mem for this one query to speed any sorting.

How often is the table updated? Clustering might buy you some improvements (but not a huge amount, I suspect).

--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
>> I have rather large table with about 5 millions of rows and a dozen of
>> columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I
>> need to query distinct pairs of ('a';'b') from this table.
>
> What version of PostgreSQL is it?

8.1.4

> How many distinct values are you getting back from your 5 million rows?
> If there are too many, an index isn't going to help.

No more than 10,000.

> Can you share the EXPLAIN ANALYSE output? You might want to try
> increasing work_mem for this one query to speed any sorting.

The real table and column names are obfuscated because of an NDA, sorry.

explain analyze select distinct a, b from tbl

EXPLAIN ANALYZE output is:

 Unique  (cost=500327.32..525646.88 rows=1848 width=6) (actual time=52719.868..56126.356 rows=5390 loops=1)
   ->  Sort  (cost=500327.32..508767.17 rows=3375941 width=6) (actual time=52719.865..54919.989 rows=3378864 loops=1)
         Sort Key: a, b
         ->  Seq Scan on tbl  (cost=0.00..101216.41 rows=3375941 width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
 Total runtime: 57307.394 ms

> How often is the table updated? Clustering might buy you some
> improvements (but not a huge amount I suspect).

It is updated once per 3-5 seconds.

And one more thing. I don't know if it helps, but column 'a' can take a value from a limited set: 0, 1 or 2. Column 'b' is also an integer (a foreign key, actually).

--
Igor Lobanov
Internal Development Engineer
SWsoft, Inc.
> You might want to try increasing work_mem for this one query to speed
> any sorting.

Forgot to mention that our work_mem setting is 20480 kB.

--
Igor Lobanov
Internal Development Engineer
SWsoft, Inc.
Igor Lobanov wrote:
> Richard Huxton wrote:
>>> I have rather large table with about 5 millions of rows and a dozen
>>> of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I
>>> need to query distinct pairs of ('a';'b') from this table.
>
>> What version of PostgreSQL is it?
>
> 8.1.4

Current release is 8.1.6 - probably worth upgrading when you've got time. It should be a simple matter of replacing the binaries, but do check the release notes.

>> How many distinct values are you getting back from your 5 million
>> rows? If there are too many, an index isn't going to help.
>
> No more than 10,000.

OK. Should be possible to do something then.

>> Can you share the EXPLAIN ANALYSE output? You might want to try
>> increasing work_mem for this one query to speed any sorting.
>
> Real table and colum names are obfuscated because of NDA, sorry.

Fair enough.

> explain analyze select distinct a, b from tbl
>
> EXPLAIN ANALYZE output is:
>
> Unique  (cost=500327.32..525646.88 rows=1848 width=6) (actual
> time=52719.868..56126.356 rows=5390 loops=1)
>   ->  Sort  (cost=500327.32..508767.17 rows=3375941 width=6) (actual
> time=52719.865..54919.989 rows=3378864 loops=1)
>         Sort Key: a, b
>         ->  Seq Scan on tbl  (cost=0.00..101216.41 rows=3375941
> width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
> Total runtime: 57307.394 ms

Hmm - am I right in thinking (a,b) are the only two columns on this table? That means you'll have a lot of rows per page, and an index scan could end up fetching lots of pages to check the rows are visible. Still - I'd expect it to be better than a seq scan.

The first thing to try is to put the index back and run "SET enable_seqscan=off" before the explain analyse. That should force it to use the index. Then we'll see what costs it's expecting.

>> How often is the table updated? Clustering might buy you some
>> improvements (but not a huge amount I suspect).
>
> It is updated once per 3-5 seconds.

OK - forget clustering then.
> And one more thing. I don't know if it helps, but column 'a' can have
> value from a limited set: 0, 1 or 2. Column 'b' is also an integer
> (foreign key, actually).

Hmm - it might be worth trying distinct on (b,a) with an index that way around - that would give you greater selectivity at the top level of the btree. Can you repeat the EXPLAIN ANALYSE with that too, please?

--
Richard Huxton
Archonet Ltd
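Putting Richard's two suggestions together, the session-local test might look like this (a sketch only - 'tbl', 'a' and 'b' stand in for the real names, which are under NDA):

```sql
-- Recreate the compound index, then disable seq scans for this
-- session only, to see what cost the planner puts on an index scan:
CREATE INDEX tbl_a_b_idx ON tbl (a, b);
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT DISTINCT a, b FROM tbl;
RESET enable_seqscan;

-- And the reversed-column variant: 'a' has only 3 distinct values,
-- while 'b' is far more selective, so lead the index with 'b':
CREATE INDEX tbl_b_a_idx ON tbl (b, a);
EXPLAIN ANALYZE SELECT DISTINCT b, a FROM tbl;
```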
To be sure about the performance of an index scan, force the planner to use it instead of a seq scan - for example, by adding a dummy WHERE clause to the query: use a condition that holds true for all rows.

--Imad
www.EnterpriseDB.com

On 1/30/07, Richard Huxton <dev@archonet.com> wrote:
> The first thing to try is to put the index back and run "SET
> enable_seqscan=off" before the explain analyse. That should force it to
> use the index. Then we'll see what costs it's expecting.
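A minimal sketch of the dummy-WHERE-clause idea (assuming, as stated earlier in the thread, that 'a' only ever takes the values 0, 1 or 2, so the predicate is true for every row):

```sql
-- Always-true predicate, but it gives the planner an indexable
-- condition on the leading index column to latch onto:
SELECT DISTINCT a, b FROM tbl WHERE a >= 0;
```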
On Tue, 2007-01-30 at 15:33 +0600, Igor Lobanov wrote:
> explain analyze select distinct a, b from tbl
>
> EXPLAIN ANALYZE output is:
>
> Unique  (cost=500327.32..525646.88 rows=1848 width=6) (actual
> time=52719.868..56126.356 rows=5390 loops=1)
>   ->  Sort  (cost=500327.32..508767.17 rows=3375941 width=6) (actual
> time=52719.865..54919.989 rows=3378864 loops=1)
>         Sort Key: a, b
>         ->  Seq Scan on tbl  (cost=0.00..101216.41 rows=3375941
> width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
> Total runtime: 57307.394 ms

All your time is in the sort, not in the SeqScan.

Increase your work_mem.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Simon Riggs wrote:
> All your time is in the sort, not in the SeqScan.
>
> Increase your work_mem.

Well, even if the sort were instant, it's only going to get him down to ~20 secs.

--
Richard Huxton
Archonet Ltd
On 1/30/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> > explain analyze select distinct a, b from tbl
> >
> > EXPLAIN ANALYZE output is:
> >
> > Unique  (cost=500327.32..525646.88 rows=1848 width=6) (actual
> > time=52719.868..56126.356 rows=5390 loops=1)
> >   ->  Sort  (cost=500327.32..508767.17 rows=3375941 width=6) (actual
> > time=52719.865..54919.989 rows=3378864 loops=1)
> >         Sort Key: a, b
> >         ->  Seq Scan on tbl  (cost=0.00..101216.41 rows=3375941
> > width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
> > Total runtime: 57307.394 ms
>
> All your time is in the sort, not in the SeqScan.
>
> Increase your work_mem.
Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash, I guess), there is no need to push 3M rows through a sort algorithm to only shave it down to 1848 unique records.
I am assuming this optimization just isn't implemented in PostgreSQL?
--
Chad
http://www.postgresqlforums.com/
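Chad's point can be illustrated outside the database: when only a handful of distinct pairs exist, a single hash pass finds them without ever sorting the 3M input rows. A toy sketch in Python (not how the executor is written, just the shape of the two plans):

```python
def distinct_via_sort(rows):
    """Sort-then-unique, as the Unique-over-Sort plan does: O(n log n)."""
    out = []
    for r in sorted(rows):
        if not out or out[-1] != r:
            out.append(r)
    return out

def distinct_via_hash(rows):
    """One hash pass over the input: O(n), no sort of the full table."""
    return list(dict.fromkeys(rows))  # keeps first-seen order

# Many rows, few distinct (a, b) pairs - the shape of Igor's data:
rows = [(1, 'b'), (0, 'a'), (1, 'b'), (0, 'a'), (2, 'c')] * 1000
assert sorted(distinct_via_hash(rows)) == distinct_via_sort(rows)
```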
As I understand, there's no hashing for DISTINCT, but there is for GROUP BY. GROUP BY will choose between a hash and a sort (or maybe other options?) depending on the circumstances. So you can write
SELECT a, b FROM tbl GROUP BY a,b
and the sort/unique part of the query may run faster.
Brian
----- Original Message ----
From: Chad Wagner <chad.wagner@gmail.com>
To: Simon Riggs <simon@2ndquadrant.com>
Cc: Igor Lobanov <ilobanov@swsoft.com>; Richard Huxton <dev@archonet.com>; pgsql-performance@postgresql.org
Sent: Tuesday, 30 January, 2007 10:13:27 PM
Subject: Re: [PERFORM] Querying distinct values from a large table
Chad,

On 1/30/07 6:13 AM, "Chad Wagner" <chad.wagner@gmail.com> wrote:
> Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash,
> I guess), there is no need to push 3M rows through a sort algorithm to only
> shave it down to 1848 unique records.
>
> I am assuming this optimization just isn't implemented in PostgreSQL?

Not that it helps Igor, but we've implemented single pass sort/unique, grouping and limit optimizations, and it speeds things up to a single seqscan over the data, from 2-5 times faster than a typical external sort.

I can't think of a way that indexing would help this situation given the required visibility check of each tuple.

- Luke
On 1/30/07, Luke Lonergan <llonergan@greenplum.com> wrote:
> Not that it helps Igor, but we've implemented single pass sort/unique,
> grouping and limit optimizations and it speeds things up to a single seqscan
> over the data, from 2-5 times faster than a typical external sort.
Was that integrated back into PostgreSQL, or is that part of Greenplum's offering?
> I can't think of a way that indexing would help this situation given the
> required visibility check of each tuple.
I agree, using indexes as a "skinny" table is a whole other feature that would be nice.
--
Chad
http://www.postgresqlforums.com/
Chad,

On 1/30/07 7:03 AM, "Chad Wagner" <chad.wagner@gmail.com> wrote:
> On 1/30/07, Luke Lonergan <llonergan@greenplum.com> wrote:
>> Not that it helps Igor, but we've implemented single pass sort/unique,
>> grouping and limit optimizations and it speeds things up to a single seqscan
>> over the data, from 2-5 times faster than a typical external sort.
>
> Was that integrated back into PostgreSQL, or is that part of Greenplum's
> offering?

Not yet, we will submit to PostgreSQL along with other executor node enhancements like hybrid hash agg (fixes the memory overflow problem with hash agg) and some other great sort work. These are all "cooked" and in the Greenplum DBMS, and have proven themselves significant on customer workloads with tens of terabytes already.

For now it seems that the "Group By" trick Brian suggested in this thread, combined with lots of work_mem, may speed things up for this case if HashAgg is chosen. Watch out for misestimates of stats though - hash agg may overallocate RAM in some cases.

>> I can't think of a way that indexing would help this situation given the
>> required visibility check of each tuple.
>
> I agree, using indexes as a "skinny" table is a whole other feature that
> would be nice.

Yah - I like Hannu's ideas to make visibility less of a problem. We're thinking about this too.

- Luke
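In 8.1 terms, the "Group By" trick plus extra work_mem might look like this (a sketch; the work_mem figure is an arbitrary example - size it to the expected hash table, and note it applies per sort/hash operation, per backend):

```sql
-- 8.1 takes work_mem as an integer number of kilobytes:
SET work_mem = 262144;              -- 256 MB for this session only

-- Same result set as SELECT DISTINCT a, b, but eligible for HashAgg:
SELECT a, b FROM tbl GROUP BY a, b;

RESET work_mem;
```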
"Luke Lonergan" <llonergan@greenplum.com> writes: > Chad, > > On 1/30/07 6:13 AM, "Chad Wagner" <chad.wagner@gmail.com> wrote: > > > Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash, > > I guess), there is no need to push 3M rows through a sort algorithm to only > > shave it down to 1848 unique records. > > > > I am assuming this optimization just isn't implemented in PostgreSQL? > > Not that it helps Igor, but we've implemented single pass sort/unique, > grouping and limit optimizations and it speeds things up to a single seqscan > over the data, from 2-5 times faster than a typical external sort. Fwiw I also implemented this last September when I worked on the LIMIT/SORT case. It's blocked on the same issue that that patch is: how do we communicate the info that only unique records are needed down the plan to the sort node? Note that it's not just a boolean flag that we can include like the random access flag: The Unique node could in theory have a different key than the sort node. Coming at this fresh now a few months later I'm thinking instead of trying to put this intelligence into the individual nodes, an optimizer pass could run through the tree looking for Sort nodes that can have this bit tweaked. That doesn't quite help the LIMIT/SORT case because the limit isn't calculated until run-time but perhaps it's possible to finesse that issue by providing either the Limit or Sort node with pointers to other nodes. (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on your data distribution. It's not hard to come up with distributions where it's 1000x as fast and others where there's no speed difference.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Tue, Jan 30, 2007 at 14:33:34 +0600, Igor Lobanov <ilobanov@swsoft.com> wrote:
> Greetings!
>
> I have rather large table with about 5 millions of rows and a dozen of
> columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need
> to query distinct pairs of ('a';'b') from this table.
>
> Is there any way to somehow improve the performance of this operation?
> Table can not be changed.

DISTINCT currently can't use a hash aggregate plan and will use a sort. If there aren't many distinct values, the hash aggregate plan will run much faster. To get around this limitation, rewrite the query as a GROUP BY. Something like:

SELECT a, b FROM table GROUP BY a, b;
Gregory Stark wrote:
> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
> your data distribution. It's not hard to come up with distributions where it's
> 1000x as fast and others where there's no speed difference.)

So the figure is really "1-1000x"? I bet this one is more impressive in PHB terms.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro,

On 1/30/07 9:04 AM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:
>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
>> your data distribution. It's not hard to come up with distributions where
>> it's 1000x as fast and others where there's no speed difference.)
>
> So the figure is really "1-1000x"? I bet this one is more impressive in
> PHB terms.

You got me - I'll bite - what's PHB?

- Luke
On Tue, Jan 30, 2007 at 10:20:28AM -0800, Luke Lonergan wrote: > You got me - I'll bite - what's PHB? Usually the Pointy-Haired Boss, a figure from Dilbert. http://en.wikipedia.org/wiki/Pointy_Haired_Boss /* Steinar */ -- Homepage: http://www.sesse.net/
Luke, > You got me - I'll bite - what's PHB? Pointy Haired Boss. It's a Dilbert reference. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Luke Lonergan wrote:
> On 1/30/07 9:04 AM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:
>> So the figure is really "1-1000x"? I bet this one is more impressive in
>> PHB terms.
>
> You got me - I'll bite - what's PHB?

Pointy-Haired Boss, a term (AFAIK) popularized by the Dilbert comic strip. Why, I just noticed there's even a Wikipedia entry:

http://en.wikipedia.org/wiki/Pointy_Haired_Boss

It seems people enjoy writing hypothetically about their bosses.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Argh!

On 1/30/07 3:00 PM, "Josh Berkus" <josh@agliodbs.com> wrote:
> --Josh
>
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Gregory Stark wrote:
>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
>> your data distribution. It's not hard to come up with distributions where it's
>> 1000x as fast and others where there's no speed difference.)
>
> So the figure is really "1-1000x"? I bet this one is more impressive in
> PHB terms.

Luke has a bad habit of quoting numbers that are obviously derived from narrow benchmarking scenarios as Universal Truths, rather than providing the context they were derived in. I wish he'd stop doing that...

regards, tom lane
Tom,
On 1/30/07 9:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Gregory Stark wrote:
>>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
>>> your data distribution. It's not hard to come up with distributions where
>>> it's
>>> 1000x as fast and others where there's no speed difference.)
>
>> So the figure is really "1-1000x"? I bet this one is more impressive in
>> PHB terms.
>
> Luke has a bad habit of quoting numbers that are obviously derived from
> narrow benchmarking scenarios as Universal Truths, rather than providing
> the context they were derived in. I wish he'd stop doing that...
In this case I was referring to results obtained using grouping and distinct optimizations within sort where the benefit is from the use of a single pass instead of the multiple merge passes for external sort followed by a UNIQUE operator. In this case, the benefit ranges from 2-5x in many examples as I mentioned: "from 2-5 times faster than a typical external sort". This is also the same range of benefits we see for this optimization with a popular commercial database. With the limit/sort optimization we have seen more dramatic results, but I think those are less typical cases.
Here are some results for a 1GB table and a simple COUNT(DISTINCT) on a column with 7 unique values from my dual CPU laptop running Greenplum DB (PG 8.2.1 compatible) on both CPUs. Note that my laptop has 2GB of RAM so I have the 1GB table loaded into OS I/O cache. The unmodified external sort spills the sorted attribute to disk, but that takes little time. Note that the COUNT(DISTINCT) plan embeds a sort as the transition function in the aggregation node.
=================================================================
================= No Distinct Optimization in Sort ==============
=================================================================
lukelonergan=# select count(distinct l_shipmode) from lineitem;
count
-------
7
(1 row)
Time: 37832.308 ms
lukelonergan=# explain analyze select count(distinct l_shipmode) from lineitem; QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=159175.30..159175.31 rows=1 width=8)
Total 1 rows with 40899 ms to end, start offset by 3.189 ms.
-> Gather Motion 2:1 (slice2) (cost=159175.25..159175.28 rows=1 width=8)
recv: Total 2 rows with 39387 ms to first row, 40899 ms to end, start offset by 3.191 ms.
-> Aggregate (cost=159175.25..159175.26 rows=1 width=8)
Avg 1.00 rows x 2 workers. Max 1 rows (seg0) with 39367 ms to end, start offset by 22 ms.
-> Redistribute Motion 2:2 (slice1) (cost=0.00..151672.00 rows=3001300 width=8)
recv: Avg 3000607.50 rows x 2 workers. Max 3429492 rows (seg1) with 0.362 ms to first row, 8643 ms to end, start offset by 23 ms.
Hash Key: lineitem.l_shipmode
-> Seq Scan on lineitem (cost=0.00..91646.00 rows=3001300 width=8)
Avg 3000607.50 rows x 2 workers. Max 3001300 rows (seg0) with 0.049 ms to first row, 2813 ms to end, start offset by 12.998 ms.
Total runtime: 40903.321 ms
(12 rows)
Time: 40904.013 ms
=================================================================
================= With Distinct Optimization in Sort ==============
=================================================================
lukelonergan=# set mpp_sort_flags=1;
SET
Time: 1.425 ms
lukelonergan=# select count(distinct l_shipmode) from lineitem;
count
-------
7
(1 row)
Time: 12846.466 ms
lukelonergan=# explain analyze select count(distinct l_shipmode) from lineitem;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=159175.30..159175.31 rows=1 width=8)
Total 1 rows with 13754 ms to end, start offset by 2.998 ms.
-> Gather Motion 2:1 (slice2) (cost=159175.25..159175.28 rows=1 width=8)
recv: Total 2 rows with 13754 ms to end, start offset by 3.000 ms.
-> Aggregate (cost=159175.25..159175.26 rows=1 width=8)
Avg 1.00 rows x 2 workers. Max 1 rows (seg0) with 13734 ms to end, start offset by 23 ms.
-> Redistribute Motion 2:2 (slice1) (cost=0.00..151672.00 rows=3001300 width=8)
recv: Avg 3000607.50 rows x 2 workers. Max 3429492 rows (seg1) with 0.352 ms to first row, 10145 ms to end, start offset by 26 ms.
Hash Key: lineitem.l_shipmode
-> Seq Scan on lineitem (cost=0.00..91646.00 rows=3001300 width=8)
Avg 3000607.50 rows x 2 workers. Max 3001300 rows (seg0) with 0.032 ms to first row, 4048 ms to end, start offset by 13.037 ms.
Total runtime: 13757.524 ms
(12 rows)
Time: 13758.182 ms
================= Background Information ==============
lukelonergan=# select count(*) from lineitem;
count
---------
6001215
(1 row)
Time: 1661.337 ms
lukelonergan=# \d lineitem
Table "public.lineitem"
Column | Type | Modifiers
-----------------+-----------------------+-----------
l_orderkey | integer | not null
l_partkey | integer | not null
l_suppkey | integer | not null
l_linenumber | integer | not null
l_quantity | double precision | not null
l_extendedprice | double precision | not null
l_discount | double precision | not null
l_tax | double precision | not null
l_returnflag | text | not null
l_linestatus | text | not null
l_shipdate | date | not null
l_commitdate | date | not null
l_receiptdate | date | not null
l_shipinstruct | text | not null
l_shipmode | text | not null
l_comment | character varying(44) | not null
Distributed by: (l_orderkey)
lukelonergan=# select pg_relation_size(oid)/(1000.*1000.) as MB, relname as Table from pg_class order by MB desc limit 10;
mb | table
------------------------+--------------------------------
1009.4755840000000000 | lineitem
230.0559360000000000 | orders
146.7678720000000000 | partsupp
35.8072320000000000 | part
32.8908800000000000 | customer
1.9333120000000000 | supplier
1.1304960000000000 | pg_proc
0.88473600000000000000 | pg_proc_proname_args_nsp_index
0.81100800000000000000 | pg_attribute
0.81100800000000000000 | pg_depend
- Luke
On 1/30/07 9:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Gregory Stark wrote:
>>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
>>> your data distribution. It's not hard to come up with distributions where
>>> it's
>>> 1000x as fast and others where there's no speed difference.)
>
>> So the figure is really "1-1000x"? I bet this one is more impressive in
>> PHB terms.
>
> Luke has a bad habit of quoting numbers that are obviously derived from
> narrow benchmarking scenarios as Universal Truths, rather than providing
> the context they were derived in. I wish he'd stop doing that...
In this case I was referring to results obtained using grouping and distinct optimizations within sort where the benefit is from the use of a single pass instead of the multiple merge passes for external sort followed by a UNIQUE operator. In this case, the benefit ranges from 2-5x in many examples as I mentioned: "from 2-5 times faster than a typical external sort". This is also the same range of benefits we see for this optimization with a popular commercial database. With the limit/sort optimization we have seen more dramatic results, but I think those are less typical cases.
Here are some results for a 1GB table and a simple COUNT(DISTINCT) on a column with 7 unique values from my dual CPU laptop running Greenplum DB (PG 8.2.1 compatible) on both CPUs. Note that my laptop has 2GB of RAM so I have the 1GB table loaded into OS I/O cache. The unmodified external sort spills the sorted attribute to disk, but that takes little time. Note that the COUNT(DISTINCT) plan embeds a sort as the transition function in the aggregation node.
=================================================================
================= No Distinct Optimization in Sort ==============
=================================================================
lukelonergan=# select count(distinct l_shipmode) from lineitem;
count
-------
7
(1 row)
Time: 37832.308 ms
lukelonergan=# explain analyze select count(distinct l_shipmode) from lineitem; QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=159175.30..159175.31 rows=1 width=8)
Total 1 rows with 40899 ms to end, start offset by 3.189 ms.
-> Gather Motion 2:1 (slice2) (cost=159175.25..159175.28 rows=1 width=8)
recv: Total 2 rows with 39387 ms to first row, 40899 ms to end, start offset by 3.191 ms.
-> Aggregate (cost=159175.25..159175.26 rows=1 width=8)
Avg 1.00 rows x 2 workers. Max 1 rows (seg0) with 39367 ms to end, start offset by 22 ms.
-> Redistribute Motion 2:2 (slice1) (cost=0.00..151672.00 rows=3001300 width=8)
recv: Avg 3000607.50 rows x 2 workers. Max 3429492 rows (seg1) with 0.362 ms to first row, 8643 ms to end, start offset by 23 ms.
Hash Key: lineitem.l_shipmode
-> Seq Scan on lineitem (cost=0.00..91646.00 rows=3001300 width=8)
Avg 3000607.50 rows x 2 workers. Max 3001300 rows (seg0) with 0.049 ms to first row, 2813 ms to end, start offset by 12.998 ms.
Total runtime: 40903.321 ms
(12 rows)
Time: 40904.013 ms
=================================================================
================= With Distinct Optimization in Sort ==============
=================================================================
lukelonergan=# set mpp_sort_flags=1;
SET
Time: 1.425 ms
lukelonergan=# select count(distinct l_shipmode) from lineitem;
count
-------
7
(1 row)
Time: 12846.466 ms
lukelonergan=# explain analyze select count(distinct l_shipmode) from lineitem;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=159175.30..159175.31 rows=1 width=8)
Total 1 rows with 13754 ms to end, start offset by 2.998 ms.
-> Gather Motion 2:1 (slice2) (cost=159175.25..159175.28 rows=1 width=8)
recv: Total 2 rows with 13754 ms to end, start offset by 3.000 ms.
-> Aggregate (cost=159175.25..159175.26 rows=1 width=8)
Avg 1.00 rows x 2 workers. Max 1 rows (seg0) with 13734 ms to end, start offset by 23 ms.
-> Redistribute Motion 2:2 (slice1) (cost=0.00..151672.00 rows=3001300 width=8)
recv: Avg 3000607.50 rows x 2 workers. Max 3429492 rows (seg1) with 0.352 ms to first row, 10145 ms to end, start offset by 26 ms.
Hash Key: lineitem.l_shipmode
-> Seq Scan on lineitem (cost=0.00..91646.00 rows=3001300 width=8)
Avg 3000607.50 rows x 2 workers. Max 3001300 rows (seg0) with 0.032 ms to first row, 4048 ms to end, start offset by 13.037 ms.
Total runtime: 13757.524 ms
(12 rows)
Time: 13758.182 ms
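For comparison, on stock PostgreSQL (without the sort-dedup flag benchmarked above) the usual workaround for a sort-bound count(DISTINCT ...) is to push the deduplication into a GROUP BY subquery, which the planner can execute with a HashAggregate instead of a full sort. A hedged sketch against the same table:

```sql
-- Workaround sketch, assuming stock PostgreSQL (no mpp_sort_flags):
-- GROUP BY is eligible for HashAggregate, so the 6M-row input is
-- hashed down to the 7 distinct l_shipmode values without sorting.
SELECT count(*)
FROM (SELECT l_shipmode
      FROM lineitem
      GROUP BY l_shipmode) AS distinct_modes;
```

Whether the planner actually chooses the hash plan depends on work_mem being large enough for the estimated number of groups.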
================= Background Information ==============
lukelonergan=# select count(*) from lineitem;
count
---------
6001215
(1 row)
Time: 1661.337 ms
lukelonergan=# \d lineitem
Table "public.lineitem"
Column | Type | Modifiers
-----------------+-----------------------+-----------
l_orderkey | integer | not null
l_partkey | integer | not null
l_suppkey | integer | not null
l_linenumber | integer | not null
l_quantity | double precision | not null
l_extendedprice | double precision | not null
l_discount | double precision | not null
l_tax | double precision | not null
l_returnflag | text | not null
l_linestatus | text | not null
l_shipdate | date | not null
l_commitdate | date | not null
l_receiptdate | date | not null
l_shipinstruct | text | not null
l_shipmode | text | not null
l_comment | character varying(44) | not null
Distributed by: (l_orderkey)
lukelonergan=# select pg_relation_size(oid)/(1000.*1000.) as MB, relname as Table from pg_class order by MB desc limit 10;
mb | table
------------------------+--------------------------------
1009.4755840000000000 | lineitem
230.0559360000000000 | orders
146.7678720000000000 | partsupp
35.8072320000000000 | part
32.8908800000000000 | customer
1.9333120000000000 | supplier
1.1304960000000000 | pg_proc
0.88473600000000000000 | pg_proc_proname_args_nsp_index
0.81100800000000000000 | pg_attribute
0.81100800000000000000 | pg_depend
- Luke
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Gregory Stark wrote:
>>> (Incidentally I'm not sure where 2-5x comes from. It's entirely
>>> dependent on your data distribution. It's not hard to come up with
>>> distributions where it's 1000x as fast and others where there's no
>>> speed difference.)
>
>> So the figure is really "1-1000x"? I bet this one is more impressive in
>> PHB terms.
>
> Luke has a bad habit of quoting numbers that are obviously derived from
> narrow benchmarking scenarios as Universal Truths, rather than providing
> the context they were derived in. I wish he'd stop doing that...

In fairness, I may have exaggerated a bit there. There is a limit to how
much of a speedup you can get in valid benchmarking situations. A single
sequential scan is always going to be necessary, so you're only saving the
cost of writing out the temporary file and the subsequent merge passes.

It's hard to generate lots of intermediate merge passes, since there are
only O(log(n)) of them. So to get a 1000x speedup on a large I/O-bound
sort you would have to be sorting something on the order of 2^1000
records, which is ridiculous. Realistically you should only be able to
save 2-5 intermediate merge passes.

On the other hand, there are some common situations where you could see
atypical increases. Consider joining a bunch of small tables to generate a
large result set. The small tables are probably all in memory, and the
result set may have only a small number of distinct values. If you throw
out the duplicates early, you save *all* the I/O. If you have to do a disk
sort, it could be many orders of magnitude slower.

This is actually not an uncommon coding idiom for MySQL programmers
accustomed to fast DISTINCT, working around the lack of subqueries and the
poor performance of IN and EXISTS. They often just join together all the
tables in a big cross join and then toss in a DISTINCT at the top to get
rid of the duplicates.
--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
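The idiom Stark describes, and the rewrite that discards duplicates early, can be sketched with hypothetical tables (names are illustrative, not from the thread):

```sql
-- The MySQL-style idiom: build a large join product, then
-- deduplicate the whole thing at the top with DISTINCT.
SELECT DISTINCT c.customer_id
FROM customers c, orders o, order_items i
WHERE o.customer_id = c.customer_id
  AND i.order_id    = o.order_id;

-- Throwing out duplicates early (here via EXISTS) never materializes
-- the large intermediate result, so there is nothing to sort.
SELECT c.customer_id
FROM customers c
WHERE EXISTS (SELECT 1
              FROM orders o
              JOIN order_items i ON i.order_id = o.order_id
              WHERE o.customer_id = c.customer_id);
```

The first form forces the executor to produce every matching row combination before deduplicating; the second can stop probing each customer after the first match.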
I strongly encourage anyone who is interested in the general external
sorting problem to peruse Jim Gray's site:
http://research.microsoft.com/barc/SortBenchmark/

Ron Peacetree

At 08:24 AM 1/31/2007, Gregory Stark wrote:
>[full quote of Gregory's message above trimmed]
Gregory Stark <gsstark@mit.edu> writes:
> On the other hand, there are some common situations where you could see
> atypical increases. Consider joining a bunch of small tables to generate
> a large result set. The small tables are probably all in memory, and the
> result set may have only a small number of distinct values. If you throw
> out the duplicates early, you save *all* the I/O. If you have to do a
> disk sort, it could be many orders of magnitude slower.

Right, we already have support for doing that well, in the form of hashed
aggregation. What needs to happen is to get that to work for DISTINCT as
well as GROUP BY. IIRC, DISTINCT is currently rather thoroughly
intertwined with ORDER BY, and we'd have to figure out some way to
decouple them --- without breaking DISTINCT ON, which makes it a lot
harder :-(

			regards, tom lane
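Until that decoupling happens, the DISTINCT/GROUP BY asymmetry Tom describes can be worked around by hand. Applied to the original poster's query (table and column names from the start of the thread), a hedged sketch:

```sql
-- On 8.1, SELECT DISTINCT always plans Sort + Unique, but the
-- equivalent GROUP BY may be planned as a HashAggregate, skipping
-- the sort of all ~3.4M rows.
SET work_mem = 65536;   -- per-session bump (kB) so the ~10,000-group
                        -- hash table comfortably fits in memory
SELECT a, b
FROM tbl
GROUP BY a, b;
```

The two queries return the same rows (modulo output order); only the available plan shapes differ.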