Re: How slow is DISTINCT? - Mailing list pgsql-sql
From | Christopher Kings-Lynne |
---|---|
Subject | Re: How slow is DISTINCT? |
Date | |
Msg-id | GNELIHDDFBOCMGBFGEFOIECFCCAA.chriskl@familyhealth.com.au Whole thread Raw |
In response to | Re: How slow is DISTINCT? (Michael Contzen <mcontzen@dohle.com>) |
List | pgsql-sql |
What happens if you do... select artnr from warenausg_ges group by artnr; or even select distinct (artnr) from warenausg_ges; or select distinct on (artnr) from warenausg_ges; ...sort of ideas... Another idea is to maintain a table of the unique values in the column. Create a trigger on the main table to maintian the list of unique values... Chris > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Michael Contzen > Sent: Tuesday, 9 April 2002 6:16 PM > To: pgsql-sql@postgresql.org > Subject: Re: [SQL] How slow is DISTINCT? > > > Hello, > > our problem to 'select distinct' is similar: We have a big table of > about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it > works fine! > > Createing an index on it quite fast (nearly as fast than oracle on the > same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes. > > Then we tried 'select distinct one_field' which would result to about > 200.000 different values. Postgres needed 6 hours while Oracle managed > it in about 30 minutes. > > Looking into the pgsql_tmp directory of this db while doing this > selection showed me a lot of tempfiles nearly as big as the table. > > Does postgres sort the whole table without projection to one column an > performs a unique on this whole table? > This would explain the big amount of disk usage in pgsql_tmp and the big > amount of time. > > The statement, something could be wrong with the data, is not very > useful: This is data of our electronic cash-desks. Unfortunately our > customers buy every day nearly the same articles - therefor the > repeatition of data :-) > > > Kind regards > > M.Contzen > Developer > Dohle Systemberatung > Germany > > > Some facts of our test: > > Table "warenausg_ges" > Column | Type | Modifiers > -----------+---------------+----------- > ean | numeric(13,0) | > menge | numeric(13,3) | > lvkumsatz | numeric(15,3) | > vkumsatz | numeric(15,3) | > ekumsatz | numeric(15,3) | > rabatt | numeric(12,0) | > kdnr | numeric(10,0) | > artnr | numeric(10,0) | > lfnr | numeric(10,0) | > wg | integer | > aktion | character(1) | > datum | date | > status | integer | > Indexes: warenausg_ges_inx > > Index "warenausg_ges_inx" > Column | Type > --------+------ > datum | date > btree > > explain select distinct artnr from warenausg_ges; > NOTICE: QUERY PLAN: > > Unique (cost=224522801.22..225315849.86 rows=31721946 width=12) > -> Sort (cost=224522801.22..224522801.22 rows=317219456 width=12) > -> Seq Scan on warenausg_ges (cost=0.00..165793667.00 > rows=317219456 width=12) > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >