Home > mailing lists

Re: How slow is DISTINCT? - Mailing list pgsql-sql

From	Christopher Kings-Lynne
Subject	Re: How slow is DISTINCT?
Date	April 15, 2002 21:53:24
Msg-id	GNELIHDDFBOCMGBFGEFOIECFCCAA.chriskl@familyhealth.com.au Whole thread
In response to	Re: How slow is DISTINCT? (Michael Contzen <mcontzen@dohle.com>)
List	pgsql-sql

Tree view

What happens if you do...

select artnr from warenausg_ges group by artnr;

or even

select distinct (artnr) from warenausg_ges;

or

select distinct on (artnr) from warenausg_ges;

...sort of ideas...

Another idea is to maintain a table of the unique values in the column.
Create a trigger on the main table to maintian the list of unique values...

Chris

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Michael Contzen
> Sent: Tuesday, 9 April 2002 6:16 PM
> To: pgsql-sql@postgresql.org
> Subject: Re: [SQL] How slow is DISTINCT?
>
>
> Hello,
>
> our problem to 'select distinct' is similar: We have a big table of
> about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it
> works fine!
>
> Createing an index on it quite fast (nearly as fast than oracle on the
> same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes.
>
> Then we tried 'select distinct one_field' which would result to about
> 200.000 different values. Postgres needed 6 hours while Oracle managed
> it in about 30 minutes.
>
> Looking into the pgsql_tmp directory of this db while doing this
> selection showed me a lot of tempfiles nearly as big as the table.
>
> Does postgres sort the whole table without projection to one column an
> performs a unique on this whole table?
> This would explain the big amount of disk usage in pgsql_tmp and the big
> amount of time.
>
> The statement, something could be wrong with the data, is not very
> useful: This is data of our electronic cash-desks. Unfortunately our
> customers buy every day nearly the same articles - therefor the
> repeatition of data :-)
>
>
> Kind regards
>
> M.Contzen
> Developer
> Dohle Systemberatung
> Germany
>
>
> Some facts of our test:
>
>          Table "warenausg_ges"
>   Column   |     Type      | Modifiers
> -----------+---------------+-----------
>  ean       | numeric(13,0) |
>  menge     | numeric(13,3) |
>  lvkumsatz | numeric(15,3) |
>  vkumsatz  | numeric(15,3) |
>  ekumsatz  | numeric(15,3) |
>  rabatt    | numeric(12,0) |
>  kdnr      | numeric(10,0) |
>  artnr     | numeric(10,0) |
>  lfnr      | numeric(10,0) |
>  wg        | integer       |
>  aktion    | character(1)  |
>  datum     | date          |
>  status    | integer       |
> Indexes: warenausg_ges_inx
>
> Index "warenausg_ges_inx"
>  Column | Type
> --------+------
>  datum  | date
> btree
>
> explain select distinct artnr from warenausg_ges;
> NOTICE:  QUERY PLAN:
>
> Unique  (cost=224522801.22..225315849.86 rows=31721946 width=12)
>   ->  Sort  (cost=224522801.22..224522801.22 rows=317219456 width=12)
>         ->  Seq Scan on warenausg_ges  (cost=0.00..165793667.00
> rows=317219456 width=12)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

pgsql-sql by date:

From: "jack"
Date: 15 April 2002, 21:25:10
Subject: please advise on column data type

From: "Josh Berkus"
Date: 15 April 2002, 22:51:16
Subject: Re: please advise on column data type

Re: How slow is DISTINCT? - Mailing list pgsql-sql

Previous

Next