Re: Huge Data sets, simple queries - Mailing list pgsql-performance

From Tom Lane
Subject Re: Huge Data sets, simple queries
Date
Msg-id 11814.1138463702@sss.pgh.pa.us
In response to Huge Data sets, simple queries  ("Mike Biamonte" <mike@dbeat.com>)
Responses Re: Huge Data sets, simple queries  ("Jeffrey W. Baker" <jwbaker@acm.org>)
List pgsql-performance
"Mike Biamonte" <mike@dbeat.com> writes:
> The queries I need to run on my 200 million transactions are relatively
> simple:

>    select month, count(distinct(cardnum)), count(*), sum(amount) from
> transactions group by month;

count(distinct) is not "relatively simple", and the current
implementation isn't especially efficient.  Can you avoid that
construct?

Assuming that "month" means what it sounds like, the above would result
in running twelve parallel sort/uniq operations, one for each month
grouping, to eliminate duplicates before counting.  You've got sortmem
set high enough to blow out RAM in that scenario ...

            regards, tom lane
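
One common way to avoid count(distinct), sketched here as an illustration and not taken from the message above: aggregate once per (month, cardnum) pair in a subquery, then count the resulting rows per month. The table and column names come from the quoted query; the column types, the sample rows, and the use of SQLite (chosen only so the sketch is self-contained) are assumptions. Whether this form is actually faster on a given PostgreSQL installation would need to be checked with EXPLAIN ANALYZE.

```python
import sqlite3

# The query as quoted above, with an ORDER BY added so results are comparable.
DISTINCT_SQL = """
    SELECT month, count(DISTINCT cardnum), count(*), sum(amount)
    FROM transactions
    GROUP BY month
    ORDER BY month
"""

# Hypothetical rewrite: the inner query collapses each (month, cardnum) pair
# to one row, so the outer query can count cards with a plain count().
# count(cardnum) skips the NULL group, matching count(DISTINCT cardnum).
TWO_LEVEL_SQL = """
    SELECT month, count(cardnum), sum(n_rows), sum(total)
    FROM (SELECT month, cardnum, count(*) AS n_rows, sum(amount) AS total
          FROM transactions
          GROUP BY month, cardnum) AS per_card
    GROUP BY month
    ORDER BY month
"""


def demo():
    """Run both queries on a tiny made-up table and return their results."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (month INTEGER, cardnum TEXT, amount INTEGER)"
    )
    # Made-up sample data: two months, three cards, six transactions.
    rows = [
        (1, "A", 10), (1, "A", 5), (1, "B", 7),
        (2, "A", 3), (2, "C", 4), (2, "C", 6),
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    with_distinct = conn.execute(DISTINCT_SQL).fetchall()
    two_level = conn.execute(TWO_LEVEL_SQL).fetchall()
    conn.close()
    return with_distinct, two_level


if __name__ == "__main__":
    with_distinct, two_level = demo()
    print(with_distinct)
    print(two_level)
```

Both queries return the same rows on the sample data. The rewrite replaces the per-group duplicate elimination with a single grouping pass over (month, cardnum), which the planner may be able to execute with one sort or one hash aggregate.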
