Re: suggestions to improve postgresql suitability for data-mining - Mailing list pgsql-hackers

From Christopher Kings-Lynne
Subject Re: suggestions to improve postgresql suitability for data-mining
Date
Msg-id 0efc01c350ba$7ad71140$2800a8c0@mars
Whole thread Raw
In response to suggestions to improve postgresql suitability for data-mining  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
> II) SQL
> -------
>
> The first idea is to ask SQL to do the job with a 'group by' clause:
>
> SELECT area, type, month, SUM(amount), COUNT(*)
> FROM client AS c, invoice AS i
> WHERE c.id=i.client
> GROUP BY area, type, month;
>
> As I am just interested in reading the data, without any transaction, I
> tuned a little bit the database parameters (fsync=false, more shared_mem
> and sort_mem).
>
> It works, but it is quite slow and it requires a lot of disk space.
> Indeed, the result of the join is big, and the aggregation seems to
> require an external sort step so as to sum up data one group after the
> other.
>
> As the resulting table is very small, I wish the optimizer would have
> skipped the sort phase, so as to aggregate the data as they come after the
> join. All may be done on the fly without much additionnal storage (well,
> with some implementation efforts). Maybe it is the "hash evaluation of
> group by aggregates" item listed in the todo list.

As of 7.4CVS it does do this.  You will find this much faster in 7.4
release.

Chris



pgsql-hackers by date:

Previous
From: "Christopher Kings-Lynne"
Date:
Subject: Re: tsearch2 for 7.3.X
Next
From: Ang Chin Han
Date:
Subject: Re: reprise on Linux overcommit handling