Home > mailing lists

Re: suggestions to improve postgresql suitability for data-mining - Mailing list pgsql-hackers

From	Christopher Kings-Lynne
Subject	Re: suggestions to improve postgresql suitability for data-mining
Date	July 22, 2003 22:30:57
Msg-id	0efc01c350ba$7ad71140$2800a8c0@mars Whole thread Raw
In response to	suggestions to improve postgresql suitability for data-mining (Fabien COELHO <coelho@cri.ensmp.fr>)
List	pgsql-hackers

Tree view

> II) SQL
> -------
>
> The first idea is to ask SQL to do the job with a 'group by' clause:
>
> SELECT area, type, month, SUM(amount), COUNT(*)
> FROM client AS c, invoice AS i
> WHERE c.id=i.client
> GROUP BY area, type, month;
>
> As I am just interested in reading the data, without any transaction, I
> tuned a little bit the database parameters (fsync=false, more shared_mem
> and sort_mem).
>
> It works, but it is quite slow and it requires a lot of disk space.
> Indeed, the result of the join is big, and the aggregation seems to
> require an external sort step so as to sum up data one group after the
> other.
>
> As the resulting table is very small, I wish the optimizer would have
> skipped the sort phase, so as to aggregate the data as they come after the
> join. All may be done on the fly without much additionnal storage (well,
> with some implementation efforts). Maybe it is the "hash evaluation of
> group by aggregates" item listed in the todo list.

As of 7.4CVS it does do this.  You will find this much faster in 7.4
release.

Chris

pgsql-hackers by date:

From: "Christopher Kings-Lynne"
Date: 22 July 2003, 22:18:27
Subject: Re: tsearch2 for 7.3.X

From: Ang Chin Han
Date: 23 July 2003, 00:22:52
Subject: Re: reprise on Linux overcommit handling

Re: suggestions to improve postgresql suitability for data-mining - Mailing list pgsql-hackers

Previous

Next