Re: huge price database question.. - Mailing list pgsql-general

From	Lee Hachadoorian
Subject	Re: huge price database question..
Date	March 21, 2012 13:45:58
Msg-id	CANnCtnJLK3padcJkv5Y8ieo5R5V_fiu-82CX2SaDQTJxkMy6iQ@mail.gmail.com
In response to	Re: huge price database question.. (Jim Green <student.northwestern@gmail.com>)
Responses	Re: huge price database question..
List	pgsql-general

On Tue, Mar 20, 2012 at 11:28 PM, Jim Green <student.northwestern@gmail.com> wrote:

> On 20 March 2012 22:57, John R Pierce <pierce@hogranch.com> wrote:
>
>> avg() in the database is going to be a lot faster than copying the data into
>> memory for an application to process.
>
> I see..

As an example, I ran averages on a 700,000-row table with 231 census variables reported by state. Averaging all 231 columns grouped by state inside Postgres beat doing the same in R by a factor of about 130, NOT COUNTING an additional minute or so to pull the table from Postgres into R. To be fair, these numbers are not strictly comparable, because the two ran on different hardware. But the setup is not atypical: Postgres is running on a heavy-hitting server while R is running on my desktop.

SELECT state, avg(col1), avg(col2), [...] avg(col231)
FROM some_table
GROUP BY state;

5741 ms

aggregate(dfSomeTable, by = list(dfSomeTable$state), FUN = mean, na.rm = TRUE)

754746 ms
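
With this many columns, the avg() list does not have to be typed by hand. A rough sketch of one way to build the statement from the catalog follows. It rests on assumptions that are not from the test above: PostgreSQL 9.1 or later (for format()), the table living in the public schema, every column other than state being numeric, and the avg_ prefix on the output names, which is only illustrative.

```sql
-- Sketch only: builds the text of the GROUP BY query, it does not run it.
-- Assumes some_table is in schema "public" and that every column
-- except "state" is numeric, so avg() is valid on each of them.
SELECT 'SELECT state, '
       || string_agg(
              -- %I quotes each name as an identifier where needed
              format('avg(%I) AS %I',
                     c.column_name::text,
                     'avg_' || c.column_name::text),
              ', '
              ORDER BY c.ordinal_position)
       || ' FROM some_table GROUP BY state;' AS generated_query
FROM information_schema.columns AS c
WHERE c.table_schema = 'public'        -- assumed schema
  AND c.table_name   = 'some_table'
  AND c.column_name <> 'state';
```

The result is a single row of text; copy it out and run it as an ordinary query.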

--Lee
