Thread: Any way to optimize GROUP BY queries?

Any way to optimize GROUP BY queries?

From
"Cristian Prieto"
Date:

I have the following table:

CREATE TABLE mytmp (
    Adv integer,
    Pub integer,
    Web integer,
    Tiempo timestamp,
    Num integer,
    Country varchar(2)
);

CREATE INDEX idx_mytmp ON mytmp(adv, pub, web);

With 16M rows in the table, I run this query:

SELECT adv, pub, web, country, date_trunc('hour', tiempo), sum(num)
FROM mytmp GROUP BY adv, pub, web, country, date_trunc('hour', tiempo)

I've tried creating indexes on different columns, but it seems the GROUP BY clause doesn't use an index in any way.

Is there anything I can do to speed up this kind of GROUP BY clause?

Thanks a lot…

Re: Any way to optimize GROUP BY queries?

From
Greg Stark
Date:
Mon, Dec 19, 2005 15:47:35 -0500
"Cristian Prieto" <cristian@clickdiario.com> writes:

> SELECT adv, pub, web, country, date_trunc('hour', tiempo), sum(num)
> FROM mytmp GROUP BY adv, pub, web, country, date_trunc('hour', tiempo)
>
> I've tried creating indexes on different columns, but it seems the GROUP BY
> clause doesn't use an index in any way.

If you had an index on <adv, pub, web, country, date_trunc('hour', tiempo)>
then it would be capable of using the index; however, it would choose not to
unless you forced it to, because using the index would be slower.

> Is there anything I can do to speed up this kind of GROUP BY clause?

Increase your work_mem (or sort_mem in older Postgres versions). You can do
this for the server as a whole, or just for this one session and set it back
after this one query. You can increase it up until it starts causing swapping,
at which point it would be counterproductive.
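
For example, just for the one session (a sketch; the value is illustrative,
and on versions before 8.2 it must be given in kilobytes, since unit
suffixes like '256MB' are not accepted yet):

SET work_mem = 262144;   -- 256MB, expressed in KB
SELECT adv, pub, web, country, date_trunc('hour', tiempo), sum(num)
FROM mytmp GROUP BY adv, pub, web, country, date_trunc('hour', tiempo);
RESET work_mem;          -- back to the server-wide default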

If increasing work_mem doesn't allow a hash aggregate, or at least an
in-memory sort, to handle it, then putting the pgsql_tmp directory on a
separate spindle might help, if you have one available.
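
To see which strategy the planner actually picks before and after raising
work_mem, run the query under EXPLAIN ANALYZE and look at the aggregate
node:

EXPLAIN ANALYZE
SELECT adv, pub, web, country, date_trunc('hour', tiempo), sum(num)
FROM mytmp GROUP BY adv, pub, web, country, date_trunc('hour', tiempo);
-- "HashAggregate" means the aggregation fit in work_mem; a
-- "GroupAggregate" on top of an external Sort means the sort spilled
-- to temporary files under pgsql_tmp.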

--
greg

Re: Any way to optimize GROUP BY queries?

From
"Jim C. Nasby"
Date:
On Mon, Dec 19, 2005 at 03:47:35PM -0500, Greg Stark wrote:
> Increase your work_mem (or sort_mem in older Postgres versions). You can do
> this for the server as a whole, or just for this one session and set it back
> after this one query. You can increase it up until it starts causing swapping,
> at which point it would be counterproductive.

Just remember that work_mem is per-operation, so it's easy to push
the box into swapping if the workload increases. You didn't say how much
memory you have, but I'd be careful if work_mem * max_connections
gets very much larger than your total memory.
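
To make that arithmetic concrete (the numbers here are hypothetical):

SHOW max_connections;   -- say, 100
SHOW work_mem;          -- say, 65536 (64MB, in KB)
-- Worst case: 100 connections * 64MB = 6400MB of sort/hash memory,
-- before counting shared_buffers or the OS cache, and a single complex
-- query can run several such operations at once.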
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Any way to optimize GROUP BY queries?

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, Dec 19, 2005 at 03:47:35PM -0500, Greg Stark wrote:
>> Increase your work_mem (or sort_mem in older Postgres versions). You can do
>> this for the server as a whole, or just for this one session and set it back
>> after this one query. You can increase it up until it starts causing swapping,
>> at which point it would be counterproductive.

> Just remember that work_mem is per-operation, so it's easy to push
> the box into swapping if the workload increases. You didn't say how much
> memory you have, but I'd be careful if work_mem * max_connections
> gets very much larger than your total memory.

It's considered good practice to have a relatively small default
work_mem setting (in postgresql.conf), and then let individual sessions
push up the value locally with "SET work_mem" if they are going to
execute queries that need it.  This works well as long as you only have
one or a few such "heavy" sessions at a time.
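
One way to scope the bump even more tightly is SET LOCAL, which confines
the change to a single transaction so it reverts automatically (a sketch;
Tom mentions plain SET, the value is illustrative, and it is in kilobytes
on pre-8.2 servers):

BEGIN;
SET LOCAL work_mem = 524288;   -- 512MB for this transaction only
SELECT adv, pub, web, country, date_trunc('hour', tiempo), sum(num)
FROM mytmp GROUP BY adv, pub, web, country, date_trunc('hour', tiempo);
COMMIT;   -- work_mem reverts to the postgresql.conf default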

            regards, tom lane