Re: memory usage of group by select - Mailing list pgsql-general

From Anthony
Subject Re: memory usage of group by select
Date
Msg-id 71cd4dd90912291425q76b22e2ak94c432823c6dbfc7@mail.gmail.com
Whole thread Raw
In response to Re: memory usage of group by select  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-general
On Tue, Dec 29, 2009 at 4:09 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
It's expecting 85k distinct groups.  If that's not accurate, then
HashAggregate would use more memory than expected.

Great diagnosis.  There are actually about 76 million distinct groups.
 
See if you can make it work by setting enable_hashagg = off.

Will do.  Maybe overnight tonight.  The explain is estimating that to take 10 times as long, and I can't afford to do that right now.

If that works, good -- the real solution is different.  Maybe you need
to ANALYZE more, or increase the size of the stats bin for this column.

On Tue, Dec 29, 2009 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If ANALYZE consistently underestimates the number of distinct values,
you may have to force matters with ALTER TABLE SET STATISTICS DISTINCT.

Seems to be an 8.5 feature.  Otherwise, it'd be perfect since I happen to know the actual number (it's the row count of another table).

I've run the analyze again and it's still estimating around 85K distinct pids.  That's with the default setting of 100 for default_statistics_target, but I'm not sure I want to mess with that right now (this table is just going to be dropped in a few days after I manipulate it a bit more).

The only question I really have is this: if I fix this number and leave enable_hashagg on, is it just going to have the same effect (for this one query) of turning enable_hashagg off?  Because if I'm just going to have to bite the bullet and run the query with GroupAggregate (cost=287511359.15..297867743.71), I guess I'll just have to do that.  Or try to figure out another way to get what it is I'm trying to get (I should have just imported the right numbers from the beginning, but the import took 3 days so I don't feel like doing that again).

In any case, thanks a lot for the help, both of you.

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: memory usage of group by select
Next
From: Nick
Date:
Subject: Re: Installation of Postgis/postgresql