Re: yet another q - Mailing list pgsql-performance

From Tom Lane
Subject Re: yet another q
Msg-id 7713.1282235303@sss.pgh.pa.us
In response to yet another q  (Samuel Gendler <sgendler@ideasculptor.com>)
List pgsql-performance
Samuel Gendler <sgendler@ideasculptor.com> writes:
> fast plan: http://explain.depesz.com/s/iZ
> slow plan: http://explain.depesz.com/s/Dv2

Your problem here is that it's switching from hash aggregation to
sort-and-group-aggregate once it decides that the number of aggregate
groups won't fit in work_mem anymore.  While you could brute-force
that by raising work_mem, it'd be a lot better if you could get the
estimated number of groups more in line with the actual.  Notice the
very large discrepancy between the estimated and actual numbers of
rows out of the aggregation steps.
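
To illustrate the switch described above, here is a minimal sketch of the decision, assuming a simplified model in which the planner compares the estimated hash table size against work_mem. The function name, the per-group byte size, and the group counts are hypothetical and chosen only for illustration; the real planner logic has more inputs than this.

```python
def choose_aggregate_strategy(estimated_groups, bytes_per_group, work_mem_kb):
    """Simplified model: hash only if the estimated hash table fits in work_mem."""
    hash_table_bytes = estimated_groups * bytes_per_group
    if hash_table_bytes <= work_mem_kb * 1024:
        return "HashAggregate"
    return "GroupAggregate"  # sort the input, then aggregate group by group


# Hypothetical numbers: 100 bytes per group, work_mem of 1024 kB.
accurate_estimate = choose_aggregate_strategy(5_000, 100, 1024)
inflated_estimate = choose_aggregate_strategy(50_000, 100, 1024)
print(accurate_estimate, inflated_estimate)
```

In this model an inflated group estimate alone is enough to push the plan to sort-and-group, even when the actual number of groups would have fit comfortably in memory.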

Increasing the stats targets for the GROUP BY columns might help,
but I think what's basically going on here is there's correlation
between the GROUP BY columns that the planner doesn't know about.
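
As a rough illustration of why unknown correlation inflates the estimate, the sketch below assumes a simplified model in which the group count is estimated as the product of the per-column distinct counts, capped at the row count. This is not the planner's actual code, and the data is made up: it assumes every provider belongs to exactly one customer.

```python
def estimate_groups_independent(rows, columns):
    """Naive estimate: multiply per-column distinct counts, cap at row count."""
    estimate = 1
    for col in columns:
        estimate *= len({row[col] for row in rows})
    return min(estimate, len(rows))


def actual_groups(rows, columns):
    """Count the distinct column combinations that really occur."""
    return len({tuple(row[col] for col in columns) for row in rows})


# Hypothetical data: 100 providers, 10 customers, and provider_id fully
# determines owner_customer_id (a strong correlation).
rows = []
for i in range(10_000):
    provider = i % 100
    rows.append({"owner_customer_id": provider // 10, "provider_id": provider})

cols = ["owner_customer_id", "provider_id"]
estimated = estimate_groups_independent(rows, cols)
actual = actual_groups(rows, cols)
print(estimated, actual)
```

Under these assumptions the independence-based estimate is ten times the real number of groups, and each additional correlated GROUP BY column multiplies the error further.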

One thing I'm wondering is why you're grouping by owner_customer_id
and t_fact.provider_id, when these aren't used in the output.

            regards, tom lane
