Thread: Postgres12 looking for possible HashAggregate issue workarounds?
Sent: 16 December 2022 16:06
To: João Paulo Luís <joao.luis@pdmfc.com>
Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
Subject: Re: Postgres12 looking for possible HashAggregate issue workarounds?
On Fri, Dec 16, 2022 at 03:24:17PM +0000, João Paulo Luís wrote:
> Hi! Sorry to post to this mailing list, but I could not find many tips on working around HashAggregate issues.
>
> In a research project involving text repetition analysis (on top of public documents)
> I have a VirtualMachine (CPU AMD Epyc 7502P, 128GB RAM, 12TB HDD, 2TB SSD),
> running postgres 12.12 (Ubuntu 12.12-0ubuntu0.20.04.1)
> and some tables with many rows:
> 1 - the query is making a postgresql process reach 76.7 GB resident RAM usage.
> With a work_mem setting of 2GB (and "simple" COUNT() results),
> that was not expected.
> (I risk oom-killer killing my postgres as soon as I run another concurrent
> query.)
> The rows=261275 estimate on HashAggregate (cost=26397219.92..26399832.67 rows=261275 width=8) seems VERY WRONG!
> I was expecting something like rows=1.0E+09 instead.
> I would guess that HashAggregate is behaving very badly (using too much RAM beyond work_mem, and also badly estimating the #rows and taking forever...)
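A quick way to sanity-check that estimate is to compare the planner's expected group count with the actual number of distinct values; a minimal sketch, assuming the sentencesource table and sentence column that come up later in the thread:

  -- Planner's view: the rows= figure on the HashAggregate node
  EXPLAIN
  SELECT sentence, count(*)
  FROM sentencesource
  GROUP BY sentence;

  -- Ground truth (can take a long time on a table this size):
  SELECT count(DISTINCT sentence) FROM sentencesource;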
Huge memory use sounds like what was fixed in postgres 13.
https://www.postgresql.org/docs/13/release-13.html
Allow hash aggregation to use disk storage for large aggregation result
sets (Jeff Davis)
Previously, hash aggregation was avoided if it was expected to use more
than work_mem memory. Now, a hash aggregation plan can be chosen despite
that. The hash table will be spilled to disk if it exceeds work_mem
times hash_mem_multiplier.
This behavior is normally preferable to the old behavior, in which once
hash aggregation had been chosen, the hash table would be kept in memory
no matter how large it got — which could be very large if the planner
had misestimated. If necessary, behavior similar to that can be obtained
by increasing hash_mem_multiplier.
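As a minimal sketch of that last point (Postgres 13 or later only; the values below are purely illustrative, not recommendations):

  -- In 13+, a hash aggregate may use up to work_mem * hash_mem_multiplier
  -- before spilling to disk (the multiplier defaults to 1.0 in 13).
  SET work_mem = '256MB';
  SET hash_mem_multiplier = 8.0;

  -- Confirm the session-level settings:
  SHOW work_mem;
  SHOW hash_mem_multiplier;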
--
Justin
Sent: 18 December 2022 11:06
To: João Paulo Luís <joao.luis@pdmfc.com>
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
Subject: Re: Postgres12 looking for possible HashAggregate issue workarounds?
On Sun, 18 Dec 2022 at 23:44, João Paulo Luís <joao.luis@pdmfc.com> wrote:
> Meanwhile, as a one-time workaround I've disabled the hashagg algorithm,
The way the query planner determines if Hash Aggregate's hash table
will fit in work_mem or not is based on the n_distinct estimate of the
columns being grouped on. You may want to review what analyze set
n_distinct to on this table. That can be done by looking at:
select attname, n_distinct from pg_stats
where tablename = 'sentencesource' and attname = 'sentence';
If what that's set to does not seem realistic, then you can overwrite this with:
ALTER TABLE sentencesource ALTER COLUMN sentence SET (n_distinct = N);
Please see the paragraph in [1] about n_distinct. Using an absolute
value is likely not a great idea if the table is going to grow. You
could give the planner a better estimate of how many times values are
repeated by setting a negative value, as described in the
documentation. You'll need to analyze the table again after changing
this setting.
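A minimal sketch of the negative-value approach (the -0.9 fraction is purely illustrative; it tells the planner that roughly 90% of the rows hold distinct sentence values):

  -- A negative n_distinct means "this fraction of the rows is distinct":
  ALTER TABLE sentencesource ALTER COLUMN sentence SET (n_distinct = -0.9);

  -- The override is only picked up when statistics are rebuilt:
  ANALYZE sentencesource;

  -- Check what the planner will now use:
  select attname, n_distinct from pg_stats
  where tablename = 'sentencesource' and attname = 'sentence';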
David
[1] https://www.postgresql.org/docs/12/sql-altertable.html