Re: Death postgres - Mailing list pgsql-general

From Marc Millas
Subject Re: Death postgres
Date
Msg-id CADX_1aYQN95m9BggNAUaWmci0U+weJf1bJybiFSc3qSGF0mbJw@mail.gmail.com
In response to Re: Death postgres  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: Death postgres
List pgsql-general



On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> the 750000 lines in each table are not NULLs but '' (empty varchar), which
> is obviously not the same thing,
> and which readily generates 500 billion rows for the left join.
> So, no planner or statistics problem. Apologies for the time wasted.

No problem. Glad to have solved that puzzle.
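
For scale: a key value shared by 750000 rows on each side joins to
750000 × 750000 ≈ 5.6 × 10^11 rows all by itself. A minimal sketch of
keeping the empty strings out of the join, using the table_a / table_b /
col_a names from this thread (illustrative only, not a tested fix):

    -- Empty keys on the right stop matching, but left rows are kept
    -- (they come back once, with NULLs for table_b's columns):
    SELECT *
    FROM table_a a
    LEFT JOIN table_b b
      ON a.col_a = b.col_a
     AND b.col_a <> '';

    -- Alternatively, turn '' into NULL once and for all; NULL keys
    -- never match in an equi-join:
    UPDATE table_a SET col_a = NULLIF(col_a, '') WHERE col_a = '';
    UPDATE table_b SET col_a = NULLIF(col_a, '') WHERE col_a = '';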

> Back to the initial problem: if, with temp_file_limit set to 210 GB,
> I try to run the select * from table_a left join table_b on col_a
> (which contains the 750000 '' values in both tables),
> then postgres crashes, killed by the OOM killer, after having taken
> 1.1 TB of additional disk space.

My guess is that the amount of parallelism is the problem.

work_mem is a per-node limit. Even a single process can use a multiple of
work_mem if the query contains nested nodes (which almost every query
does, but most nodes don't need much memory). With 5 parallel workers,
the total consumption will be 5 times that. So to prevent the OOM
condition you would need to reduce work_mem or max_parallel_workers (at
least for this query).

We have more than 100 GB of RAM and only one user, with one query running.
work_mem is set to 10 MB. For the OOM killer to fire because of work_mem,
postgres would need more than 10,000 work_mem-sized allocations (100 GB /
10 MB) for a single query with 2 left joins.
That seems difficult to believe. But that postgres may need that RAM for
hashing or some other similar purpose seems more probable, no?
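
If parallelism is the suspect, a sketch of pinning it down for one
session; the GUC names are standard PostgreSQL, the values are only
illustrative:

    -- Per-session limits; no server restart needed.
    SET work_mem = '4MB';                     -- budget per sort/hash node,
                                              -- per process
    SET max_parallel_workers_per_gather = 0;  -- run this query in a
                                              -- single process
    SELECT * FROM table_a a LEFT JOIN table_b b ON a.col_a = b.col_a;
    RESET work_mem;
    RESET max_parallel_workers_per_gather;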

The description of temp_file_limit says "...the maximum amount of disk
space that *a process* can use...". So with 5 workers that's 210*5 =
1050 GB total. Again, you may want to reduce either temp_file_limit or
max_parallel_workers.
Yes, but if so, we would have had the query canceled due to
temp_file_limit, which was not the case.
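
For reference, the per-process arithmetic and a session-level way to cap
the total; the value is illustrative:

    -- temp_file_limit is enforced per process: five processes writing
    -- 210 GB each is 1050 GB before any single one hits the limit.
    -- Dividing the intended total by the process count caps the sum:
    SET temp_file_limit = '42GB';   -- ≈ 210 GB across 5 processes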

> To my understanding, before postgres 13, hash aggregate could eat RAM
> without limit in such circumstances.
> But in 14.2 ??
> (I know, 14.8 is out...)

Maybe the older version of postgres didn't use as many workers for that
query (or maybe didn't parallelize it at all)?
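
One way to compare the two versions' plans, a sketch using the names
from the thread:

    -- Run on each server version; a parallel plan shows a Gather node
    -- with "Workers Planned: N" in the output.
    EXPLAIN (COSTS OFF)
    SELECT * FROM table_a a LEFT JOIN table_b b ON a.col_a = b.col_a;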

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
