Re: Death postgres - Mailing list pgsql-general

From Marc Millas
Subject Re: Death postgres
Msg-id CADX_1aZTekarfBzFOK-3v_eEWsyrqU=bkfzMxH3Xhj6=wjLN-w@mail.gmail.com
In response to Re: Death postgres  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: Death postgres  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general



On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
>     On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>     >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>     >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
>     ...
>     >                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>     >                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
>     ...
>     >
>     > //so.. the planner guesses that those 2 joins will generate 1000 billion rows...
>
>     Are some of the col_ano values very frequent? If say the value 42 occurs
>     1 million times in both table_a and table_b, the join will create 1
>     trillion rows for that value alone. That doesn't explain the crash or the
>     disk usage, but it would explain the crazy cost (and would probably be a
>     hint that this query is unlikely to finish in any reasonable time).
>
>
> good guess, even if a bit surprising: there is one (and only one) "value" which fits your supposition: NULL

But NULL doesn't equal NULL, so each of those rows would produce only one row in the
left join (with the right side padded with NULLs). So that's not it.
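(For reference, this is easy to verify; in SQL, NULL = NULL evaluates to NULL, not true, so a join condition can never match two NULL keys:)

-- Returns NULL ("unknown"), not true: a join never pairs NULL keys.
SELECT NULL = NULL AS nulls_match;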


So, apologies...

The 750,000 rows in each table are not NULLs but '' empty varchars, which obviously is not the same thing,
and which duly generates over 500 billion rows for the left join.
So, no planner or statistics problem; apologies for the time wasted.
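For anyone hitting something similar, a quick way to spot this kind of skew before running the join (table and column names as in the plan above; only a sketch):

-- Count the '' keys on each side; their product is the number of rows
-- the join will produce for that single key value alone.
SELECT
    (SELECT count(*) FROM table_a WHERE col_ano = '') AS a_empty,
    (SELECT count(*) FROM table_b WHERE col_ano = '') AS b_empty,
    (SELECT count(*) FROM table_a WHERE col_ano = '')::numeric
  * (SELECT count(*) FROM table_b WHERE col_ano = '') AS rows_for_empty_key;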
Back to the initial problem:
With temp_file_limit set to 210 GB, if I try to run the select * from table_a left join table_b on col_ano (the column which contains the 750,000 '' values in both tables),
then Postgres crashes, killed by the OOM killer, after having consumed 1.1 TB of additional disk space.
The explain plan shows 512 planned partitions. (Obviously, I cannot provide an explain analyze...)
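If those empty strings are not supposed to match anything, one way to defuse the blow-up (a sketch, reusing the join condition from the plan above) is to map them to NULL inside the join condition, since NULL never equals NULL:

-- '' becomes NULL on both sides; NULL = NULL is never true,
-- so the 750,000 x 750,000 explosion for that key disappears.
SELECT *
FROM table_a t1
LEFT JOIN table_b t2
  ON NULLIF(t1.col_ano, '') = NULLIF(t2.col_ano, '');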

To my understanding, before Postgres 13 a hash aggregate could eat RAM without limit in such circumstances.
But in 14.2??
(I know, 14.8 is out...)
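(For reference: since v13, hash-based nodes are supposed to be bounded by work_mem times hash_mem_multiplier and to spill to disk beyond that; the values below are purely illustrative:)

-- Per-session caps that hash join / hash aggregate should respect (v13+).
SET work_mem = '256MB';
SET hash_mem_multiplier = 2.0;  -- hash nodes may use up to 2 x work_mem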


        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"



Marc MILLAS
Senior Architect
+33607850334


 
