Re: Postgresql OOM - Mailing list pgsql-hackers

From: Radu Radutiu
Subject: Re: Postgresql OOM
Date:
Msg-id: CAG4TxrjjjR3BfRWsV3z0c2BL9XZueFPG0uGxeHrhCC+92g05Gg@mail.gmail.com
In response to: Re: Postgresql OOM (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,


> That's helpful, thanks!
>
> One thing to note is that postgres' work_mem is confusing - it applies to
> individual query execution nodes, not the whole query. Additionally, when you
> use parallel workers, each of the parallel workers can use the "full"
> work_mem, rather than work_mem being evenly distributed across workers.
>
> Within that, the memory usage in the EXPLAIN ANALYZE isn't entirely unexpected
> (I'd say it's unreasonable if you're not a postgres dev, but that's a
> different issue).
>
> We can see each of the Hash nodes uses ~1GB, which is due to
> (1 leader + 4 workers) * work_mem = 5 * 200MB = 1GB.
>
> In this specific query we probably could free the memory in the "lower" hash
> join nodes once the node directly above has finished building, but we don't
> have the logic for that today.

I would understand 1 GB, or even 2 GB (considering hash_mem_multiplier), but the server ran out of memory with more than 64 GB of RAM.
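
For reference, the knobs behind that per-node math can be checked straight from psql; a minimal sketch (the 200MB work_mem and 4 workers are the values discussed earlier in this thread, and the hash_mem_multiplier default depends on the release):

SHOW work_mem;                         -- 200MB in the runs discussed above
SHOW hash_mem_multiplier;              -- 1.0 by default on v13/v14, 2.0 from v15 on
SHOW max_parallel_workers_per_gather;  -- 4 in the plans discussed above
-- rough ceiling for a single parallel hash node:
--   (1 leader + 4 workers) * work_mem * hash_mem_multiplier
--   = 5 * 200MB * 2.0 = ~2GB (or ~1GB with hash_mem_multiplier = 1.0)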

> Of course that doesn't explain why the memory usage / temp file creation is so
> extreme with a lower limit / fewer workers.  There aren't any bad statistics
> afaics, nor can I really see a potential for a significant skew; it looks to
> me like the hashtables are all built on a single text field and that nearly
> all the input rows are distinct.
>
> Could you post the table definition (\d+) and the database definition
> (\l)? One random guess I have is that you ended up with a "non-deterministic"
> collation for that column and that we end up with a bad hash due to that.

I was able to eliminate the columns not involved in the query while still retaining the problematic behavior (that's the reason for the new table names):
postgres=# \d inreq
                                  Table "public.inreq"
         Column          |            Type             | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
 input_sequence          | bigint                      |           | not null |
 msg_type                | character varying(8)        |           |          |
 originalrequest_id      | bigint                      |           |          |
 receive_time            | timestamp without time zone |           |          |
 related_output_sequence | bigint                      |           |          |
 msg_status              | character varying(15)       |           |          |
Indexes:
    "inreq_pkey" PRIMARY KEY, btree (input_sequence)
    "inreq_originalrequest_id_idx" btree (originalrequest_id)

postgres=# \d outreq
                      Table "public.outreq"
         Column         |  Type  | Collation | Nullable | Default
------------------------+--------+-----------+----------+---------
 output_sequence        | bigint |           | not null |
 input_sequence         | bigint |           |          |
 reply_input_sequence   | bigint |           |          |
 related_input_sequence | bigint |           |          |
Indexes:
    "outreq_pkey" PRIMARY KEY, btree (output_sequence)
    "outreq_input_sequence_idx" btree (input_sequence)
    "outreq_reply_input_sequence_idx" btree (reply_input_sequence)
postgres=# SELECT datname, datcollate FROM pg_database WHERE datname = current_database();
 datname  | datcollate  
----------+-------------
 postgres | en_US.UTF-8
(1 row)
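
To also rule out a non-deterministic collation at the column level, something along these lines should show it (a sketch; collisdeterministic is available from PostgreSQL 12 on, and only collatable columns join to pg_collation):

SELECT a.attname, co.collname, co.collisdeterministic
  FROM pg_attribute a
  JOIN pg_collation co ON co.oid = a.attcollation
 WHERE a.attrelid = 'inreq'::regclass
   AND a.attnum > 0;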

A dump of the two tables above can be found at https://drive.google.com/file/d/1ep1MYjNzlFaICL3GlPZaMdpOxRG6WCGz/view?usp=sharing (compressed size 1 GB; database size after import 14 GB).

# prepare my_query (timestamp, bigint) as
  SELECT t.input_sequence, rec_tro.output_sequence, r.input_sequence,
         rpl_rec_tro.output_sequence, rpl_snd_tro.output_sequence,
         snd_tro.output_sequence, t.msg_type
    FROM inreq t
    LEFT JOIN outreq rec_tro     ON rec_tro.input_sequence = t.input_sequence
    LEFT JOIN inreq  r           ON r.originalRequest_id = t.input_sequence
    LEFT JOIN outreq rpl_rec_tro ON rpl_rec_tro.input_sequence = r.input_sequence
    LEFT JOIN outreq rpl_snd_tro ON rpl_snd_tro.reply_input_sequence = r.input_sequence
    LEFT JOIN outreq snd_tro     ON snd_tro.reply_input_sequence = t.input_sequence
   WHERE t.receive_time < $1 AND t.input_sequence < $2
     AND t.msg_status IN ('COMPLETED', 'REJECTED')
   ORDER BY t.msg_status DESC, t.input_sequence;
# EXPLAIN (ANALYZE,BUFFERS)  EXECUTE my_query('2024-05-18 00:00:00', 202406020168279904);
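
For completeness, the session settings assumed by the per-node math earlier in the thread can be pinned explicitly before running the EXPLAIN (a sketch; 200MB work_mem and 4 workers are the figures quoted above, not server defaults):

SET work_mem = '200MB';
SET max_parallel_workers_per_gather = 4;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE my_query('2024-05-18 00:00:00', 202406020168279904);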

Best Regards, 
Radu
