Re: Postgresql OOM - Mailing list pgsql-hackers
From: Radu Radutiu
Subject: Re: Postgresql OOM
Msg-id: CAG4TxrjjjR3BfRWsV3z0c2BL9XZueFPG0uGxeHrhCC+92g05Gg@mail.gmail.com
In response to: Re: Postgresql OOM (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,
> That's helpful, thanks!
>
> One thing to note is that postgres' work_mem is confusing - it applies to
> individual query execution nodes, not the whole query. Additionally, when you
> use parallel workers, each of the parallel workers can use the "full"
> work_mem, rather than work_mem being evenly distributed across workers.
>
> Within that, the memory usage in the EXPLAIN ANALYZE isn't entirely unexpected
> (I'd say it's unreasonable if you're not a postgres dev, but that's a
> different issue).
>
> We can see each of the Hash nodes use ~1GB, which is due to
> (1 leader + 4 workers) * work_mem = 5 * 200MB = 1GB.
>
> In this specific query we probably could free the memory in the "lower" hash
> join nodes once the node directly above has finished building, but we don't
> have the logic for that today.
I would understand 1 GB, or even 2 GB (considering hash_mem_multiplier), but the server ran out of memory with more than 64 GB available.
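For reference, the per-node bound follows directly from three settings; a sketch in psql (the 200MB work_mem and 4 workers are taken from earlier in this thread, and the 2.0 multiplier is the default since PostgreSQL 15):

```sql
-- Settings that bound a single Hash node's memory:
SHOW work_mem;                          -- 200MB in this thread
SHOW hash_mem_multiplier;               -- default 2.0 since PostgreSQL 15
SHOW max_parallel_workers_per_gather;   -- 4 workers in the plan discussed
-- Upper bound per Hash node:
--   (1 leader + 4 workers) * 200MB * 2.0 = 2GB
-- so even a handful of such nodes falls far short of 64 GB.
```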
> Of course that doesn't explain why the memory usage / temp file creation is so
> extreme with a lower limit / fewer workers. There aren't any bad statistics
> afaics, nor can I really see a potential for a significant skew; it looks to
> me like the hashtables are all built on a single text field and that nearly
> all the input rows are distinct.
>
> Could you post the table definition (\d+) and the database definition
> (\l)? One random guess I have is that you ended up with a "non-deterministic"
> collation for that column and that we end up with a bad hash due to that.
I was able to eliminate the columns not involved in the query while still retaining the problematic behavior (that's the reason for the new table names):
postgres=# \d inreq
Table "public.inreq"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
input_sequence | bigint | | not null |
msg_type | character varying(8) | | |
originalrequest_id | bigint | | |
receive_time | timestamp without time zone | | |
related_output_sequence | bigint | | |
msg_status | character varying(15) | | |
Indexes:
"inreq_pkey" PRIMARY KEY, btree (input_sequence)
"inreq_originalrequest_id_idx" btree (originalrequest_id)
postgres=# \d outreq
Table "public.outreq"
Column | Type | Collation | Nullable | Default
------------------------+--------+-----------+----------+---------
output_sequence | bigint | | not null |
input_sequence | bigint | | |
reply_input_sequence | bigint | | |
related_input_sequence | bigint | | |
Indexes:
"outreq_pkey" PRIMARY KEY, btree (output_sequence)
"outreq_input_sequence_idx" btree (input_sequence)
"outreq_reply_input_sequence_idx" btree (reply_input_sequence)
postgres=# SELECT datname, datcollate FROM pg_database WHERE datname = current_database();
datname | datcollate
----------+-------------
postgres | en_US.UTF-8
(1 row)
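datcollate only shows the database default; whether any column of these tables actually carries a non-deterministic collation can be checked against the catalogs directly (a sketch; collisdeterministic is available from PostgreSQL 12 on):

```sql
-- List columns of inreq/outreq with an explicit collation, and whether
-- that collation is deterministic:
SELECT a.attrelid::regclass AS table_name,
       a.attname,
       c.collname,
       c.collisdeterministic
FROM pg_attribute a
JOIN pg_collation c ON c.oid = a.attcollation
WHERE a.attrelid IN ('inreq'::regclass, 'outreq'::regclass)
  AND a.attnum > 0;
```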
A dump of the two tables above can be found at https://drive.google.com/file/d/1ep1MYjNzlFaICL3GlPZaMdpOxRG6WCGz/view?usp=sharing (compressed size 1 GB; size of the database after import 14 GB).
# prepare my_query (timestamp,bigint) as
SELECT t.input_sequence, rec_tro.output_sequence, r.input_sequence,
       rpl_rec_tro.output_sequence, rpl_snd_tro.output_sequence,
       snd_tro.output_sequence, t.msg_type
FROM inreq t
LEFT JOIN outreq rec_tro     ON rec_tro.input_sequence = t.input_sequence
LEFT JOIN inreq  r           ON r.originalRequest_id = t.input_sequence
LEFT JOIN outreq rpl_rec_tro ON rpl_rec_tro.input_sequence = r.input_sequence
LEFT JOIN outreq rpl_snd_tro ON rpl_snd_tro.reply_input_sequence = r.input_sequence
LEFT JOIN outreq snd_tro     ON snd_tro.reply_input_sequence = t.input_sequence
WHERE t.receive_time < $1 AND t.input_sequence < $2
  AND t.msg_status IN ('COMPLETED', 'REJECTED')
ORDER BY t.msg_status DESC, t.input_sequence;
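When reproducing, it may also help to isolate the contribution of parallelism and of the per-node budget before re-running the EXPLAIN (a sketch; the values are illustrative, not recommendations):

```sql
SET max_parallel_workers_per_gather = 0;  -- force a serial plan
SET work_mem = '64MB';                    -- lower the per-node budget
```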
# EXPLAIN (ANALYZE,BUFFERS) EXECUTE my_query('2024-05-18 00:00:00', 202406020168279904);

Best Regards,
Radu