Re: Postgresql OOM - Mailing list pgsql-hackers

From Radu Radutiu
Subject Re: Postgresql OOM
Date
Msg-id CAG4TxrjhEfZeZ1wKL-OdanbYRnCVBKY3hH513eWuxd2=3YJHig@mail.gmail.com
Whole thread Raw
In response to Re: Postgresql OOM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


The planner should recognize this situation and avoid use of hash
join in such cases, but maybe the statistics aren't reflecting the
problem, or maybe there's something wrong with the logic specific
to parallel hash join.  You've not really provided enough information
to diagnose why the poor choice of plan.

                        regards, tom lane

Thanks for looking into this. I'm not sure what information would be needed to look at the choice of plan.
The statistics for the join conditions in the query would be:
 join_condition | min_count | max_count |         avg_count          
----------------+-----------+-----------+----------------------------
 snd_tro        |         0 |         0 | 0.000000000000000000000000
 rpl_rec_tro    |         0 |         2 |     0.99869222814474470477
 rec_tro        |         0 |         2 |     0.99869222814474470477
 rpl_snd_tro    |         0 |         0 | 0.000000000000000000000000
 r              |         0 |         1 |     0.49850916663490161653 


The relevant columns for the tables are:
postgres=# \d inputrequest
                               Table "public.inputrequest"
          Column          |            Type             | Collation | Nullable | Default
--------------------------+-----------------------------+-----------+----------+---------
 input_sequence           | bigint                      |           | not null |
 msg_type                 | character varying(8)        |           | not null |
 msg_content              | text                        |           | not null |
 msg_reference            | character varying(35)       |           |          |
 originalrequest_id       | bigint                      |           |          |
 receive_time             | timestamp without time zone |           | not null |
 related_output_sequence  | bigint                      |           |          |
 msg_status               | character varying(15)       |           |          |
 
Indexes:
    "inputrequest_pkey" PRIMARY KEY, btree (input_sequence)
    "inputrequest_originalrequest_id_idx" btree (originalrequest_id)

postgres=# \d outputrequest
                             Table "public.outputrequest"
         Column         |            Type             | Collation | Nullable | Default
------------------------+-----------------------------+-----------+----------+---------
 output_sequence        | bigint                      |           | not null |
 input_sequence         | bigint                      |           |          |
 msg_type               | character varying(8)        |           |          |
 msg_content            | text                        |           | not null |
 msg_reference          | character varying(35)       |           |          |
 reply_input_sequence   | bigint                      |           |          |
 status                 | integer                     |           | not null |
 related_input_sequence | bigint                      |           |          |
Indexes:
    "outputrequest_pkey" PRIMARY KEY, btree (output_sequence)
    "outputrequest_input_sequence_idx" btree (input_sequence)
    "outputrequest_reply_input_sequence_idx" btree (reply_input_sequence)



I wonder if our choice of primary keys (input_sequence and output_sequence) has something to do with the skew in the hash bucket distribution. We use the following format: yyyymmdd????????xx , where ???????? is more or less a sequence and xx is the node generating the id, i.e. 01,02,etc (with only one or two values in the dataset).

I wonder if it would be difficult to have an upper limit on the private memory that can be allocated by one process (or all processes similar to Oracle's pga_aggregate_limit). I would rather have one query failing with an error message instead of postgres eating up all memory and swap on the server. 
  
Best regards,
Radu

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: ssl tests fail due to TCP port conflict
Next
From: Tom Lane
Date:
Subject: Re: XACT_EVENT for 'commit prepared'