BUG #17442: Wrong partition pruning when using prepared statement and huge RAM consumption - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17442: Wrong partition pruning when using prepared statement and huge RAM consumption
Msg-id 17442-40d918cb37eeee0c@postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17442
Logged by:          Damir Ciganović-Janković
Email address:      damir.ciganovic.jankovic@gmail.com
PostgreSQL version: 13.5
Operating system:   Windows 10, CentOS 7
Description:

We upgraded Postgres from 11.12 to 13.5 and started experiencing problems with
one query. After analysis we found two problems, and I am reporting them
together because they might be connected. We use Java and the Postgres JDBC
driver, but we managed to reproduce the problem with psql. Also, I cannot
share our original table, but hopefully dummy data is enough.

-- Starting Schema.
DROP TABLE IF EXISTS test;
DROP TYPE IF EXISTS test_enum;
DEALLOCATE my_prepared_statement;
DEALLOCATE my_prepared_statement_with_enum_inlined;
DEALLOCATE my_prepared_statement_multi_partition;
DROP FUNCTION IF EXISTS create_test_partitions;
 
CREATE TYPE test_enum AS ENUM ('FIRST', 'SECOND');
-- NOTE: the table is partitioned by range on two columns: an enum and a timestamp.
CREATE TABLE test (enum_col test_enum, timestamp_col timestamp, some_id int, status int)
    PARTITION BY RANGE (enum_col, timestamp_col);
 
Case 1: Wrong partition pruning
-- Create 4 partitions of test table:
CREATE TABLE test_FIRST_1 PARTITION OF test
    FOR VALUES FROM ('FIRST', '2022-01-01') TO ('FIRST', '2022-01-02');
CREATE TABLE test_FIRST_2 PARTITION OF test
    FOR VALUES FROM ('FIRST', '2022-01-02') TO ('FIRST', '2022-01-03');

CREATE TABLE test_SECOND_1 PARTITION OF test
    FOR VALUES FROM ('SECOND', '2022-01-01') TO ('SECOND', '2022-01-02');
CREATE TABLE test_SECOND_2 PARTITION OF test
    FOR VALUES FROM ('SECOND', '2022-01-02') TO ('SECOND', '2022-01-03');

-- The plan includes ALL 4 partitions even though we specified that we want
-- only 'FIRST'. It should have pruned down to test_FIRST_1. NOTE: same
-- result with EXPLAIN ANALYZE.
PREPARE my_prepared_statement(text, text, text) AS
    UPDATE test SET some_id = 5
    WHERE timestamp_col >= CAST($1 AS timestamp(6))
      AND timestamp_col < CAST($2 AS timestamp(6))
      AND enum_col = $3::test_enum;
EXPLAIN EXECUTE my_prepared_statement('2022-01-01 01:00:00', '2022-01-01 02:00:00', 'FIRST');

-- The plan includes both test_FIRST_1 and test_FIRST_2 despite the query
-- looking only at '2022-01-01'. NOTE: same result with EXPLAIN ANALYZE.
PREPARE my_prepared_statement_with_enum_inlined(text, text) AS
    UPDATE test SET some_id = 5
    WHERE timestamp_col >= CAST($1 AS timestamp(6))
      AND timestamp_col < CAST($2 AS timestamp(6))
      AND enum_col = 'FIRST';
EXPLAIN EXECUTE my_prepared_statement_with_enum_inlined('2022-01-01 01:00:00', '2022-01-01 02:00:00');

This makes our query really slow. I noticed that this also happens on 11.12,
but we did not notice it until the second problem (below) started to appear.
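One diagnostic that may help narrow this down (a sketch, assuming the default
plan_cache_mode = 'auto', under which the planner may switch from custom to a
generic plan after the statement has been planned five times; I do not know
whether that is the cause here):

```sql
-- Plan the statement several times so the planner can consider a generic plan.
EXECUTE my_prepared_statement('2022-01-01 01:00:00', '2022-01-01 02:00:00', 'FIRST');
-- ...repeat the EXECUTE five more times, then:
EXPLAIN EXECUTE my_prepared_statement('2022-01-01 01:00:00', '2022-01-01 02:00:00', 'FIRST');
-- A generic plan cannot prune on parameter values at plan time, so if the
-- partition list differs between the first and later EXPLAINs, the
-- generic-plan switch is involved.
```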

Case 2: Huge RAM consumption
Once we deployed our version with Postgres 13.5 installed, we noticed huge
spikes in RAM usage that were not released until we killed (closed) the
connection that had executed the problematic query. For example, the machine
went from 10 GB to around 26 GB RAM usage when the query was executed by one
connection, and then to around 42 GB when it was executed by another
connection. The only configuration value of ours that could match these 16 GB
jumps is "max_wal_size = 16GB", but I am not sure it is related. If you need
any other config parameters, please tell me.

It is a bit hard to produce the exact query that causes the same spike with
dummy data, but I noticed that Postgres takes a long time even to EXPLAIN the
query, without executing it.
I also noticed that if there are no partitions, EXPLAIN has no problem, but
with many partitions (in production we have around 1100 partitions of that
table) the query starts to go wild.


This is how I tried to reproduce:

-- DROP tables used in previous example:
DROP TABLE test_FIRST_1;
DROP TABLE test_FIRST_2;
DROP TABLE test_SECOND_1;
DROP TABLE test_SECOND_2;

-- Helper function that creates a partition for every 10 minutes, for both
-- test_enum values (FIRST and SECOND). Pass a start and an end timestamp and
-- it creates partitions covering that range.
create or replace function create_test_partitions(from_timestamp timestamp without time zone,
                                                  to_timestamp timestamp without time zone) returns void
    language plpgsql
as
$$
declare
    start_of_range_timestamp timestamp := from_timestamp;
    end_of_range_timestamp timestamp := from_timestamp + '10 minutes'::interval;
    counter int := 1;
begin
    while start_of_range_timestamp < to_timestamp loop
        execute 'create table test_FIRST_' || counter || E' partition of test for values from (\'FIRST\', \'' || start_of_range_timestamp || E'\') to (\'FIRST\', \'' || end_of_range_timestamp || E'\')';
        execute 'create table test_SECOND_' || counter || E' partition of test for values from (\'SECOND\', \'' || start_of_range_timestamp || E'\') to (\'SECOND\', \'' || end_of_range_timestamp || E'\')';
        start_of_range_timestamp = end_of_range_timestamp;
        end_of_range_timestamp = end_of_range_timestamp + '10 minutes'::interval;
        counter = counter + 1;
    end loop;
end
$$;

SELECT create_test_partitions('2022-01-01', '2022-01-05');
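If my arithmetic is right, that call creates 4 days x 144 ten-minute slots per
day x 2 enum values = 1152 partitions, close to our production count of ~1100.
A quick sanity check against the catalogs:

```sql
-- Count the direct partitions of "test" via pg_inherits.
SELECT count(*) FROM pg_inherits WHERE inhparent = 'test'::regclass;
```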

-- I know this is a bit too complex for an example, but it was hard for me to
-- replicate the same effect with a (dummy) table different from our real one.
-- What we are doing here is selecting some rows from the test table and then
-- updating other rows in it based on the selected columns.
PREPARE my_prepared_statement_multi_partition(text, text, text, text) AS
    WITH "id_collection" AS (
        SELECT some_id AS some_id_cte, status AS status_cte
        FROM test
        WHERE timestamp_col >= CAST($1 AS timestamp(6))
          AND timestamp_col < CAST($2 AS timestamp(6))
    )
    UPDATE test SET status = "id_collection".status_cte
    FROM "id_collection"
    WHERE timestamp_col >= CAST($3 AS timestamp(6))
      AND timestamp_col < CAST($4 AS timestamp(6))
      AND enum_col = 'FIRST'
      AND some_id = "id_collection".some_id_cte;

EXPLAIN EXECUTE my_prepared_statement_multi_partition('2022-01-01 01:00:00',
    '2022-01-01 02:00:00', '2022-01-01 06:00:00', '2022-01-01 07:00:00');

When I run the EXPLAIN above, it takes a really long time. Note that on 11.12
it completes almost instantly. My test machine's RAM usage rises by 2-3 GB
while this EXPLAIN is running (CentOS 7).
When I run the same on my Windows machine, I don't see any RAM consumption
issues, but I get no result, only this, and I have to restart the session:
```
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
```

This RAM consumption may be related to partition pruning, but I really don't
know; I hope you can clear things up a bit for me.

MORE BACKGROUND:
1. While my query here looks like "WITH X AS (SELECT ...) UPDATE ...", my
original query looks like "CREATE UNLOGGED TABLE AS (WITH X AS (SELECT ...),
Y AS (UPDATE ... RETURNING *) SELECT FROM Y)", which basically means I am
updating the table and creating a new table from the updated records. I have
tried to simplify it; hopefully I succeeded.
2. Our current workaround is not to use a prepared statement in the
application that executes this problematic query, but we would prefer to keep
using prepared statements if this were fixed.
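In case it helps others hitting this: a server-side way to approximate the
same workaround, I believe, would be to force custom plans so each execution
is planned with its actual parameter values (plan_cache_mode exists since
PostgreSQL 12; we have not verified this against the bug):

```sql
-- Force per-execution planning so plan-time pruning can use actual
-- parameter values instead of falling back to a generic plan.
SET plan_cache_mode = force_custom_plan;
EXPLAIN EXECUTE my_prepared_statement('2022-01-01 01:00:00', '2022-01-01 02:00:00', 'FIRST');
```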

Thank you in advance, Damir

