Thread: BUG #16260: Repetitive accessing to partitioned table inside transaction causes server process crash
BUG #16260: Repetitive accessing to partitioned table inside transaction causes server process crash
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16260 Logged by: Michael Email address: mingjuwu0505@gmail.com PostgreSQL version: 12.2 Operating system: Mint Linux 18.3 Description: Hello, Accessing to a partitioned table with large partitions (20K) inside a transaction can cause postgres to use large amount of memory (>30GB) and ultimately leading to server process crash. The system that I am using has 48GB physical memory and 72GB swap. Setup and stress testings steps are attached below. With default configuration of "max_locks_per_transaction", it will show: ERROR: out of shared memory HINT: You might need to increase max_locks_per_transaction. After increasing "max_locks_per_transaction" to 6400, while stress() is running, the postgres process will continue to consume more memory. Ultimately, in pgAdmin 4, console will show "Connection to the server has been lost." The following messages can be found in postgres log: 2020-02-16 17:01:35.449 CST [22652] LOG: server process (PID 23149) was terminated by signal 9: Killed 2020-02-16 17:01:35.449 CST [22652] DETAIL: Failed process was running: call stress(); 2020-02-16 17:01:35.449 CST [22652] LOG: terminating any other active server processes 2020-02-16 17:01:35.458 CST [22659] WARNING: terminating connection because of crash of another server process 2020-02-16 17:01:35.458 CST [22659] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2020-02-16 17:01:35.458 CST [22659] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2020-02-16 17:01:35.487 CST [22664] postgres@17ptt_test_db WARNING: terminating connection because of crash of another server process 2020-02-16 17:01:35.487 CST [22664] postgres@17ptt_test_db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2020-02-16 17:01:35.487 CST [22664] postgres@17ptt_test_db HINT: In a moment you should be able to reconnect to the database and repeat your command. 2020-02-16 17:01:35.491 CST [22652] LOG: all server processes terminated; reinitializing 2020-02-16 17:01:35.652 CST [4757] LOG: database system was interrupted; last known up at 2020-02-16 16:56:26 CST 2020-02-16 17:01:35.939 CST [4758] postgres@17ptt_test_db FATAL: the database system is in recovery mode 2020-02-16 17:01:35.940 CST [4759] postgres@17ptt_test_db FATAL: the database system is in recovery mode 2020-02-16 17:01:38.622 CST [4757] LOG: database system was not properly shut down; automatic recovery in progress 2020-02-16 17:01:38.630 CST [4757] LOG: redo starts at 9/D5353028 2020-02-16 17:01:38.630 CST [4757] LOG: invalid record length at 9/D5353110: wanted 24, got 0 2020-02-16 17:01:38.630 CST [4757] LOG: redo done at 9/D53530D8 2020-02-16 17:01:38.684 CST [22652] LOG: database system is ready to accept connections In the stress() procedure, replacing DELETE with UPDATE statement has same effect. Best Regards, Michael --------- -- SETUP --------- CREATE TABLE test_table ( id bigint NOT NULL, category bigint NOT NULL ) PARTITION BY LIST (category); CREATE OR REPLACE PROCEDURE create_test_table_paritions() AS $$ DECLARE counter INTEGER := 1; BEGIN WHILE counter <= 20000 LOOP EXECUTE 'CREATE TABLE test_table_' || counter || ' PARTITION OF test_table FOR VALUES IN (' || counter || ')'; counter := counter + 1; END LOOP; END $$ LANGUAGE plpgsql; call create_test_table_paritions(); -------------- -- STRESS TEST -------------- CREATE OR REPLACE PROCEDURE stress() AS $$ DECLARE counter INTEGER := 1; BEGIN WHILE counter <= 20000 LOOP DELETE FROM test_table WHERE category = counter; raise notice '%', counter; counter := counter + 1; END LOOP; END $$ LANGUAGE plpgsql; call stress();
Re: BUG #16260: Repetitive accessing to partitioned table inside transaction causes server process crash
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > Accessing to a partitioned table with large partitions (20K) inside a > transaction can cause postgres to use large amount of memory (>30GB) and > ultimately leading to server process crash. This isn't particularly surprising. > 2020-02-16 17:01:35.449 CST [22652] LOG: server process (PID 23149) was > terminated by signal 9: Killed Notice that this is an external SIGKILL, not an internal-to-postgres crash. Presumably, the process got killed by Linux's infamous OOM killer. We typically recommend that Postgres servers be set up to avoid OOM kills by disabling kernel memory overcommit [1]. However, the fundamental problem here is that lots of partitions translates to lots of memory usage. Our current recommendation is to not exceed "a few thousand" partitions [2]. There has been considerable work already on reducing the memory consumption for heavily-partitioned queries, and we'll continue to work on it, but don't hold your breath --- and definitely don't expect that there will be any back-patched bug fixes for it. Significant forward progress will probably require major rewrites, on the scale of what's discussed in [3]. In short: use fewer partitions. regards, tom lane [1] https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT [2] https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES [3] https://www.postgresql.org/message-id/flat/357.1550612935%40sss.pgh.pa.us