temp table on commit delete rows performance issue - Mailing list pgsql-hackers
From | Floris Van Nee |
---|---|
Subject | temp table on commit delete rows performance issue |
Date | |
Msg-id | 365ab92f507e4ec6addef4286782fe71@Optiver.com Whole thread Raw |
Responses |
Re: temp table on commit delete rows performance issue
Re: temp table on commit delete rows performance issue |
List | pgsql-hackers |
Hi hackers, I'm looking for some input on an issue I've observed. A common pattern I've seen is using temporary tables to put data in before updating the real tables. Something roughly like: On session start: CREATE TEMP TABLE temp_t1 (...) ON COMMIT DELETE ROWS; On update: BEGIN; COPY temp_t1 FROM STDIN (FORMAT BINARY); INSERT INTO t1 (SELECT * FROM temp_t1 ...) ON CONFLICT DO UPDATE SET ...; -- potentially some other operations on temp table to put data into real table t1 COMMIT; This pattern starts to break down under certain exceptional circumstances of high concurrency. The "ON COMMIT DELETE ROWS" does a truncate that is fairly expensive and doesn't work well in high-concurrency scenarios. It's especially noticeable under following circumstances: - high max_connections setting - high number of temp tables per session - concurrent writers at fairly short intervals Impact is on both TPS on primary as well as that the WAL replay process on replica becomes completely overloaded (100% cpu even though not a lot of WAL is being generated) A very simple pgbench example that showcases degradation (taken with max_connections=2000 to clearly show it). test_truncate.sql begin; set client_min_messages=warning; create temp table if not exists ut0 (a int, b text, c text) on commit delete rows; commit; test_no_truncate.sql begin; set client_min_messages=warning; create temp table if not exists ut0 (a int, b text, c text); -- do not do anything on commit commit; pgbench -n -f test_truncate.sql -T 10 postgres -c 1 -j 1 pgbench (18devel) tps = 2693.806376 (without initial connection time) pgbench -n -f test_truncate.sql -T 10 postgres -c10 -j 10 pgbench (18devel) tps = 12119.358458 (without initial connection time) pgbench -n -f test_no_truncate.sql -T 10 postgres -c 1 -j 1 pgbench (18devel) tps = 22685.877281 (without initial connection time) pgbench -n -f test_no_truncate.sql -T 10 postgres -c10 -j 10 pgbench (18devel) tps = 200359.458154 (without initial connection time) For the test_truncate.sql with 10 threads, WAL replay process is spinning at 100% CPU. The reason seems to be that for a large part, the 'on commit delete rows' follows the same code path as a regular truncate. This means: * taking an exclusive lock on the temp table (WAL logged) * invalidating rel cache (WAL logged) On replica process, these exclusive locks are very expensive to replay, leading to spending replica most of its cycles trying to get LWLocks on non-fast-path. On primary, degradation on concurrency I believes comes from the rel cache invalidation. There doesn't seem to be a good reason for this behavior, as the replica has nothing to do for a temporary table truncation, so this lock doesn't need to be WAL logged. The generated WAL is just fully this stuff. LOCK+INVALIDATION+COMMIT. And it's spinning at 100% CPU for trying to replay this. rmgr: Standby len (rec/tot): 42/ 42, tx: 8154875, lsn: 0/3AFFE270, prev 0/3AFFE218, desc: LOCK xid 8154875db 5 rel 18188 rmgr: Standby len (rec/tot): 42/ 42, tx: 8154875, lsn: 0/3AFFE2A0, prev 0/3AFFE270, desc: LOCK xid 8154875db 5 rel 18191 rmgr: Standby len (rec/tot): 42/ 42, tx: 8154875, lsn: 0/3AFFE2D0, prev 0/3AFFE2A0, desc: LOCK xid 8154875db 5 rel 18192 rmgr: Transaction len (rec/tot): 62/ 62, tx: 8154875, lsn: 0/3AFFE300, prev 0/3AFFE2D0, desc: INVALIDATION ; invalmsgs: relcache 18191 relcache 18192 rmgr: Transaction len (rec/tot): 82/ 82, tx: 8154875, lsn: 0/3AFFE340, prev 0/3AFFE300, desc: COMMIT 2024-07-1612:56:42.878147 CEST; inval msgs: relcache 18191 relcache 18192 rmgr: Standby len (rec/tot): 42/ 42, tx: 8154876, lsn: 0/3AFFE398, prev 0/3AFFE340, desc: LOCK xid 8154876db 5 rel 18188 rmgr: Standby len (rec/tot): 42/ 42, tx: 8154876, lsn: 0/3AFFE3C8, prev 0/3AFFE398, desc: LOCK xid 8154876db 5 rel 18191 rmgr: Standby len (rec/tot): 42/ 42, tx: 8154876, lsn: 0/3AFFE3F8, prev 0/3AFFE3C8, desc: LOCK xid 8154876db 5 rel 18192 rmgr: Transaction len (rec/tot): 62/ 62, tx: 8154876, lsn: 0/3AFFE428, prev 0/3AFFE3F8, desc: INVALIDATION ; invalmsgs: relcache 18191 relcache 18192 rmgr: Transaction len (rec/tot): 82/ 82, tx: 8154876, lsn: 0/3AFFE468, prev 0/3AFFE428, desc: COMMIT 2024-07-1612:56:42.878544 CEST; inval msgs: relcache 18191 relcache 18192 Have people observed this behavior before? Would the community be open to accepting a patch that changes the behavior of ON COMMIT ROWS to not WAL log the lock taken and/or the relcache inval? I think: * An exclusive lock needs to be taken on primary, but does not need to be WAL logged * Rel cache invalidation should not be necessary I think (it currently happens just on the TOAST table+index, not on the regular TEMP table) -Floris
pgsql-hackers by date:
Previous
From: YasirDate:
Subject: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated
Next
From: Yasir ShahDate:
Subject: Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows