Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: logical decoding and replication of sequences, take 2
Date:
Msg-id: e66ff0de-6f2d-baf9-aee9-ba412868df75@enterprisedb.com
In response to: Re: logical decoding and replication of sequences, take 2 (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: logical decoding and replication of sequences, take 2
           RE: logical decoding and replication of sequences, take 2
List: pgsql-hackers
On 11/29/23 15:41, Tomas Vondra wrote:
> ...
>>
>> One thing that worries me about that approach is that it can suck with
>> a workload that has a lot of DDL creating XLOG_SMGR_CREATE records. We
>> have previously fixed some such workloads in logical decoding, where
>> decoding a transaction containing truncation of a table with a lot of
>> partitions (1000 or more) used to take a very long time. Don't we face
>> performance issues in such scenarios?
>>
>
> I don't think we do, really. We will have to decode the SMGR records and
> add the relfilenodes to the hash table(s), but I don't think that affects
> the lookup performance too much. What I think might be a problem is if we
> have many top-level transactions, especially if those transactions do
> something that creates a relfilenode. Because then we'll have to do a
> hash_search for each of them, and that might be measurable even if each
> lookup is O(1). And we do the lookup for every sequence change ...
>

I did some micro-benchmarking today, trying to identify cases where this
would cause unexpected problems, either due to having to maintain all the
relfilenodes, or due to having to do hash lookups for every sequence
change. But I think it's fine, mostly ...

I did all the following tests with 64 clients. I may try more, but even
with this there should be a fair number of concurrent transactions, which
determines the number of top-level transactions in the reorderbuffer. I'll
try with more clients tomorrow, but I don't think it'll change the
conclusions.

The test is fairly simple - run a particular number of transactions (might
be 1000 * 64, or more), and then measure how long it takes to decode the
changes using test_decoding (a minimal sketch of the measurement commands
follows the workload descriptions below).

Now, the various workloads I tried:

1) "good case" - small OLTP transactions, a couple nextval('s') calls

  begin;
  insert into t values (1);
  select nextval('s');
  insert into t values (1);
  commit;

This is pretty fine - the sequence part of reorderbuffer is really not
measurable, it's like 1% of the total CPU time. Which is expected, because
we only WAL-log every 32nd increment or so.

2) "good case" - same as (1), but with enough nextval calls that every
   increment is WAL-logged

  begin;
  insert into t values (1);
  select nextval('s') from generate_series(1,40);
  insert into t values (1);
  commit;

Here sequences are more measurable - it's like 15% of the CPU time, but
most of that is spent in AbortCurrentTransaction() in the non-transactional
branch of ReorderBufferQueueSequence. I don't think there's a way around
that, and it's entirely unrelated to relfilenodes. The function checking
whether the change is transactional (ReorderBufferSequenceIsTransactional)
is less than 1% of the profile - and this is the version that always walks
all top-level transactions.

3) "bad case" - small transactions that generate a lot of relfilenodes

  select alter_sequence();

where the function is defined like this (I created 1000 sequences before
the test):

  CREATE OR REPLACE FUNCTION alter_sequence() RETURNS void AS $$
  DECLARE
    v INT;
  BEGIN
    v := 1 + (random() * 999)::int;
    execute format('alter sequence s%s restart with 1000', v);
    perform nextval('s');
  END;
  $$ LANGUAGE plpgsql;

This performs terribly, but it's entirely unrelated to sequences. Current
master has exactly the same problem if transactions do DDL. Like this, for
example:

  CREATE OR REPLACE FUNCTION create_table() RETURNS void AS $$
  DECLARE
    v INT;
  BEGIN
    v := 1 + (random() * 999)::int;
    execute format('create table t%s (a int)', v);
    execute format('drop table t%s', v);
    insert into t values (1);
  END;
  $$ LANGUAGE plpgsql;

This has the same impact on master.
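For reference, a minimal sketch of how such a decoding run can be timed.
The slot name ('seq_slot'), the script file name (workload.sql) and the
exact pgbench flags are illustrative assumptions, not the exact commands
used for the numbers above:

  -- create the table/sequence used by the workloads, and a logical slot
  -- using the test_decoding output plugin
  CREATE TABLE t (a int);
  CREATE SEQUENCE s;
  SELECT pg_create_logical_replication_slot('seq_slot', 'test_decoding');

  -- drive one of the workloads above with 64 clients, e.g.
  --   pgbench -n -c 64 -j 64 -t 1000 -f workload.sql
  -- and then measure how long decoding the accumulated changes takes
  \timing on
  SELECT count(*)
    FROM pg_logical_slot_get_changes('seq_slot', NULL, NULL);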
The perf report for the bad case (3) shows this:

  --98.06%--pg_logical_slot_get_changes_guts
     |
     --97.88%--LogicalDecodingProcessRecord
        |
        --97.56%--xact_decode
           |
           --97.51%--DecodeCommit
              |
              |--91.92%--SnapBuildCommitTxn
              |    |
              |    --91.65%--SnapBuildBuildSnapshot
              |         |
              |         --91.14%--pg_qsort

The sequence decoding is maybe ~1%. The reason why SnapBuildBuildSnapshot
takes so long is this:

-----------------
Breakpoint 1, SnapBuildBuildSnapshot (builder=0x21f60f8) at snapbuild.c:498
498             + sizeof(TransactionId) * builder->committed.xcnt
(gdb) p builder->committed.xcnt
$4 = 11532
-----------------

And with each iteration it grows by 1. That looks quite weird, possibly a
bug worth fixing, but it's unrelated to this patch. I can't investigate
this more at the moment; not sure when/if I'll get to that.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company