Re: Trouble understanding how to avoid/manage ERROR: multixact "members" limit exceeded - Mailing list pgsql-general

From Jim Vanns
Subject Re: Trouble understanding how to avoid/manage ERROR: multixact "members" limit exceeded
Date
Msg-id CAH7vdhP1yi0HvWhaPigZ7xnERBLBBgvG29H0Nv4wPEWODoA4XA@mail.gmail.com
Whole thread Raw
In response to Trouble understanding how to avoid/manage ERROR: multixact "members" limit exceeded  (Jim Vanns <jvanns@ilm.com>)
List pgsql-general
To clear up any confusion, I tried the reductions as hinted in the
error message and after a few iterations still faced the same error.
Also, re. the sample configuration I provided - this is the result of
said iterations after initially using the stock values from the PGDG
RPM.

Jim

On Fri, 9 Aug 2024 at 11:26, Jim Vanns <jvanns@ilm.com> wrote:
>
> Hi pggen community!
>
> I am struggling with this error almost daily now and despite various efforts, not succeeding in avoiding or dealing
withit; 
>
> ERROR:  multixact "members" limit exceeded
> DETAIL:  This command would create a multixact with 2 members, but the remaining space is only enough for 0 members.
> HINT:  Execute a database-wide VACUUM in database with OID 16467 with reduced vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_agesettings. 
>
> Runtime details follow below, but before that; I am now seeing the above error almost daily after approximately 12
hoursof normal or expected behaviour and throughput. Then it hits and all writes are blocked etc. and the service is
largelyunusable/unable to recover. Restarting PG does allow autovacuum processes to kick in with aggressive vacuuming
tohandle the multixact freezing, but that isn't a suitable solution! Although having read sources that now explain why
multixactXIDs exist and when they're used, I am not able to properly figure out how to plan for it or configure
postgresqlappropriately to handle it given our workload. 
>
> My questions are;
>
> 1) How should I be managing this? Although not new to PG, I am new to this particular problem.
> 2) How can I confirm what processes/tables are contributing to this multixact "members" limit?
> 3) What are the units for vacuum_multixact_freeze_min_age/vacuum_multixact_freeze_table_age and how should I be
>     setting them appropriately for my rates etc.? I can't really find anything that explains this clearly.
> 4) How can I check that autovacuum workers are specifically able to freeze multixact XIDs and thus avoid this?
> 5) Can I check if autovacuum is actually succeeding in its work?
>
> Schema (brief):
> 10 tables
> 1 table (the 'hottest) partitioned by 2h ranges; so an extra 12x partitions/relations a day, though only the most
recentone might be considered 'active' 
>
> System (brief):
> PG: 15.5 w/ TimescaleDB 2.14 extension
> Write heavy workload;
> Mean Txn/s (low):   8k
> Mean Txn/s (high): 10k
> Mean rows/s: 100k
> Concurrency: 32 threads (local socket sessions) for 'upserts' via primary service plus auxiliary processes
(backgroundworkers/autovacuum workers etc.) 
>
> Pattern (brief):
> COPY (binary) to session-local temporary tables (ON COMMIT DELETE)
> INSERT FROM tt TO main_table(s)
> UPDATE FROM tt TO main_table(s)
> VACUUM tt (every 100k txns)
>
> Config (excerpt):
> # - Transactions - (based on a period mean of ~8k txn/s)
> # See/ref;
> # www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
> # www.percona.com/blog/tuning-autovacuum-in-postgresql-and-autovacuum-internals
> # blog.sentry.io/transaction-id-wraparound-in-postgres
> # https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres
> # https://pganalyze.com/blog/5mins-postgres-multiXact-ids-foreign-keys-performance
> # What unit is 'age' here? No. of. transactions?
> vacuum_freeze_min_age = 28800000 # 1h @ 8k/s
> vacuum_multixact_freeze_min_age = 28800000 # 1h @ 8k/s
> autovacuum_freeze_max_age = 157600000 # 2h @ 8k/s
> autovacuum_multixact_freeze_max_age = 57600000 # 2h @ 8k/s
> vacuum_multixact_freeze_table_age = 115200000 # 4h @ 8k/s
>
> # We don't UPDATE or DELETE often; we never DELETE and only perform
> # UPDATE operations every 6h due to internal cache(s). So we set this to
> # zero so the formula doesn't consider it a major factor
> autovacuum_vacuum_threshold = 0 # In tuples/rows UPDATEd or DELETEd
> autovacuum_vacuum_insert_threshold = 5000 # In tuples/rows INSERTed
>
> autovacuum_vacuum_scale_factor = 0.1 # 10%
> autovacuum_analyze_scale_factor = 0.1 # 10%
> autovacuum_vacuum_insert_scale_factor = 0.1 # 10%
>
> autovacuum_naptime = 60
> autovacuum_max_workers = 8
>
> # Give autovacuum more credits to ensure a better chance at scanning
> autovacuum_vacuum_cost_limit = 2000 # 10x the default
>
> Thanks for your help and any guidance/knowledge you can share!
>
> Jim
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London



pgsql-general by date:

Previous
From: Dominique Devienne
Date:
Subject: Re: libpq version macro to use or not PQsocketPoll
Next
From: GF
Date:
Subject: Re: Getting specific partition from the partition name