Thread: many backends hang on MultiXactOffsetSLRU
Hi experts,
we have a Postgresql v14.8 database, almost thousands of backends hang on MultiXactOffsetSLRU at the same time, all of these sessions running same query "SELECT ....", from OS and postgresql slow log, we found all of these query on "BIND" stage.
LOG: duration: 36631.688 ms bind S_813: SELECT
LOG: duration: 36859.786 ms bind S_1111: SELECT
LOG: duration: 35868.148 ms bind <unnamed>: SELECT
LOG: duration: 36906.471 ms bind <unnamed>: SELECT
LOG: duration: 35955.489 ms bind <unnamed>: SELECT
LOG: duration: 36833.510 ms bind <unnamed>: SELECT
LOG: duration: 36839.535 ms bind S_1219: SELECT
LOG: duration: 36859.786 ms bind S_1111: SELECT
LOG: duration: 35868.148 ms bind <unnamed>: SELECT
LOG: duration: 36906.471 ms bind <unnamed>: SELECT
LOG: duration: 35955.489 ms bind <unnamed>: SELECT
LOG: duration: 36833.510 ms bind <unnamed>: SELECT
LOG: duration: 36839.535 ms bind S_1219: SELECT
...
this database hang on MultiXactOffsetSLRU and MultiXactOffsetBuffer long time.
could you direct me why they are hanging on 'BIND‘ stage with MultiXactOffsetSLRU ?
Thanks,
James
Hi,
I encountered this in a project we migrated to PostgreSQL before, and unfortunately, it’s a situation that completely degrades performance. We identified the cause as savepoints being used excessively and without control. Once they reduced the number of savepoints, the issue was resolved. However, the documentation also mentions that it could be caused by foreign keys.
Kind regards..
James Pang <jamespang886@gmail.com>, 10 Eyl 2024 Sal, 10:33 tarihinde şunu yazdı:
Hi experts,we have a Postgresql v14.8 database, almost thousands of backends hang on MultiXactOffsetSLRU at the same time, all of these sessions running same query "SELECT ....", from OS and postgresql slow log, we found all of these query on "BIND" stage.LOG: duration: 36631.688 ms bind S_813: SELECT
LOG: duration: 36859.786 ms bind S_1111: SELECT
LOG: duration: 35868.148 ms bind <unnamed>: SELECT
LOG: duration: 36906.471 ms bind <unnamed>: SELECT
LOG: duration: 35955.489 ms bind <unnamed>: SELECT
LOG: duration: 36833.510 ms bind <unnamed>: SELECT
LOG: duration: 36839.535 ms bind S_1219: SELECT...this database hang on MultiXactOffsetSLRU and MultiXactOffsetBuffer long time.could you direct me why they are hanging on 'BIND‘ stage with MultiXactOffsetSLRU ?Thanks,James
On 2024-Sep-10, James Pang wrote: > Hi experts, > we have a Postgresql v14.8 database, almost thousands of backends hang > on MultiXactOffsetSLRU at the same time, all of these sessions running same > query "SELECT ....", from OS and postgresql slow log, we found all of these > query on "BIND" stage. > LOG: duration: 36631.688 ms bind S_813: SELECT > LOG: duration: 36859.786 ms bind S_1111: SELECT > LOG: duration: 35868.148 ms bind <unnamed>: SELECT > LOG: duration: 36906.471 ms bind <unnamed>: SELECT > LOG: duration: 35955.489 ms bind <unnamed>: SELECT > LOG: duration: 36833.510 ms bind <unnamed>: SELECT > LOG: duration: 36839.535 ms bind S_1219: SELECT > ... > > this database hang on MultiXactOffsetSLRU and MultiXactOffsetBuffer long > time. > > could you direct me why they are hanging on 'BIND‘ stage with > MultiXactOffsetSLRU ? Very likely, it's related to this problem [1] https://thebuild.com/blog/2023/01/18/a-foreign-key-pathology-to-avoid/ This is caused by a suboptimal implementation of what we call SLRU, which multixact uses underneath. For years, many people dodged this problem by recompiling with a changed value for NUM_MULTIXACTOFFSET_BUFFERS in src/include/access/multixact.h (it was originally 8 buffers, which is very small); you'll need to do that in all releases up to pg16. In pg17 this was improved[2] and you'll be able to change the value in postgresql.conf, though the default already being larger than the original (16 instead of 8), you may not need to. [2] https://pgconf.in/files/presentations/2023/Dilip_Kumar_RareExtremelyChallengingPostgresPerformanceProblems.pdf [3] https://www.pgevents.ca/events/pgconfdev2024/schedule/session/53-problem-in-postgresql-slru-and-how-we-are-optimizing-it/ -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La victoria es para quien se atreve a estar solo"
On 2024-Sep-10, Amine Tengilimoglu wrote: > Hi, > > I encountered this in a project we migrated to PostgreSQL > before, and unfortunately, it’s a situation that completely degrades > performance. We identified the cause as savepoints being used excessively > and without control. Once they reduced the number of savepoints, the issue > was resolved. However, the documentation also mentions that it could be > caused by foreign keys. Yeah, it's exactly the same problem; when it comes from savepoints the issue is pg_subtrans, and when foreign keys are involved, it is pg_multixact. Both of those use the SLRU subsystem, which was heavily modified in pg17 as I mentioned in my reply to James. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "I think my standards have lowered enough that now I think 'good design' is when the page doesn't irritate the living f*ck out of me." (JWZ)
There is no foreign keys, but there is one session who did transactions to tables with savepoints, one savepoints/per sql in same transaction. But sessions with query "SELECT “ do not use savepoints , just with a lot of sessions running same query and hang on MultiXact suddenly. even only one session doing DML with savepoints , and all other queries sessions can see this kind of "MultiXact" waiting ,right?
James Pang <jamespang886@gmail.com> 於 2024年9月10日週二 下午4:26寫道:
There is no foreign keys, but there are several sessions who did transactions to tables with savepoints, one savepoints/per sql in same transaction. But sessions with query "SELECT “ do not use savepoints , just with a lot of sessions running same query and hang on MultiXact suddenly.Alvaro Herrera <alvherre@alvh.no-ip.org> 於 2024年9月10日週二 下午4:15寫道:On 2024-Sep-10, Amine Tengilimoglu wrote:
> Hi,
>
> I encountered this in a project we migrated to PostgreSQL
> before, and unfortunately, it’s a situation that completely degrades
> performance. We identified the cause as savepoints being used excessively
> and without control. Once they reduced the number of savepoints, the issue
> was resolved. However, the documentation also mentions that it could be
> caused by foreign keys.
Yeah, it's exactly the same problem; when it comes from savepoints the
issue is pg_subtrans, and when foreign keys are involved, it is
pg_multixact. Both of those use the SLRU subsystem, which was heavily
modified in pg17 as I mentioned in my reply to James.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)
I hadn't found a satisfactory explanation about the top limit related to SLRU, so this document will be useful. It's a nice development that the relevant limit has been increased in pg17; I hope I don't encounter a situation where this limit is exceeded in large systems.
Kind regards..
James Pang <jamespang886@gmail.com>, 10 Eyl 2024 Sal, 11:35 tarihinde şunu yazdı:
There is no foreign keys, but there is one session who did transactions to tables with savepoints, one savepoints/per sql in same transaction. But sessions with query "SELECT “ do not use savepoints , just with a lot of sessions running same query and hang on MultiXact suddenly. even only one session doing DML with savepoints , and all other queries sessions can see this kind of "MultiXact" waiting ,right?James Pang <jamespang886@gmail.com> 於 2024年9月10日週二 下午4:26寫道:There is no foreign keys, but there are several sessions who did transactions to tables with savepoints, one savepoints/per sql in same transaction. But sessions with query "SELECT “ do not use savepoints , just with a lot of sessions running same query and hang on MultiXact suddenly.Alvaro Herrera <alvherre@alvh.no-ip.org> 於 2024年9月10日週二 下午4:15寫道:On 2024-Sep-10, Amine Tengilimoglu wrote:
> Hi,
>
> I encountered this in a project we migrated to PostgreSQL
> before, and unfortunately, it’s a situation that completely degrades
> performance. We identified the cause as savepoints being used excessively
> and without control. Once they reduced the number of savepoints, the issue
> was resolved. However, the documentation also mentions that it could be
> caused by foreign keys.
Yeah, it's exactly the same problem; when it comes from savepoints the
issue is pg_subtrans, and when foreign keys are involved, it is
pg_multixact. Both of those use the SLRU subsystem, which was heavily
modified in pg17 as I mentioned in my reply to James.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)
On 2024-Sep-10, James Pang wrote: > There is no foreign keys, but there is one session who did transactions > to tables with savepoints, one savepoints/per sql in same transaction. But > sessions with query "SELECT “ do not use savepoints , just with a lot of > sessions running same query and hang on MultiXact suddenly. even only one > session doing DML with savepoints , and all other queries sessions can see > this kind of "MultiXact" waiting ,right? I think SELECT FOR UPDATE combined with savepoints can create multixacts, in absence of foreign keys. A query that's waiting doesn't need to have *created* the multixact or subtrans -- it is sufficient that it's forced to look it up. If thousands of sessions tried to look up different multixact values (spread across more than 8 pages), then thrashing of the cache would result, with catastrophic performance. This can probably be caused by some operation that creates one multixact per tuple in a few thousand tuples. Maybe you could ease this by doing VACUUM on the table (perhaps with a low multixact freeze age), which might remove some of the multixacts. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Para tener más hay que desear menos"
most of query sessions using jdbc connections, the one who use ODBC one savepoint/per statement, but it does not run any "select for update; savepoint;update", since row lock conflict, so not easy to touch same row with update/delete, no idea how that create multixact? a MultiXact may contain an update or delete Xid. ?
in this server, we see thousands of session hang on ‘MultixactOffsetSLRU" but they are in " bind " stage instead of "execute", why a backend in "bind" need to access Multixact?
Thanks,
James
Alvaro Herrera <alvherre@alvh.no-ip.org> 於 2024年9月10日週二 下午5:00寫道:
On 2024-Sep-10, James Pang wrote:
> There is no foreign keys, but there is one session who did transactions
> to tables with savepoints, one savepoints/per sql in same transaction. But
> sessions with query "SELECT “ do not use savepoints , just with a lot of
> sessions running same query and hang on MultiXact suddenly. even only one
> session doing DML with savepoints , and all other queries sessions can see
> this kind of "MultiXact" waiting ,right?
I think SELECT FOR UPDATE combined with savepoints can create
multixacts, in absence of foreign keys.
A query that's waiting doesn't need to have *created* the multixact or
subtrans -- it is sufficient that it's forced to look it up.
If thousands of sessions tried to look up different multixact values
(spread across more than 8 pages), then thrashing of the cache would
result, with catastrophic performance. This can probably be caused by
some operation that creates one multixact per tuple in a few thousand
tuples.
Maybe you could ease this by doing VACUUM on the table (perhaps with a
low multixact freeze age), which might remove some of the multixacts.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"