Thread: suboverflowed subtransactions concurrency performance optimize
Hi hackers,

I wrote a patch to resolve the subtransaction concurrency performance problems that appear when snapshots are suboverflowed. When we use more than PGPROC_MAX_CACHED_SUBXIDS (64) subtransactions per transaction under concurrency, subtransaction lookups become a performance problem: all backends get stuck acquiring the SubtransSLRULock.

Steps to reproduce on the PG master branch:

1. Init a cluster and append the following to postgresql.conf:

max_connections = '2500'
max_files_per_process = '2000'
max_locks_per_transaction = '64'
max_parallel_maintenance_workers = '8'
max_parallel_workers = '60'
max_parallel_workers_per_gather = '6'
max_prepared_transactions = '15000'
max_replication_slots = '10'
max_wal_senders = '64'
max_worker_processes = '250'
shared_buffers = 8GB

2. Create a table and insert some records:

CREATE UNLOGGED TABLE contend (
    id integer,
    val integer NOT NULL
)
WITH (fillfactor='50');

INSERT INTO contend (id, val)
SELECT i, 0
FROM generate_series(1, 10000) AS i;

VACUUM (ANALYZE) contend;

3. Run pgbench with the script subtrans_128.sql from the attachment:

pgbench -d postgres -p 33800 -n -r -f subtrans_128.sql -c 500 -j 500 -T 3600

4. After a while, the cluster gets stuck. Querying pg_stat_activity shows that every backend's wait event is SubtransSLRULock. Using perf top to look for the root cause gives the following result:

66.20% postgres [.] pg_atomic_compare_exchange_u32_impl
29.30% postgres [.] pg_atomic_fetch_sub_u32_impl
1.46% postgres [.] pg_atomic_read_u32
1.34% postgres [.] TransactionIdIsCurrentTransactionId
0.75% postgres [.] SimpleLruReadPage_ReadOnly
0.14% postgres [.] LWLockAttemptLock
0.14% postgres [.] LWLockAcquire
0.12% postgres [.] pg_atomic_compare_exchange_u32
0.09% postgres [.] HeapTupleSatisfiesMVCC
0.06% postgres [.] heapgetpage
0.03% postgres [.] sentinel_ok
0.03% postgres [.] XidInMVCCSnapshot
0.03% postgres [.] slot_deform_heap_tuple
0.03% postgres [.] ExecInterpExpr
0.02% postgres [.] AllocSetCheck
0.02% postgres [.] HeapTupleSatisfiesVisibility
0.02% postgres [.] LWLockRelease
0.02% postgres [.] TransactionIdPrecedes
0.02% postgres [.] SubTransGetParent
0.01% postgres [.] heapgettup_pagemode
0.01% postgres [.] CheckForSerializableConflictOutNeeded

Reviewing the subtrans code, it is easy to see that the global LWLock SubtransSLRULock is the bottleneck for concurrent subtrans access. When a backend session assigns more than PGPROC_MAX_CACHED_SUBXIDS (64) subtransactions, we get a suboverflowed snapshot. A suboverflowed snapshot does not contain all the data required to determine visibility, so PostgreSQL occasionally has to fall back to pg_subtrans. Those pages are cached in shared buffers, but you can see the overhead of looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf output.

To resolve this performance problem, we propose caching subtrans SLRU pages in backend-local memory. First we query the parent transaction id from the subtrans SLRU and copy that SLRU page into a local cache page. If we need to look up a parent transaction id on that page again, we can read it from the local cache directly. This noticeably reduces the number of acquisitions and releases of SubtransSLRULock.

From all snapshots we can get the latest xmin. Any transaction id that precedes this xmin must already have committed or aborted, and its parent/top transaction has already been written to the subtrans SLRU, so that part of the SLRU is stable and can safely be copied into the local cache.

With the same reproduction steps above and our patch applied, we can no longer reproduce the stuck state.
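To illustrate the lookup path, here is a rough sketch only, with hypothetical helper names (the real implementation is in the attached patch):

/*
 * Sketch of a backend-local pg_subtrans lookup.  TransactionIdToPage and
 * TransactionIdToEntry are the existing subtrans.c macros; the
 * local_cache_* helpers are hypothetical.
 */
static TransactionId
LocalSubTransGetParent(TransactionId xid)
{
    int            pageno = TransactionIdToPage(xid);
    TransactionId *page = local_cache_lookup(pageno);     /* hypothetical hash lookup */

    if (page == NULL)
    {
        /*
         * Miss: read the page via the shared SLRU once (taking
         * SubtransSLRULock), then keep a private copy.  Only pages whose
         * XIDs all precede the latest xmin are kept, since those entries
         * can no longer change.
         */
        page = local_cache_copy_from_slru(pageno);         /* hypothetical */
    }
    return page[TransactionIdToEntry(xid)];
}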
Note that you need to append our GUC parameter to postgresql.conf; the optimization is off by default:

local_cache_subtrans_pages = 128

The patch is based on PG master branch commit 0d906b2c0b1f0d625ff63d9ace906556b1c66a68.

Our project is at https://github.com/ADBSQL/AntDB. Welcome to follow us: AntDB, AsiaInfo's PG-based distributed database product.

Thanks
Pengcheng
Attachment
Hi Pengcheng!

You are solving an important problem, thank you!

> On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> To resolve this performance problem, we think about a solution which cache
> SubtransSLRU to local cache.
> First we can query parent transaction id from SubtransSLRU, and copy the
> SLRU page to local cache page.
> After that if we need query parent transaction id again, we can query it
> from local cache directly.

A copy of the SLRU in each backend's cache can consume a lot of memory. Why create a copy if we can optimise the shared representation of the SLRU?

JFYI, there is a related patch to make SimpleLruReadPage_ReadOnly() faster for bigger SLRU buffers [0]. Also, Nik Samokhvalov recently published an interesting investigation on the topic, but for some reason his message did not pass moderation. [1]

It's also important to note that there was a community request to move SLRUs to shared_buffers [2].

Thanks!

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/BE73A0BB-5929-40F4-BAF8-55323DE39561%40yandex-team.ru
[2] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
Hi Andrey,

Thanks a lot for your reply and the reference information.

The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, local_cache_subtrans_pages can be adjusted dynamically. If we configure local_cache_subtrans_pages as 64, every backend uses only an extra 64 * 8192 = 512KB of memory. So the local cache is similar to a first-level cache, with the subtrans SLRU as the second-level cache. I think the extra memory is very well worth it: it really resolves the massive subtransaction stuck issue I mentioned in the previous email.

I had looked at the patch in [0] before. Adding GUC configuration parameters for the SLRU buffers is very nice, but I think that for subtrans this optimization is not enough. For SubTransGetTopmostTransaction, we should get SubtransSLRULock first and then call SubTransGetParent in a loop, to prevent acquiring/releasing SubtransSLRULock on every iteration of the SubTransGetTopmostTransaction -> SubTransGetParent loop. After I applied that patch together with my SubTransGetTopmostTransaction optimization, I still got the stuck result with my test case.

About the [1] solution: actually, we first tried to use the buffer manager to replace the SLRU for subtrans too, and we implemented it. With the test case I mentioned in the previous mail, it was still stuck. By default there are 2048 subtrans entries in one page, so when several processes look up top transactions on the same page, they have to pin/unpin and lock/unlock that page repeatedly. I found that some backends were stuck at pinning/unpinning the page.

Comparative test results, pgbench with subtrans_128.sql:

Concurrency    PG master    PG master with patch [0]    Local cache optimization
300            stuck        stuck                       not stuck
500            stuck        stuck                       not stuck
1000           stuck        stuck                       not stuck

Maybe we can test the different approaches with my test case. If an approach does not get stuck under massive concurrency, we have a good solution.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com

Thanks
Pengcheng
On Mon, Aug 30, 2021 at 1:43 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
Hi,
+ uint16 valid_offset; /* how many entry is valid */
how many entry is -> how many entries are
+int slru_subtrans_page_num = 32;
Looks like the variable represents the number of subtrans pages. Maybe name the variable slru_subtrans_page_count ?
+ if (lbuffer->in_htab == false)
The condition can be written as 'if (!lbuffer->in_htab)'
For SubtransAllocLocalBuffer(), you can enclose the body of the method in a while loop so that you don't need the goto statement.
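For example, a goto-based retry of roughly this shape (illustrative only; find_free_slot and evict_one_slot are hypothetical stand-ins, not the patch's actual code):

retry:
    slot = find_free_slot();      /* hypothetical */
    if (slot == NULL)
    {
        evict_one_slot();         /* hypothetical */
        goto retry;
    }

can be rewritten without the label as:

    for (;;)
    {
        slot = find_free_slot();
        if (slot != NULL)
            break;
        evict_one_slot();
    }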
Cheers
Sorry, for some reason Mail.app converted the message to HTML and the mailing list mangled this HTML into a mess. I'm resending the previous message as plain text again. Sorry for the noise.

> On 31 Aug 2021, at 11:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> Hi Andrey,
> Thanks a lot for your reply and reference information.
>
> The default NUM_SUBTRANS_BUFFERS is 32. My implementation is local_cache_subtrans_pages can be adjusted dynamically.
> If we configure local_cache_subtrans_pages as 64, every backend use only extra 64*8192=512KB memory.
> So the local cache is similar to the first level cache. And subtrans SLRU is the second level cache.
> And I think extra memory is very well worth it. It really resolve massive subtrans stuck issue which I mentioned in previous email.
>
> I have view the patch of [0] before. For SLRU buffers adding GUC configuration parameters are very nice.
> I think for subtrans, its optimize is not enough. For SubTransGetTopmostTransaction, we should get the SubtransSLRULock first, then call SubTransGetParent in loop.
> Prevent acquire/release SubtransSLRULock in SubTransGetTopmostTransaction -> SubTransGetParent in loop.
> After I apply this patch which I optimize SubTransGetTopmostTransaction, with my test case, I still get stuck result.

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem may arise only when someone reads a page from disk. But if you have a big enough cache, this will never happen. And this cache will be much less than 512KB * max_connections.

I think if we really want to fix the exclusive SubtransSLRULock, the best option would be to split the SLRU control lock into an array of locks - one for each bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch). With this approach we will have to rename s/bank/partition/g for consistency with locks and buffers partitions. I really liked having my own banks, but consistency is worth it anyway.

Thanks!

Best regards, Andrey Borodin.
Hi Andrey,

> I think if we really want to fix the exclusive SubtransSLRULock, the best option would be to split the SLRU control lock into an array of locks

I agree with you. If we can resolve the performance issue with this approach, it should be a good solution.

> one for each bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch)

I have tested with this patch, and I modified NUM_SUBTRANS_BUFFERS to 128. With 500 concurrent clients it would indeed not be stuck, but the performance is very bad: a sequential scan of the table takes more than one minute, which I think is unacceptable in a production environment.

postgres=# select count(*) from contend ;
 count
-------
 10127
(1 row)

Time: 86011.593 ms (01:26.012)

postgres=# select count(*) from contend ;
 count
-------
 10254
(1 row)

Time: 79399.949 ms (01:19.400)

With my local subtrans optimization approach, in the same environment, with the same test script and 500 concurrent clients, a sequential scan takes less than 10 seconds.

postgres=# select count(*) from contend ;
 count
-------
 10508
(1 row)

Time: 7104.283 ms (00:07.104)

postgres=# select count(*) from contend ;
 count
-------
 13175
(1 row)

Time: 6602.635 ms (00:06.603)

Thanks
Pengcheng
On Mon, 30 Aug 2021 at 11:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi Pengcheng!
>
> You are solving an important problem, thank you!
>
> > On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:
> >
> > To resolve this performance problem, we think about a solution which cache
> > SubtransSLRU to local cache.
> > First we can query parent transaction id from SubtransSLRU, and copy the
> > SLRU page to local cache page.
> > After that if we need query parent transaction id again, we can query it
> > from local cache directly.
>
> A copy of SLRU in each backend's cache can consume a lot of memory.

Yes, copying the whole SLRU into local cache seems overkill.

> Why create a copy if we can optimise shared representation of SLRU?

transam.c uses a single-item cache to prevent thrashing from repeated lookups, which reduces problems with shared access to SLRUs. multitrans.c also has similar.

I notice that subtrans.c doesn't have this, but could easily do so. Patch attached, which seems separate to other attempts at tuning.

On review, I think it is also possible that we update subtrans ONLY if someone uses >PGPROC_MAX_CACHED_SUBXIDS. This would make subtrans much smaller and avoid one-entry-per-page, which is a major source of caching. This would mean some light changes in GetSnapshotData(). Let me know if that seems interesting also?

--
Simon Riggs
http://www.EnterpriseDB.com/
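Judging from the fragments quoted later in this thread, the attached patch amounts to a single-item cache of roughly this shape (a sketch, not the exact code that was committed):

static TransactionId cachedFetchSubXid = InvalidTransactionId;
static TransactionId cachedFetchTopmostXid = InvalidTransactionId;

TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
{
    TransactionId parentXid = xid,
                  previousXid = xid;

    /* Repeated lookups of the same XID return the remembered answer. */
    if (TransactionIdEquals(xid, cachedFetchSubXid))
        return cachedFetchTopmostXid;

    /* ... the existing loop over SubTransGetParent() stays unchanged ... */
    while (TransactionIdIsValid(parentXid))
    {
        previousXid = parentXid;
        if (TransactionIdPrecedes(parentXid, TransactionXmin))
            break;
        parentXid = SubTransGetParent(parentXid);
    }

    cachedFetchSubXid = xid;
    cachedFetchTopmostXid = previousXid;

    return previousXid;
}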
Attachment
> On 30 Nov 2021, at 17:19, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Mon, 30 Aug 2021 at 11:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>
>> Hi Pengcheng!
>>
>> You are solving an important problem, thank you!
>>
>>> On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>>>
>>> To resolve this performance problem, we think about a solution which cache
>>> SubtransSLRU to local cache.
>>> First we can query parent transaction id from SubtransSLRU, and copy the
>>> SLRU page to local cache page.
>>> After that if we need query parent transaction id again, we can query it
>>> from local cache directly.
>>
>> A copy of SLRU in each backend's cache can consume a lot of memory.
>
> Yes, copying the whole SLRU into local cache seems overkill.
>
>> Why create a copy if we can optimise shared representation of SLRU?
>
> transam.c uses a single-item cache to prevent thrashing from repeated
> lookups, which reduces problems with shared access to SLRUs.
> multitrans.c also has similar.
>
> I notice that subtrans.c doesn't have this, but could easily do so.
> Patch attached, which seems separate to other attempts at tuning.

I think this definitely makes sense to do.

> On review, I think it is also possible that we update subtrans ONLY if
> someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> This would make subtrans much smaller and avoid one-entry-per-page
> which is a major source of caching.
> This would mean some light changes in GetSnapshotData().
> Let me know if that seems interesting also?

I'm afraid of unexpected performance degradation. When the system runs fine, you provision a VM of some vCPU\RAM, and then some backend uses a little more than 64 subtransactions and the whole system is stuck. Or will it affect only backends using more than 64 subtransactions?

Best regards, Andrey Borodin.
On Tue, Nov 30, 2021 at 5:49 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> transam.c uses a single-item cache to prevent thrashing from repeated
> lookups, which reduces problems with shared access to SLRUs.
> multitrans.c also has similar.
>
> I notice that subtrans.c doesn't have this, but could easily do so.
> Patch attached, which seems separate to other attempts at tuning.

Yeah, this definitely makes sense.

> On review, I think it is also possible that we update subtrans ONLY if
> someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> This would make subtrans much smaller and avoid one-entry-per-page
> which is a major source of caching.
> This would mean some light changes in GetSnapshotData().
> Let me know if that seems interesting also?

Do you mean to say avoid setting the sub-transaction's parent if the number of sub-transactions does not cross PGPROC_MAX_CACHED_SUBXIDS? But TransactionIdDidCommit() might need to fetch the parent if the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how would we handle that?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, 3 Dec 2021 at 01:27, Dilip Kumar <dilipbalaut@gmail.com> wrote:

> > On review, I think it is also possible that we update subtrans ONLY if
> > someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> > This would make subtrans much smaller and avoid one-entry-per-page
> > which is a major source of caching.
> > This would mean some light changes in GetSnapshotData().
> > Let me know if that seems interesting also?
>
> Do you mean to say avoid setting the sub-transaction's parent if the
> number of sub-transactions does not cross PGPROC_MAX_CACHED_SUBXIDS?
> But TransactionIdDidCommit() might need to fetch the parent if
> the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how
> would we handle that?

TRANSACTION_STATUS_SUB_COMMITTED is set as a transient state during final commit. In that case, the top-level xid is still in the procarray when nsubxids < PGPROC_MAX_CACHED_SUBXIDS, so we need not consult pg_subtrans in that case; see step 4 of TransactionIdIsInProgress().

--
Simon Riggs
http://www.EnterpriseDB.com/
On Fri, Dec 3, 2021 at 5:00 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> TRANSACTION_STATUS_SUB_COMMITTED is set as a transient state during
> final commit.
> In that case, the top-level xid is still in the procarray when nsubxids <
> PGPROC_MAX_CACHED_SUBXIDS,
> so we need not consult pg_subtrans in that case; see step 4 of
> TransactionIdIsInProgress().

Okay, I see that there is a rule that before calling TransactionIdDidCommit(), we must consult TransactionIdIsInProgress() for non-MVCC snapshots, or XidInMVCCSnapshot(). So now I don't have this concern, thanks for clarifying. I will think more about this approach from other aspects.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, 1 Dec 2021 at 06:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

> > On review, I think it is also possible that we update subtrans ONLY if
> > someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> > This would make subtrans much smaller and avoid one-entry-per-page
> > which is a major source of caching.
> > This would mean some light changes in GetSnapshotData().
> > Let me know if that seems interesting also?
>
> I'm afraid of unexpected performance degradation. When the system runs fine, you provision a VM of some vCPU\RAM, and then some backend uses a little more than 64 subtransactions and the whole system is stuck. Or will it affect only backends using more than 64 subtransactions?

That is the objective: to isolate the effect to only those that overflow. It seems possible.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Fri, 3 Dec 2021 at 06:27, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 30, 2021 at 5:49 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
>
> > transam.c uses a single-item cache to prevent thrashing from repeated
> > lookups, which reduces problems with shared access to SLRUs.
> > multitrans.c also has similar.
> >
> > I notice that subtrans.c doesn't have this, but could easily do so.
> > Patch attached, which seems separate to other attempts at tuning.
>
> Yeah, this definitely makes sense.
>
> > On review, I think it is also possible that we update subtrans ONLY if
> > someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> > This would make subtrans much smaller and avoid one-entry-per-page
> > which is a major source of caching.
> > This would mean some light changes in GetSnapshotData().
> > Let me know if that seems interesting also?
>
> Do you mean to say avoid setting the sub-transaction's parent if the
> number of sub-transactions does not cross PGPROC_MAX_CACHED_SUBXIDS?

Yes.

This patch shows where I'm going, with changes in GetSnapshotData(), XidInMVCCSnapshot() and XactLockTableWait(). It passes make check, but needs much more, so this is review-only at this stage, to give a flavour of what is intended. (Nowhere near replacing the subtrans module as I envisage as the final outcome, meaning we don't need ExtendSUBTRANS().)

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
Hi,

On Wed, Dec 08, 2021 at 04:39:11PM +0000, Simon Riggs wrote:
>
> This patch shows where I'm going, with changes in GetSnapshotData()
> and XidInMVCCSnapshot() and XactLockTableWait().
> Passes make check, but needs much more, so this is review-only at this
> stage to give a flavour of what is intended.

Thanks a lot to everyone involved in this!

I can't find any entry in the commitfest for the work being done here. Did I miss something? If not, could you create an entry in the next commitfest to make sure that it doesn't get forgotten?
On Tue, 30 Nov 2021 at 12:19, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Mon, 30 Aug 2021 at 11:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > Hi Pengcheng!
> >
> > You are solving an important problem, thank you!
> >
> > > On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:
> > >
> > > To resolve this performance problem, we think about a solution which cache
> > > SubtransSLRU to local cache.
> > > First we can query parent transaction id from SubtransSLRU, and copy the
> > > SLRU page to local cache page.
> > > After that if we need query parent transaction id again, we can query it
> > > from local cache directly.
> >
> > A copy of SLRU in each backend's cache can consume a lot of memory.
>
> Yes, copying the whole SLRU into local cache seems overkill.
>
> > Why create a copy if we can optimise shared representation of SLRU?
>
> transam.c uses a single-item cache to prevent thrashing from repeated
> lookups, which reduces problems with shared access to SLRUs.
> multitrans.c also has similar.
>
> I notice that subtrans.c doesn't have this, but could easily do so.
> Patch attached, which seems separate to other attempts at tuning.

Re-attached, so that the CFapp isn't confused between the multiple patches on this thread.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
> On 17 Jan 2022, at 18:44, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> Re-attached, so that the CFapp isn't confused between the multiple
> patches on this thread.

FWIW I've looked into the patch and it looks good to me. The comments describing when the cache is useful seem valid.

Thanks!

Best regards, Andrey Borodin.
Hi,

On Mon, Jan 17, 2022 at 01:44:02PM +0000, Simon Riggs wrote:
>
> Re-attached, so that the CFapp isn't confused between the multiple
> patches on this thread.

Thanks a lot for working on this!

The patch is simple and overall looks good to me. A few comments though:

+/*
+ * Single-item cache for results of SubTransGetTopmostTransaction. It's worth having
+ * such a cache because we frequently find ourselves repeatedly checking the
+ * same XID, for example when scanning a table just after a bulk insert,
+ * update, or delete.
+ */
+static TransactionId cachedFetchXid = InvalidTransactionId;
+static TransactionId cachedFetchTopmostXid = InvalidTransactionId;

The comment goes over 80 characters after s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this comment is valid for subtrans.c. Also, maybe naming the first variable cachedFetchSubXid would make it a bit clearer?

It would be nice to see some benchmarks, both for when this change is enough to avoid the contention (when there's a single long-running overflowed backend) and for when it's not enough. That will also be useful if/when working on the "rethink_subtrans" patch.
On Mon, 7 Mar 2022 at 09:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Mon, Jan 17, 2022 at 01:44:02PM +0000, Simon Riggs wrote:
> >
> > Re-attached, so that the CFapp isn't confused between the multiple
> > patches on this thread.
>
> Thanks a lot for working on this!
>
> The patch is simple and overall looks good to me. A few comments though:
>
> +/*
> + * Single-item cache for results of SubTransGetTopmostTransaction. It's worth having
> + * such a cache because we frequently find ourselves repeatedly checking the
> + * same XID, for example when scanning a table just after a bulk insert,
> + * update, or delete.
> + */
> +static TransactionId cachedFetchXid = InvalidTransactionId;
> +static TransactionId cachedFetchTopmostXid = InvalidTransactionId;
>
> The comment goes over 80 characters after
> s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
> comment is valid for subtrans.c.

What aspect makes it invalid? The comment seems exactly applicable to me; Andrey thinks so also.

> Also, maybe naming the first variable cachedFetchSubXid would make it a bit
> clearer?

Sure, that can be done.

> It would be nice to see some benchmarks, both for when this change is
> enough to avoid the contention (when there's a single long-running overflowed
> backend) and for when it's not enough. That will also be useful if/when working on
> the "rethink_subtrans" patch.

The patch doesn't do anything about the case where there's a single long-running overflowed backend, nor does it claim to.

The patch will speed up calls to SubTransGetTopmostTransaction(), which occur in
src/backend/access/heap/heapam.c
src/backend/utils/time/snapmgr.c
src/backend/storage/lmgr/lmgr.c
src/backend/storage/ipc/procarray.c

The patch was posted because TransactionLogFetch() has a cache, yet SubTransGetTopmostTransaction() does not, yet the argument should be identical in both cases.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:
> > +/*
> > + * Single-item cache for results of SubTransGetTopmostTransaction. It's worth having
> > + * such a cache because we frequently find ourselves repeatedly checking the
> > + * same XID, for example when scanning a table just after a bulk insert,
> > + * update, or delete.
> > + */
> > +static TransactionId cachedFetchXid = InvalidTransactionId;
> > +static TransactionId cachedFetchTopmostXid = InvalidTransactionId;
> >
> > The comment goes over 80 characters after
> > s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
> > comment is valid for subtrans.c.
>
> What aspect makes it invalid? The comment seems exactly applicable to
> me; Andrey thinks so also.

Sorry, I somehow missed the "for example", and was thinking that SubTransGetTopmostTransaction was used in many other places compared to TransactionIdDidCommit and friends.

> > It would be nice to see some benchmarks, both for when this change is
> > enough to avoid the contention (when there's a single long-running overflowed
> > backend) and for when it's not enough. That will also be useful if/when working on
> > the "rethink_subtrans" patch.
>
> The patch doesn't do anything about the case where there's a single
> long-running overflowed backend, nor does it claim to.

I was thinking that having a cache for SubTransGetTopmostTransaction could help at least to some extent for that problem, sorry if that's not the case. I'm still curious how much this simple optimization can help in some scenarios, even if they're somewhat artificial.

> The patch was posted because TransactionLogFetch() has a cache, yet
> SubTransGetTopmostTransaction() does not, yet the argument should be
> identical in both cases.

I totally agree with that.
On Mon, Mar 07, 2022 at 10:17:41PM +0800, Julien Rouhaud wrote:
> On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:
>> The patch was posted because TransactionLogFetch() has a cache, yet
>> SubTransGetTopmostTransaction() does not, yet the argument should be
>> identical in both cases.
>
> I totally agree with that.

Agreed as well. That's worth doing in isolation, and that will save some lookups of pg_subtrans anyway while being simple. As mentioned upthread, this needed an indentation, and the renaming of cachedFetchXid to cachedFetchSubXid looks adapted. So.. Applied all those things.
--
Michael
Attachment
On Thu, 7 Apr 2022 at 00:36, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Mar 07, 2022 at 10:17:41PM +0800, Julien Rouhaud wrote:
> > On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:
> >> The patch was posted because TransactionLogFetch() has a cache, yet
> >> SubTransGetTopmostTransaction() does not, yet the argument should be
> >> identical in both cases.
> >
> > I totally agree with that.
>
> Agreed as well. That's worth doing in isolation, and that will save
> some lookups of pg_subtrans anyway while being simple. As mentioned
> upthread, this needed an indentation, and the renaming of
> cachedFetchXid to cachedFetchSubXid looks adapted. So.. Applied all
> those things.

Thanks Michael, thanks all.

--
Simon Riggs
http://www.EnterpriseDB.com/
Hi,

On 2022-04-07 14:36:35 +0900, Michael Paquier wrote:
> On Mon, Mar 07, 2022 at 10:17:41PM +0800, Julien Rouhaud wrote:
> > On Mon, Mar 07, 2022 at 01:27:40PM +0000, Simon Riggs wrote:
> >> The patch was posted because TransactionLogFetch() has a cache, yet
> >> SubTransGetTopmostTransaction() does not, yet the argument should be
> >> identical in both cases.
> >
> > I totally agree with that.
>
> Agreed as well. That's worth doing in isolation, and that will save
> some lookups of pg_subtrans anyway while being simple. As mentioned
> upthread, this needed an indentation, and the renaming of
> cachedFetchXid to cachedFetchSubXid looks adapted. So.. Applied all
> those things.

As is, this strikes me as dangerous. At the very least this ought to be structured so it can have assertions verifying that the cache contents are correct.

It's far from obvious that it is correct to me, fwiw. Potential issues:

1) The result of SubTransGetTopmostTransaction() can change between subsequent calls. If TransactionXmin advances, the TransactionXmin cutoff can change the result. It might be unreachable or harmless, but it's not obvious that it is, and there's zero comments explaining why it is obvious.

2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to be called regularly, so even if a backend isn't idle, the cache could just get more and more outdated until hitting wraparound.

To me it also seems odd that we cache in SubTransGetTopmostTransaction(), but not in SubTransGetParent(). I think it's at least as common to end up with subtrans access via TransactionIdDidCommit(), which calls SubTransGetParent() rather than SubTransGetTopmostTransaction()? Why is SubTransGetTopmostTransaction() the correct layer for caching?

I tried to find a benchmark result for this patch upthread, without success. Has there been validation this helps with anything?

Greetings,

Andres Freund
On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:
> As is, this strikes me as dangerous. At the very least this ought to be
> structured so it can have assertions verifying that the cache contents are
> correct.

Well, under USE_ASSERT_CHECKING we could force a recalculation of the loop itself before re-checking and returning the cached result, as one thing.

> It's far from obvious that it is correct to me, fwiw. Potential issues:
>
> 1) The result of SubTransGetTopmostTransaction() can change between subsequent
> calls. If TransactionXmin advances, the TransactionXmin cutoff can change
> the result. It might be unreachable or harmless, but it's not obvious that
> it is, and there's zero comments explaining why it is obvious.

I am not sure I follow on this one. A change in the TransactionXmin cutoff does not change the result retrieved for parentXid from the SLRU layer, because the xid cached refers to a parent still running.

> 2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
> be called regularly, so even if a backend isn't idle, the cache could just
> get more and more outdated until hitting wraparound

Hence, you mean that the non-regularity of the call makes it more exposed to an inconsistent result after a wraparound?

> To me it also seems odd that we cache in SubTransGetTopmostTransaction(), but
> not in SubTransGetParent(). I think it's at least as common to end up with
> subtrans access via TransactionIdDidCommit(), which calls SubTransGetParent()
> rather than SubTransGetTopmostTransaction()? Why is
> SubTransGetTopmostTransaction() the correct layer for caching?

Hmm. I recall thinking about this exact point but left it out of the caching to maintain a symmetry with the setter routine that does the same and reverse operation on those SLRUs. Anyway, one reason not to use SubTransGetParent() is that it may return an invalid XID which we'd better not cache depending on its use (say, a serialized transaction), and SubTransGetTopmostTransaction() looping around so that we make sure to never hit this case looks like the correct path to take. Well, we could also store nothing if an invalid parent is found, but then the previous argument about the symmetry of the routines would not apply. This would be beneficial for cases like the one at the top of the thread about SLRU caches when subxids are overflowing while referring to the same XID. The ODBC driver likes savepoints a lot, for example.

> I tried to find a benchmark result for this patch upthread, without
> success. Has there been validation this helps with anything?

I have been studying that again, and you are right that I should have asked for much more here. A benchmark like what's presented upthread may show some benefits with the case of the same savepoint used across multiple queries, but only with a caching of SubTransGetParent(), and with enough subxids exhausted to overflow the snapshots. It would be better to revisit that stuff, and the benefit is limited with only SubTransGetTopmostTransaction().

Point 2) is something I did not consider, and that's a good one. For now, it looks better to revert this part rather than tweak it post beta1.
--
Michael
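A minimal sketch of what that assert-enabled cross-check could look like (hypothetical shape; SubTransGetTopmostTransactionInternal would be a new uncached helper containing the existing loop):

    if (TransactionIdEquals(xid, cachedFetchSubXid))
    {
        /*
         * Under assert-enabled builds, recompute the answer from
         * pg_subtrans and cross-check it against the cached value
         * before returning it.
         */
        Assert(TransactionIdEquals(cachedFetchTopmostXid,
                                   SubTransGetTopmostTransactionInternal(xid)));
        return cachedFetchTopmostXid;
    }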
Attachment
On Thu, May 26, 2022 at 12:53 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:
>
> > 2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
> > be called regularly, so even if a backend isn't idle, the cache could just
> > get more and more outdated until hitting wraparound
>
> Hence, you mean that the non-regularity of the call makes it more
> exposed to an inconsistent result after a wraparound?

Won't, in theory, the similar cache in transam.c also be prone to similar behavior?

Anyway, how about if we clear this cache for subtrans whenever TransactionXmin is advanced and cachedFetchSubXid precedes it? The comments atop SubTransGetTopmostTransaction seem to state that we don't care about the exact topmost parent when the intermediate one precedes TransactionXmin. I think it should preserve the optimization, because anyway for such cases there is a fast path in SubTransGetTopmostTransaction.

--
With Regards,
Amit Kapila.
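A rough sketch of the invalidation described above, assuming it runs wherever TransactionXmin is advanced (e.g. after computing a new snapshot); the variable names are those of the committed-then-reverted patch:

    /*
     * Sketch: drop the single-item subtrans cache once its XID falls
     * behind the new TransactionXmin, since later lookups may
     * legitimately stop at a different (intermediate) parent.
     */
    if (TransactionIdIsValid(cachedFetchSubXid) &&
        TransactionIdPrecedes(cachedFetchSubXid, TransactionXmin))
    {
        cachedFetchSubXid = InvalidTransactionId;
        cachedFetchTopmostXid = InvalidTransactionId;
    }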
Hi,

On 2022-05-27 15:44:39 +0530, Amit Kapila wrote:
> On Thu, May 26, 2022 at 12:53 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:
> >
> > > 2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() to
> > > be called regularly, so even if a backend isn't idle, the cache could just
> > > get more and more outdated until hitting wraparound
> >
> > Hence, you mean that the non-regularity of the call makes it more
> > exposed to an inconsistent result after a wraparound?
>
> Won't, in theory, the similar cache in transam.c also be prone to
> similar behavior?

It's not quite the same risk, because there we are likely to actually hit the cache regularly. Whereas quite normal workloads might not hit this cache for days on end.

> Anyway, how about if we clear this cache for subtrans whenever
> TransactionXmin is advanced and cachedFetchSubXid precedes it? The
> comments atop SubTransGetTopmostTransaction seem to state that we
> don't care about the exact topmost parent when the intermediate one
> precedes TransactionXmin. I think it should preserve the optimization
> because anyway for such cases there is a fast path in
> SubTransGetTopmostTransaction.

There's not even a proof this does speed up anything useful! There's not a single benchmark for the patch.

Greetings,

Andres Freund
On Fri, May 27, 2022 at 8:55 AM Andres Freund <andres@anarazel.de> wrote:

> > Anyway, how about if we clear this cache for subtrans whenever
> > TransactionXmin is advanced and cachedFetchSubXid precedes it? The
> > comments atop SubTransGetTopmostTransaction seem to state that we
> > don't care about the exact topmost parent when the intermediate one
> > precedes TransactionXmin. I think it should preserve the optimization
> > because anyway for such cases there is a fast path in
> > SubTransGetTopmostTransaction.
>
> There's not even a proof this does speed up anything useful! There's not a
> single benchmark for the patch.

I find it hard to believe that there wasn't even a cursory effort at performance validation before this was committed, but that's what it looks like.

--
Peter Geoghegan
On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:
> On Fri, May 27, 2022 at 8:55 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > There's not even a proof this does speed up anything useful! There's not a
> > single benchmark for the patch.
>
> I find it hard to believe that there wasn't even a cursory effort at
> performance validation before this was committed, but that's what it
> looks like.

Yea. Imo this pretty clearly should be reverted. It has correctness issues, testing issues, and we don't know whether it does anything useful.
On Fri, May 27, 2022 at 11:59 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:
> > I find it hard to believe that there wasn't even a cursory effort at
> > performance validation before this was committed, but that's what it
> > looks like.
>
> Yea. Imo this pretty clearly should be reverted. It has correctness issues,
> testing issues and we don't know whether it does anything useful.

It should definitely be reverted.

--
Peter Geoghegan
On Fri, May 27, 2022 at 08:55:02AM -0700, Andres Freund wrote:
> On 2022-05-27 15:44:39 +0530, Amit Kapila wrote:
>> Won't, in theory, the similar cache in transam.c also be prone to
>> similar behavior?

TransactionIdDidCommit() and TransactionIdDidAbort() are used in many more code paths for visibility purposes, contrary to the subtrans.c ones.

> It's not quite the same risk, because there we are likely to actually hit the
> cache regularly. Whereas quite normal workloads might not hit this cache for
> days on end.

Yeah. In short, this mostly depends on the use of savepoints and the number of XIDs issued until PGPROC_MAX_CACHED_SUBXIDS is reached, and a single cache entry in this code path would reduce the pressure on the SLRU lookups depending on the number of queries issued, for example. One thing I know of that likes to abuse savepoints, and could cause the overflows that make this easier to hit, is the ODBC driver coupled with short queries in long transactions, where its internals enforce the use of a savepoint each time a query is issued by an application (pretty much what the benchmark at the top of the thread does). In this case, even the single-cache approach would not help much, because I recall that we finish with one savepoint per query to be able to roll back to any previous state within a given transaction (as the ODBC APIs allow). Doing caching within SubTransGetParent() would be more interesting, for sure, though the invalidation needed to clean the cache and make that robust enough may prove tricky.

It took me some time to come back to this thread. The change has now been reverted.
--
Michael