RE: suboverflowed subtransactions concurrency performance optimize - Mailing list pgsql-hackers

From Pengchengliu
Subject RE: suboverflowed subtransactions concurrency performance optimize
Date
Msg-id 000d01d79e33$7270ba30$57522e90$@tju.edu.cn
In response to Re: suboverflowed subtransactions concurrency performance optimize  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: suboverflowed subtransactions concurrency performance optimize  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hi Andrey,
  Thanks a lot for your reply and the reference information.

  The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, local_cache_subtrans_pages can be adjusted dynamically.
  If we configure local_cache_subtrans_pages as 64, every backend uses only an extra 64*8192 = 512KB of memory.
  So the local cache acts as a first-level cache, and the subtrans SLRU is the second-level cache.
  I think the extra memory is well worth it. It really resolves the massive subtrans stuck issue which I mentioned in my
previous email.
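
  The two-level lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the struct, the function names, the round-robin replacement and the in-memory stand-in for the shared SLRU are all assumptions made for the example. It also ignores how cached pages are kept consistent with later writes to the shared SLRU, which this message does not describe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

#define BLCKSZ 8192                                   /* default page size */
#define XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))

/* Hypothetical stand-in for the shared subtrans SLRU (second-level cache). */
#define SHARED_PAGES 4
static TransactionId shared_slru[SHARED_PAGES][XACTS_PER_PAGE];
static int shared_reads;              /* how often the shared level was consulted */

/* Hypothetical per-backend first-level cache. */
typedef struct LocalSubtransCache
{
    int            npages;            /* plays the role of local_cache_subtrans_pages */
    int            next_victim;       /* simple round-robin replacement (assumption) */
    int64_t       *pageno;            /* page number held by each slot, -1 if empty */
    TransactionId (*pages)[XACTS_PER_PAGE];
} LocalSubtransCache;

/* Extra memory per backend for the cached pages: npages * BLCKSZ bytes. */
static size_t
local_cache_bytes(int npages)
{
    return (size_t) npages * BLCKSZ;
}

static LocalSubtransCache *
local_cache_create(int npages)
{
    LocalSubtransCache *c = malloc(sizeof(*c));

    assert(c != NULL && npages > 0);
    c->npages = npages;
    c->next_victim = 0;
    c->pageno = malloc((size_t) npages * sizeof(*c->pageno));
    c->pages = malloc(local_cache_bytes(npages));
    assert(c->pageno != NULL && c->pages != NULL);
    for (int i = 0; i < npages; i++)
        c->pageno[i] = -1;
    return c;
}

/* Look up the parent of xid: local cache first, shared SLRU on a miss. */
static TransactionId
local_cache_get_parent(LocalSubtransCache *c, TransactionId xid)
{
    int64_t pageno = (int64_t) (xid / XACTS_PER_PAGE);
    size_t  entry = xid % XACTS_PER_PAGE;

    for (int i = 0; i < c->npages; i++)
        if (c->pageno[i] == pageno)
            return c->pages[i][entry];        /* hit: no shared access needed */

    /* Miss: copy the whole page from the shared level into a local slot. */
    assert(pageno < SHARED_PAGES);
    int slot = c->next_victim;

    c->next_victim = (slot + 1) % c->npages;
    memcpy(c->pages[slot], shared_slru[pageno], BLCKSZ);
    shared_reads++;
    c->pageno[slot] = pageno;
    return c->pages[slot][entry];
}
```

  With 64 local pages, local_cache_bytes(64) gives 524288 bytes, i.e. the 512KB mentioned above. A second lookup on the same page is served locally and does not touch the shared level.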

  I have looked at the patch in [0] before. Adding GUC configuration parameters for the SLRU buffers is very nice.
  But I think that for subtrans this optimization is not enough. In SubTransGetTopmostTransaction, we should acquire
SubtransSLRULock first, then call SubTransGetParent in a loop.
  This avoids acquiring/releasing SubtransSLRULock on every SubTransGetTopmostTransaction -> SubTransGetParent call in
the loop.
  After I apply this patch, in which I also optimize SubTransGetTopmostTransaction, my test case still gets stuck.
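
  To make the locking change concrete, here is a small sketch that contrasts taking the lock once per parent lookup with taking it once around the whole loop. It is a toy model only: ToyLock, parent_of and the function names are hypothetical, the lock merely counts acquisitions, and the TransactionXmin cutoff of the real SubTransGetTopmostTransaction is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define MAX_XID 16

/* Toy stand-in for SubtransSLRULock; it only counts acquisitions. */
typedef struct ToyLock
{
    int held;
    int acquisitions;
} ToyLock;

static ToyLock       subtrans_lock;
static TransactionId parent_of[MAX_XID];   /* toy parent map, 0 = no parent */

static void
lock_acquire(ToyLock *l)
{
    assert(!l->held);
    l->held = 1;
    l->acquisitions++;
}

static void
lock_release(ToyLock *l)
{
    assert(l->held);
    l->held = 0;
}

/* One acquire/release per parent lookup. */
static TransactionId
get_parent_locked(TransactionId xid)
{
    TransactionId parent;

    assert(xid < MAX_XID);
    lock_acquire(&subtrans_lock);
    parent = parent_of[xid];
    lock_release(&subtrans_lock);
    return parent;
}

/* Walks up the chain, taking the lock at every step. */
static TransactionId
topmost_lock_per_step(TransactionId xid)
{
    TransactionId prev = xid;
    TransactionId parent = xid;

    while (parent != InvalidTransactionId)
    {
        prev = parent;
        parent = get_parent_locked(parent);
    }
    return prev;
}

/* Walks up the chain under a single lock acquisition. */
static TransactionId
topmost_lock_once(TransactionId xid)
{
    TransactionId prev = xid;
    TransactionId parent = xid;

    lock_acquire(&subtrans_lock);
    while (parent != InvalidTransactionId)
    {
        assert(parent < MAX_XID);
        prev = parent;
        parent = parent_of[parent];
    }
    lock_release(&subtrans_lock);
    return prev;
}
```

  For a chain 5 -> 4 -> 3 -> 2, the per-step variant takes the lock four times and the single-acquisition variant takes it once; both return 2. The sketch says nothing about contention on the lock itself, which is where the test case above still got stuck.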

  Regarding the solution in [1]: actually, we first tried to use the buffer manager to replace the SLRU for subtrans
too, and we have implemented it.
  With the test case which I mentioned in my previous mail, it was still stuck. By default there are 2048 subtrans
entries in one page.
  When several processes look up the top transaction in the same page, they have to pin/unpin and lock/unlock that page
repeatedly.
  I found that some backends were stuck at pinning/unpinning the page.
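
  The figure of 2048 entries per page follows from the default 8192-byte block size and the 4-byte transaction ID. A minimal sketch of that arithmetic is below; the constants mirror PostgreSQL's BLCKSZ and SUBTRANS_XACTS_PER_PAGE, while the helper names xid_to_page and xid_to_entry are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;          /* transaction IDs are 4 bytes wide */

#define BLCKSZ 8192                      /* default block size */
#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))

/* Hypothetical helper: which subtrans page holds the entry for xid. */
static inline uint32_t
xid_to_page(TransactionId xid)
{
    return (uint32_t) (xid / SUBTRANS_XACTS_PER_PAGE);
}

/* Hypothetical helper: index of xid's entry within that page. */
static inline uint32_t
xid_to_entry(TransactionId xid)
{
    return (uint32_t) (xid % SUBTRANS_XACTS_PER_PAGE);
}
```

  Consecutive xids therefore land on the same page: xids 0 through 2047 map to page 0 and xid 2048 starts page 1. Backends resolving nearby subtransactions all end up pinning and locking the same page.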

  Comparison of test results, pgbench with subtrans_128.sql
  Concurrency   PG master   PG master with patch [0]   Local cache optimization
  300           stuck       stuck                      not stuck
  500           stuck       stuck                      not stuck
  1000          stuck       stuck                      not stuck

  Maybe we can test the different approaches with my test case. If an approach does not get stuck under massive
concurrency, we have a good solution.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com

Thanks
Pengcheng

-----Original Message-----
From: Andrey Borodin <x4mmm@yandex-team.ru>
Sent: August 30, 2021 18:25
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: pgsql-hackers@postgresql.org
Subject: Re: suboverflowed subtransactions concurrency performance optimize

Hi Pengcheng!

You are solving an important problem, thank you!

> On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> To resolve this performance problem, we are thinking about a solution which
> caches SubtransSLRU pages in a local cache.
> First we query the parent transaction id from SubtransSLRU, and copy
> the SLRU page to a local cache page.
> After that, if we need to query the parent transaction id again, we can query
> it from the local cache directly.

A copy of the SLRU in each backend's cache can consume a lot of memory. Why create a copy if we can optimise the shared
representation of the SLRU?

JFYI, there is a related patch to make SimpleLruReadPage_ReadOnly() faster for bigger SLRU buffers [0].
Also, Nik Samokhvalov recently published an interesting investigation on the topic, but for some reason his message did
not pass moderation. [1]

Also it's important to note that there was a community request to move SLRUs to shared_buffers [2].

Thanks!

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/BE73A0BB-5929-40F4-BAF8-55323DE39561%40yandex-team.ru
[2] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com


