Thread: Retry in pgbench
Currently, the standard pgbench scenario produces serialization errors
("could not serialize access due to concurrent update") if PostgreSQL
runs at the REPEATABLE READ or SERIALIZABLE isolation level, and the
session aborts. In order to achieve meaningful results even at these
transaction isolation levels, I would like to propose an automatic
retry feature for transactions that fail with this error.

Probably just adding a switch to enable retries is not enough; a retry
method (random interval etc.) and a maximum retry count probably need
to be added as well.

I would like to hear your thoughts.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
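The proposed behavior (retry on a serialization failure, with a random
interval and a bounded number of attempts) could be sketched in client-side
logic as follows. This is only an illustration, not pgbench code:
SerializationFailure and run_with_retries are hypothetical stand-ins for a
driver error carrying SQLSTATE 40001 and the retry wrapper being discussed.

```python
import random
import time

# SQLSTATE raised by PostgreSQL for "could not serialize access
# due to concurrent update" at REPEATABLE READ / SERIALIZABLE.
SERIALIZATION_FAILURE = "40001"

class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = SERIALIZATION_FAILURE

def run_with_retries(txn, max_tries=5, max_sleep=0.1):
    """Run txn(), retrying on serialization failure.

    Retries up to max_tries times in total, sleeping a random
    interval (0 .. max_sleep seconds) between attempts, as the
    proposal suggests; re-raises once the retry budget is spent.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_tries:
                raise
            time.sleep(random.uniform(0, max_sleep))

# Demo: a "transaction" that fails twice before succeeding.
attempts = 0
def flaky_txn():
    global attempts
    attempts += 1
    if attempts < 3:
        raise SerializationFailure()
    return "committed"

print(run_with_retries(flaky_txn, max_tries=5, max_sleep=0.01))  # prints "committed"
```

A real implementation inside pgbench would instead inspect the SQLSTATE of
the failed command and re-issue the whole script iteration.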
On Tue, Apr 13, 2021 at 5:51 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Currently standard pgbench scenario produces transaction serialize
> errors "could not serialize access due to concurrent update" if
> PostgreSQL runs in REPEATABLE READ or SERIALIZABLE level, and the
> session aborts. In order to achieve meaningful results even in these
> transaction isolation levels, I would like to propose an automatic
> retry feature if "could not serialize access due to concurrent update"
> error occurs.
>
> Probably just adding a switch to retry is not enough, maybe retry
> method (random interval etc.) and max retry number are needed to be
> added.
>
> I would like to hear your thoughts,

See also:

https://www.postgresql.org/message-id/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru
> On Tue, Apr 13, 2021 at 5:51 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Currently standard pgbench scenario produces transaction serialize
>> errors "could not serialize access due to concurrent update" if
>> PostgreSQL runs in REPEATABLE READ or SERIALIZABLE level, and the
>> session aborts. In order to achieve meaningful results even in these
>> transaction isolation levels, I would like to propose an automatic
>> retry feature if "could not serialize access due to concurrent update"
>> error occurs.
>>
>> Probably just adding a switch to retry is not enough, maybe retry
>> method (random interval etc.) and max retry number are needed to be
>> added.
>>
>> I would like to hear your thoughts,
>
> See also:
>
> https://www.postgresql.org/message-id/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru

Thanks for the pointer. It seems we need to resume the discussion.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Hi,

On Tue, 13 Apr 2021 16:12:59 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> [...]
> [...]
> [...]
>
> Thanks for the pointer. It seems we need to resume the discussion.

By the way, I've been playing with the idea of failing gracefully and
retrying indefinitely (or until the given -T expires) on SQL errors AND
connection issues.

It would be useful for testing replicating clusters with a
(switch|fail)over procedure.

Regards,
> By the way, I've been playing with the idea of failing gracefully and retry
> indefinitely (or until given -T) on SQL error AND connection issue.
>
> It would be useful to test replicating clusters with a (switch|fail)over
> procedure.

Interesting idea, but in general a failover takes some time (a few
minutes, say), and it will strongly affect TPS. I think in the end it
would just be comparing failover times.

Or are you suggesting to ignore the time spent in failover?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
>> It would be useful to test replicating clusters with a (switch|fail)over
>> procedure.
>
> Interesting idea but in general a failover takes sometime (like a few
> minutes), and it will strongly affect TPS. I think in the end it just
> compares the failover time.
>
> Or are you suggesting to ignore the time spent in failover?

Or simply to be able to measure it from a client perspective? How much
delay is introduced, and how long does it take to get back to the
previous tps level?

My recollection of Marina's patch is that it was non-trivial; adding
such a new and interesting feature suggests a set of patches, not just
one patch.

--
Fabien.
On Fri, 16 Apr 2021 10:28:48 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> > By the way, I've been playing with the idea of failing gracefully and retry
> > indefinitely (or until given -T) on SQL error AND connection issue.
> >
> > It would be useful to test replicating clusters with a (switch|fail)over
> > procedure.
>
> Interesting idea but in general a failover takes sometime (like a few
> minutes), and it will strongly affect TPS. I think in the end it just
> compares the failover time.

This use case is not about benchmarking. It's about generating constant
traffic in order to practice/train [auto]switchover procedures while
staying close to production activity.

In this context, the max-saturated TPS of one node is not relevant. But
being able to add some stats about downtime might be a good addition.

Regards,
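The downtime stats mentioned here could be gathered purely from the client
side as a polling loop that records outage windows. The sketch below is only
an illustration under assumed names (measure_downtime, try_query); a real
version would attempt an actual connection/query instead of fake_query, and
the injectable clock/sleep parameters exist only to keep the demo
deterministic.

```python
import time

def measure_downtime(try_query, duration, interval=0.01,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll try_query() until `duration` seconds have passed on `clock`;
    return the total time during which calls failed, i.e. the
    client-observed downtime. clock/sleep are injectable for testing."""
    downtime = 0.0
    outage_start = None
    deadline = clock() + duration
    while clock() < deadline:
        ok = try_query()
        now = clock()
        if ok:
            if outage_start is not None:   # outage just ended
                downtime += now - outage_start
                outage_start = None
        elif outage_start is None:         # outage just began
            outage_start = now
        sleep(interval)
    if outage_start is not None:           # still down at the end
        downtime += clock() - outage_start
    return downtime

# Deterministic demo with a simulated clock: the "server" is down
# between t=0.02 and t=0.05 of a 0.1-second run.
t = {"now": 0.0}
def fake_clock():
    return t["now"]
def fake_sleep(dt):
    t["now"] += dt
def fake_query():
    return not (0.02 <= t["now"] < 0.05)

observed = measure_downtime(fake_query, duration=0.1, interval=0.01,
                            clock=fake_clock, sleep=fake_sleep)
print(round(observed, 3))  # → 0.03
```

The resolution of such a measurement is bounded by the polling interval, so
a short interval would be needed to report sub-second outages accurately.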
> This usecase is not about benchmarking. It's about generating constant trafic
> to be able to practice/train some [auto]switchover procedures while being close
> to production activity.
>
> In this contexte, a max-saturated TPS of one node is not relevant. But being
> able to add some stats about downtime might be a good addition.

Oh I see. That makes sense.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp