Thread: [PATCH] fix a performance issue with multiple logical-decoding walsenders
[PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Pierre Ducroquet
Date:
Hello,

Our current setup uses logical replication to build a BI replication server alongside our primary clusters (running PG 10.10 so far). This implies having one logical replication slot per database.

We noticed our load had grown quite a bit on the primary since the introduction of this replication, with spikes in system time whenever a lot of WAL was being written (for instance when creating GIN indexes). After some analysis, we identified two hot spots behind this issue. Fixing them gave us a tenfold performance improvement in decoding speed.

The first hot spot is PostmasterIsAlive. The implementation reads on a pipe to know whether the postmaster is still alive, and this is very expensive kernel-wise. When we switched the implementation to a much more primitive (and probably wrong) one:

    bool
    PostmasterIsAliveInternal()
    {
        return getppid() == PostmasterPid;
    }

we stopped seeing spikes in system time. But after doing that, the CPU time used by our walsenders increased, and we reached a second hot spot, this time in XLogSendLogical, where each walsender was using 100% of user CPU for minutes. After checking with perf, it appears all the decoders are fighting on GetFlushRecPtr.

On PostgreSQL 12, the call to PostmasterIsAlive is no longer present in WalSndLoop (thanks to commit cfdf4dc4f), so only the second hot spot is still present, with the same effects.

Attached to this email are two patches.

The first one, specific to PG 10 for our use case, simplifies the PostmasterIsAlive function as described above. I don't know if this implementation is valid, but it was needed to uncover the performance issue in XLogSendLogical. Would it be possible to remove the systematic call to PostmasterIsAlive in WalSndLoop? We are not certain of the behaviour.

The second one was tested on PG 10 and PG 12 (with a 48-line offset). On PG 12 it has the same effect it has on PG 10 with the isAlive patch applied. Instead of calling GetFlushRecPtr every time, we call it only once we notice we have reached the value returned by the previous call. This way, when the senders are busy decoding, we are no longer fighting over a spinlock just to read the FlushRecPtr.

Here are some benchmark results. On PG 10, to decode our replication stream, we went from 3m 43s to over 5 minutes after removing the first hot spot, and then down to 22 seconds. On PG 12, we had to change the benchmark (GIN index creation being more optimized there), so we cannot compare directly with our previous numbers; we went from 15m 11s down to 59 seconds.

If needed, we can provide scripts to reproduce this situation. It is quite simple: attach ~20 walsenders doing logical replication to database A, then generate a lot of data in database B. The walsenders will be woken up by the activity on database B but will have nothing to send, so they keep hitting the same locks.

Regards
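To make the second change concrete, here is a rough sketch of the idea inside XLogSendLogical in walsender.c. It illustrates the approach described above rather than the exact patch text; the names (flushPtr, logical_decoding_ctx, WalSndCaughtUp, GetFlushRecPtr) are the ones used by the real function, but the surrounding error handling is omitted:

    /*
     * Cache the flush position across calls instead of asking for it on
     * every record: GetFlushRecPtr() takes a spinlock that all walsenders
     * fight over, so only refresh the cached value once decoding has
     * caught up with it.
     */
    static XLogRecPtr flushPtr = InvalidXLogRecPtr;

    /* ... decode one record via LogicalDecodingProcessRecord() ... */

    /*
     * Only read the shared flush pointer again once we have decoded
     * everything up to the previously fetched value.
     */
    if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
        flushPtr = GetFlushRecPtr();

    /* If we are still past the fresh flush position, we have caught up. */
    if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
        WalSndCaughtUp = true;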
Attachment
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Julien Rouhaud
Date:
Hello Pierre,

On Thu, Dec 26, 2019 at 5:43 PM Pierre Ducroquet <p.psql@pinaraf.info> wrote:
>
> The second one was tested on PG 10 and PG 12 (with 48 lines offset). It has on
> PG12 the same effect it has on a PG10+isAlive patch. Instead of calling each
> time GetFlushRecPtr, we call it only if we notice we have reached the value of
> the previous call. This way, when the senders are busy decoding, we are no
> longer fighting for a spinlock to read the FlushRecPtr.

The patch is quite straightforward and looks good to me.

-       XLogRecPtr      flushPtr;
+       static XLogRecPtr flushPtr = 0;

You should use InvalidXLogRecPtr instead though, and maybe adding some
comments to explain why the static variable is a life changer here.

> Here are some benchmark results.
> On PG 10, to decode our replication stream, we went from 3m 43s to over 5
> minutes after removing the first hot spot, and then down to 22 seconds.
> On PG 12, we had to change the benchmark (due to GIN indexes creation being
> more optimized) so we can not compare directly with our previous bench. We
> went from 15m 11s down to 59 seconds.
> If needed, we can provide scripts to reproduce this situation. It is quite
> simple: add ~20 walsenders doing logical replication in database A, and then
> generate a lot of data in database B. The walsenders will be woken up by the
> activity on database B, but not sending it thus keeping hitting the same
> locks.

Quite impressive speedup!
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Pierre Ducroquet
Date:
On Thursday, December 26, 2019 8:18:46 PM CET Julien Rouhaud wrote:
> Hello Pierre,
>
> On Thu, Dec 26, 2019 at 5:43 PM Pierre Ducroquet <p.psql@pinaraf.info> wrote:
> > The second one was tested on PG 10 and PG 12 (with 48 lines offset). It
> > has on PG12 the same effect it has on a PG10+isAlive patch. Instead of
> > calling each time GetFlushRecPtr, we call it only if we notice we have
> > reached the value of the previous call. This way, when the senders are
> > busy decoding, we are no longer fighting for a spinlock to read the
> > FlushRecPtr.
>
> The patch is quite straightforward and looks good to me.
>
> -       XLogRecPtr      flushPtr;
> +       static XLogRecPtr flushPtr = 0;
>
> You should use InvalidXLogRecPtr instead though, and maybe adding some
> comments to explain why the static variable is a life changer here.
>
> > Here are some benchmark results.
> > On PG 10, to decode our replication stream, we went from 3m 43s to over 5
> > minutes after removing the first hot spot, and then down to 22 seconds.
> > On PG 12, we had to change the benchmark (due to GIN indexes creation
> > being more optimized) so we can not compare directly with our previous
> > bench. We went from 15m 11s down to 59 seconds.
> > If needed, we can provide scripts to reproduce this situation. It is quite
> > simple: add ~20 walsenders doing logical replication in database A, and
> > then generate a lot of data in database B. The walsenders will be woken
> > up by the activity on database B, but not sending it thus keeping hitting
> > the same locks.
>
> Quite impressive speedup!

Hi

Thank you for your comments.

Attached to this email is a patch with better comments regarding the
XLogSendLogical change.

We've spent quite some time yesterday benching it again, this time with
changes that must be fully processed by the decoder. The speed-up is obviously
much smaller, we are only ~5% faster than without the patch.

Regards
Attachment
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Julien Rouhaud
Date:
On Sat, Dec 28, 2019 at 1:56 PM Pierre Ducroquet <p.psql@pinaraf.info> wrote:
>
> Thank you for your comments.
> Attached to this email is a patch with better comments regarding the
> XLogSendLogical change.

Arguably the first test to compare to InvalidXLogRecPtr is unneeded, as any
value of EndRecPtr is greater or equal than that value. It will only save at
best 1 GetFlushRecPtr() per walsender process lifetime, so I'm not sure it's
worth arguing about it. Other than that I still think that it's a
straightforward optimization that brings nice speedup, and I don't see any
problem with this patch. I think that given the time of the year you should
create a commitfest entry for this patch to make sure it won't be forgotten
(and obviously I'll mark it as RFC, unless someone objects by then).

> We've spent quite some time yesterday benching it again, this time with
> changes that must be fully processed by the decoder. The speed-up is obviously
> much smaller, we are only ~5% faster than without the patch.

I'm assuming that it's benchmarking done with multiple logical slots?
Anyway, a 5% speedup in the case that this patch is not aimed to optimize is
quite nice!
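For readers following along, the check under discussion is presumably shaped roughly as below (an illustrative rendering, not the actual attachment). Since InvalidXLogRecPtr is defined as 0 and XLogRecPtr is unsigned, EndRecPtr >= flushPtr is already true the first time through, so the explicit InvalidXLogRecPtr comparison is implied by the second test, which is the point being made above:

    /* Two-branch form (hypothetical rendering of the patch under review): */
    if (flushPtr == InvalidXLogRecPtr ||
        logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
        flushPtr = GetFlushRecPtr();

    /* Behaves the same without the InvalidXLogRecPtr test: */
    if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
        flushPtr = GetFlushRecPtr();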
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Pierre Ducroquet
Date:
On Sunday, December 29, 2019 1:32:31 PM CET Julien Rouhaud wrote:
> On Sat, Dec 28, 2019 at 1:56 PM Pierre Ducroquet <p.psql@pinaraf.info> wrote:
> > Thank you for your comments.
> > Attached to this email is a patch with better comments regarding the
> > XLogSendLogical change.
>
> Arguably the first test to compare to InvalidXLogRecPtr is unneeded,
> as any value of EndRecPtr is greater or equal than that value. It
> will only save at best 1 GetFlushRecPtr() per walsender process
> lifetime, so I'm not sure it's worth arguing about it. Other than
> that I still think that it's a straightforward optimization that
> brings nice speedup, and I don't see any problem with this patch. I
> think that given the time of the year you should create a commitfest
> entry for this patch to make sure it won't be forgotten (and obviously
> I'll mark it as RFC, unless someone objects by then).
>
> > We've spent quite some time yesterday benching it again, this time with
> > changes that must be fully processed by the decoder. The speed-up is
> > obviously much smaller, we are only ~5% faster than without the patch.
>
> I'm assuming that it's benchmarking done with multiple logical slots?
> Anyway, a 5% speedup in the case that this patch is not aimed to
> optimize is quite nice!

I've created a commitfest entry for this patch:
https://commitfest.postgresql.org/26/2403/

I would like to know if it would be acceptable to backport this to PostgreSQL
12. I still have to write a clean benchmark for that (our previous benchmarks
are either PG 10 or PG 12 specific), but the change from Thomas Munro that
removed the calls to PostmasterIsAlive is very likely to have the same
side-effect we observed on PG 10 when patching IsAlive, i.e. moving the
pressure from the pipe reads to the PostgreSQL locks between processes, and
that made the whole process slower: the benchmark showed a serious regression,
going from 3m45s to 5m15s to decode the test transactions.

Regards

Pierre
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Tom Lane
Date:
Pierre Ducroquet <p.psql@pinaraf.info> writes:
> Attached to this email is a patch with better comments regarding the
> XLogSendLogical change.

Hi,

This patch entirely fails to apply for me (and for the cfbot too).
It looks like (a) it's missing a final newline and (b) all the tabs
have been mangled into spaces, and not correctly mangled either.
I could probably reconstruct a workable patch if I had to, but
it seems likely that it'd be easier for you to resend it with a
little more care about attaching an unmodified attachment.

As for the question of back-patching, it seems to me that it'd
likely be reasonable to put this into v12, but probably not
further back. There will be no interest in back-patching
commit cfdf4dc4f, and it seems like the argument for this
patch is relatively weak without that.

            regards, tom lane
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Pierre Ducroquet
Date:
On Monday, January 6, 2020 6:57:33 PM CET Tom Lane wrote:
> Pierre Ducroquet <p.psql@pinaraf.info> writes:
> > Attached to this email is a patch with better comments regarding the
> > XLogSendLogical change.
>
> Hi,
> This patch entirely fails to apply for me (and for the cfbot too).
> It looks like (a) it's missing a final newline and (b) all the tabs
> have been mangled into spaces, and not correctly mangled either.
> I could probably reconstruct a workable patch if I had to, but
> it seems likely that it'd be easier for you to resend it with a
> little more care about attaching an unmodified attachment.
>
> As for the question of back-patching, it seems to me that it'd
> likely be reasonable to put this into v12, but probably not
> further back. There will be no interest in back-patching
> commit cfdf4dc4f, and it seems like the argument for this
> patch is relatively weak without that.
>
> regards, tom lane

Hi

My deepest apologies for the patch being broken, I messed up when transferring
it between my computers after altering the comments. The verbatim one attached
to this email applies with no issue on current HEAD.

The patch regarding PostmasterIsAlive is completely pointless since v12, where
the function was rewritten, and was included only to help reproduce the issue
on older versions. Back-patching the walsender patch further than v12 would
imply back-patching all the machinery introduced for PostmasterIsAlive
(9f09529952) or another intrusive change there, a too big risk indeed.

Regards

Pierre
Attachment
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Julien Rouhaud
Date:
On Mon, Jan 6, 2020 at 7:57 PM Pierre Ducroquet <p.psql@pinaraf.info> wrote:
>
> On Monday, January 6, 2020 6:57:33 PM CET Tom Lane wrote:
> > Pierre Ducroquet <p.psql@pinaraf.info> writes:
> > > Attached to this email is a patch with better comments regarding the
> > > XLogSendLogical change.
> >
> > Hi,
> > This patch entirely fails to apply for me (and for the cfbot too).
> > It looks like (a) it's missing a final newline and (b) all the tabs
> > have been mangled into spaces, and not correctly mangled either.
> > I could probably reconstruct a workable patch if I had to, but
> > it seems likely that it'd be easier for you to resend it with a
> > little more care about attaching an unmodified attachment.
> >
> > As for the question of back-patching, it seems to me that it'd
> > likely be reasonable to put this into v12, but probably not
> > further back. There will be no interest in back-patching
> > commit cfdf4dc4f, and it seems like the argument for this
> > patch is relatively weak without that.
> >
> > regards, tom lane
>
> Hi
>
> My deepest apologies for the patch being broken, I messed up when transferring
> it between my computers after altering the comments. The verbatim one attached
> to this email applies with no issue on current HEAD.
> The patch regarding PostmasterIsAlive is completely pointless since v12 where
> the function was rewritten, and was included only to help reproduce the issue
> on older versions. Back-patching the walsender patch further than v12 would
> imply back-patching all the machinery introduced for PostmasterIsAlive
> (9f09529952) or another intrusive change there, a too big risk indeed.

+1, backpatch to v12 looks sensible.
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Marc
Date:
Hi,
I spent a little bit of time trying to explain the problem we are facing
clearly, and to provide a reproducible benchmark. So here it is.

What triggered our investigation is that we have a PostgreSQL cluster
containing about 15 databases, most of them being used as sources for logical
replication. This means we have about as many WAL senders active on the
cluster at the same time.

What we saw was that we had very high spikes of CPU activity, with a very high
level of SYS (up to 50% of the whole system was SYS, with 100% load on all
CPUs) on the server, which is a dual Xeon Silver 4110, so 16 cores, 32
threads. That seemed insane as our usual load isn't that high on average
(around 20% total CPU use, with ~1% of SYS), and is mostly on 2 of those 15
databases. This mostly occurred when creating an index, or doing batch
updates, COPYs, etc. And all WAL senders consumed about the same amount of
CPU, even those connected to databases where nothing happened.

While testing, we also noticed things were worse with PG 12 than with PG 10.
That was the reason for the first patch Pierre posted, which was mostly a way
to get PostmasterIsAlive out of the way on PG 10, so that both versions
behaved the same and we could confirm what was different between them. That
was the first benchmark we did, the one Pierre posted a few days ago.

With the second patch (reducing calls to GetFlushRecPtr), on PostgreSQL 12,
with statements affecting lots of records at once, we managed to reduce the
WAL senders' consumption by a factor of 15 (if the patch is correct of
course). SYS was down to more sensible (near 0) values. WAL senders for
databases which had no decoding to do no longer consumed much; only the one
connected to the database doing the work used a lot of CPU, but that's
expected. This solves the problem we are facing. Without this, we won't be
able to upgrade to PG 12, as the impact of GetFlushRecPtr is even worse there
than with PG 10.

I've now tried to measure the impact of the patch on activity spread more
evenly across several databases, where the contention on GetFlushRecPtr is
less severe, to see if there are wins in all cases. Simple scripts to test
this are provided as an attachment. Just set the "max" environment variable to
the number of databases/WAL senders you want, then run create_logical.sh
(creates the databases and the slots), connect_logical.sh (connects
pg_recvlogical processes to these slots), and run_stress.sh (runs a pgbench on
each database, each doing 100000 transactions, all in parallel).
drop_logical.sh does the cleanup. Sorry, they are very basic...

Here are the results on patched/unpatched PG 12. You have the messages from
all pgbench runs, then the consumption of all WAL senders at the end of a run.
As you can see, the pgbench runs are ~20% faster. That's because with
unpatched we are CPU starved (the server is at 100% CPU), which we aren't with
patched. The WAL senders needed about half the CPU in the patched case, as
shown by pidstat.
UNPATCHED:

All 20 pgbench runs report the same configuration: transaction type <builtin:
TPC-B (sort of)>, scaling factor 1, simple query mode, 1 client, 1 thread,
100000 transactions per client, 100000/100000 transactions processed. Per-run
latency and throughput:

  latency average (ms)   tps (incl. connections)   tps (excl. connections)
  1.943                  514.796454                514.805610
  1.949                  513.130790                513.168135
  1.950                  512.946425                512.961746
  1.951                  512.643065                512.678765
  1.953                  512.159794                512.178075
  1.953                  512.024962                512.034590
  1.953                  512.016413                512.034438
  1.954                  511.728080                511.760138
  1.954                  511.655046                511.678533
  1.955                  511.604767                511.617562
  1.955                  511.525593                511.558150
  1.955                  511.496498                511.505871
  1.956                  511.334434                511.363141
  1.956                  511.256908                511.284577
  1.957                  511.021219                511.041905
  1.957                  510.977176                511.004730
  1.958                  510.818341                510.870735
  1.958                  510.719611                510.741600
  1.959                  510.460165                510.504649
  1.961                  509.962481                509.978043

~$ pidstat -C postgres -l -u -T CHILD
Linux 4.9.0-11-amd64 (hardy)   01/06/2020   _x86_64_   (32 CPU)

02:39:30 PM  UID    PID   usr-ms  system-ms  guest-ms  Command
02:39:30 PM  111  78644  2232190     284870         0  /tmp/pg-12-unpatched/bin/postgres
02:39:30 PM  111  78646      400        340         0  postgres: checkpointer
02:39:30 PM  111  78647      530       1330         0  postgres: background writer
02:39:30 PM  111  78648      300        150         0  postgres: walwriter
02:39:30 PM  111  78649       30         40         0  postgres: autovacuum launcher
02:39:30 PM  111  78650      130        440         0  postgres: stats collector
02:39:30 PM  111  78790    86340      20560         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78792    89170      23270         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78794    86990      20740         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78795    91900      22540         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78797    92000      23190         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78799    94060      22520         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78801    95300      21500         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78803    93120      21360         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78805    95420      21920         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78807    94400      21350         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78809    88850      20390         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78811    90030      20690         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78812    94310      22660         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78813    94080      22470         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78814    95370      21520         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78815    94780      21470         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78816    92440      21080         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78817    94360      20700         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78818    92230      20760         0  postgres: walsender postgres [local] idle
02:39:30 PM  111  78819    90280      20780         0  postgres: walsender postgres [local] idle

PATCHED:

Again, all 20 pgbench runs report the same configuration as above. Per-run
latency and throughput:

  latency average (ms)   tps (incl. connections)   tps (excl. connections)
  1.680                  595.090858                595.131449
  1.683                  594.156492                594.198624
  1.684                  593.927387                593.946829
  1.686                  593.209506                593.257556
  1.686                  593.144977                593.162018
  1.686                  593.084403                593.122178
  1.686                  593.134657                593.199432
  1.687                  592.908760                592.923386
  1.687                  592.802027                592.814300
  1.687                  592.678874                592.769759
  1.687                  592.642501                592.723261
  1.688                  592.249597                592.262962
  1.690                  591.867795                591.958672
  1.690                  591.835950                591.908940
  1.690                  591.799816                591.824497
  1.690                  591.738978                591.780258
  1.691                  591.530490                591.570876
  1.691                  591.452142                591.498424
  1.693                  590.674305                590.708951
  1.693                  590.517240                590.563531

~$ pidstat -C postgres -l -u -T CHILD
Linux 4.9.0-11-amd64 (hardy)   01/06/2020   _x86_64_   (32 CPU)

02:29:02 PM  UID    PID   usr-ms  system-ms  guest-ms  Command
02:29:02 PM  111  75810  2185430     294190         0  /tmp/pg-12-patched/bin/postgres
02:29:02 PM  111  75812      410        320         0  postgres: checkpointer
02:29:02 PM  111  75813      410       1230         0  postgres: background writer
02:29:02 PM  111  75814      270         80         0  postgres: walwriter
02:29:02 PM  111  75815       30         20         0  postgres: autovacuum launcher
02:29:02 PM  111  75816      130        360         0  postgres: stats collector
02:29:02 PM  111  75961    35890      27240         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75963    37390      27950         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75965    38360      28110         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75966    38350      28160         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75968    38370      28160         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75970    37820      28110         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75972    38250      27330         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75974    36870      27640         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75976    36890      26850         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75979    36920      26330         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75980    37090      27240         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75981    38040      28210         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75982    36530      27460         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75983    37560      27330         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75984    36660      27170         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75985    36370      27020         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75986    36960      27000         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75987    36460      26820         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75988    36290      27140         0  postgres: walsender postgres [local] idle
02:29:02 PM  111  75989    36320      26750         0  postgres: walsender postgres [local] idle

Regards,
Marc
Attachment
Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders
From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Mon, Jan 6, 2020 at 7:57 PM Pierre Ducroquet <p.psql@pinaraf.info> wrote:
>> My deepest apologies for the patch being broken, I messed up when transferring
>> it between my computers after altering the comments. The verbatim one attached
>> to this email applies with no issue on current HEAD.
>> The patch regarding PostmasterIsAlive is completely pointless since v12 where
>> the function was rewritten, and was included only to help reproduce the issue
>> on older versions. Back-patching the walsender patch further than v12 would
>> imply back-patching all the machinery introduced for PostmasterIsAlive
>> (9f09529952) or another intrusive change there, a too big risk indeed.

> +1, backpatch to v12 looks sensible.

Pushed with some minor cosmetic fiddling.

            regards, tom lane