Thread: Re: WIP patch for latestCompletedXid method of computing snapshot xmax
I wrote:
> This patch implements Florian's idea about how to manage snapshot xmax
> without the ugly and performance-losing tactic of taking XidGenLock and
> ProcArrayLock at the same time.  I had to do a couple of slightly klugy
> things to get bootstrap and prepared transactions to work, but on the
> whole it seems at least as clean as the code we have now.  Comments?

I spent a fair amount of time this afternoon trying to measure the
performance impact of this patch using pgbench, without a lot of success
--- as far as average transaction rates go, it's a wash compared to CVS
HEAD.  However, it eventually struck me to look at the distribution of
transaction times, using pgbench's -l logging output, and on that basis
it is clear that getting rid of the XidGenLock interaction goes a long
way toward eliminating outlier transaction times.

My desktop machine has a single consumer-grade IDE drive, and even with
fsync off and synchronous_commit off, it can barely make 190 tps
sustained pgbench throughput --- it's just disk write bound all the
time.  On a run with 8 clients, 10000 transactions per client, DB scale
factor 25, I get this distribution of transaction times from CVS HEAD:

postgres=# select usec/1000000, count(*) from plhead group by 1 order by 1;
 ?column? | count
----------+-------
        0 | 79306
        1 |   290
        2 |   116
        3 |    65
        4 |    82
        5 |    30
        6 |    31
        7 |    32
       10 |     8
       11 |     8
       13 |    16
       14 |     5
       15 |     3
       20 |     8
(14 rows)

and this from HEAD plus the patch:

postgres=# select usec/1000000, count(*) from plpatch group by 1 order by 1;
 ?column? | count
----------+-------
        0 | 79305
        1 |   325
        2 |    85
        3 |    49
        4 |    68
        5 |    50
        6 |    45
        7 |    35
        8 |    14
        9 |     8
       10 |     6
       11 |    10
(12 rows)

The worst-case transaction time has dropped by nearly a factor of 2:

postgres=# select * from plhead order by usec desc limit 20;
 client | trans |   usec   | f |   epoch    |  lsb
--------+-------+----------+---+------------+--------
      2 |  6379 | 20621910 | 0 | 1189280557 | 664207
      6 |  5992 | 20621175 | 0 | 1189280557 | 665970
      7 |  5795 | 20621024 | 0 | 1189280557 | 666353
      1 |  6327 | 20620833 | 0 | 1189280557 | 663606
      3 |  6463 | 20620277 | 0 | 1189280557 | 663895
      4 |  6383 | 20620260 | 0 | 1189280557 | 664000
      5 |  6209 | 20620077 | 0 | 1189280557 | 665060
      0 |  6269 | 20619875 | 0 | 1189280557 | 664935
      6 |  8182 | 15191784 | 0 | 1189280655 |  87859
      3 |  8810 | 15191637 | 0 | 1189280655 |  86802
      2 |  8700 | 15185120 | 0 | 1189280655 |  86742
      5 |  8479 | 14078513 | 0 | 1189280653 | 978339
      1 |  8618 | 14077106 | 0 | 1189280653 | 978216
      7 |  7930 | 14076905 | 0 | 1189280653 | 978832
      4 |  8704 | 14076429 | 0 | 1189280653 | 977877
      0 |  8557 | 14076249 | 0 | 1189280653 | 977477
      0 |  6717 | 13932179 | 0 | 1189280576 |  65288
      1 |  6775 | 13931973 | 0 | 1189280576 |  65387
      6 |  6410 | 13931493 | 0 | 1189280576 |  67192
      7 |  6201 | 13931140 | 0 | 1189280576 |  69247
(20 rows)

postgres=# select * from plpatch order by usec desc limit 20;
 client | trans |   usec   | f |   epoch    |  lsb
--------+-------+----------+---+------------+--------
      6 |  6008 | 11833702 | 0 | 1189281093 | 646851
      0 |  6140 | 11833041 | 0 | 1189281093 | 645738
      2 |  6289 | 11809343 | 0 | 1189281093 | 616734
      4 |  6315 | 11808044 | 0 | 1189281093 | 617505
      3 |  6344 | 11807762 | 0 | 1189281093 | 616970
      7 |  5802 | 11807641 | 0 | 1189281093 | 617932
      5 |  6183 | 11806964 | 0 | 1189281093 | 618060
      1 |  6163 | 11805494 | 0 | 1189281093 | 616679
      7 |  8175 | 11027973 | 0 | 1189281239 | 675499
      2 |  8725 | 11019066 | 0 | 1189281239 | 674305
      5 |  8828 | 10997331 | 0 | 1189281239 | 674953
      0 |  8541 | 10987629 | 0 | 1189281239 | 673773
      1 |  8799 | 10713734 | 0 | 1189281239 | 673217
      4 |  8897 | 10705975 | 0 | 1189281239 | 672743
      6 |  8364 | 10702875 | 0 | 1189281239 | 677163
      3 |  8814 | 10701467 | 0 | 1189281239 | 674369
      3 |  9223 |  9158554 | 0 | 1189281258 | 202934
      5 |  9234 |  9143744 | 0 | 1189281258 | 203073
      7 |  8493 |  9099174 | 0 | 1189281258 | 204092
      4 |  9306 |  9097074 | 0 | 1189281258 | 202402
(20 rows)

So on the strength of that, I'm going to go ahead and commit the patch,
but I'd be interested to see benchmarks from people with access to
better hardware.

			regards, tom lane
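[Editor's note: the per-second bucketing Tom computes above with `select usec/1000000, count(*)` can also be done without loading the -l log into a table. A minimal sketch in Python, assuming the 8.3-era log line format `client_id trans_no usec file_no epoch lsb`; the sample lines below are invented for illustration, not data from Tom's run:]

```python
from collections import Counter

# Invented sample of pgbench -l output; each line is
# "client_id trans_no usec file_no epoch lsb".
sample_log = """\
0 1 500000 0 1189280557 664207
1 2 1500000 0 1189280557 664300
0 3 250000 0 1189280558 100000
1 4 20621910 0 1189280558 200000
"""

def latency_histogram(lines):
    # Bucket each transaction by whole seconds, like usec/1000000 in SQL.
    return Counter(int(line.split()[2]) // 1_000_000 for line in lines)

hist = latency_histogram(sample_log.splitlines())
for seconds, count in sorted(hist.items()):
    print(seconds, count)
```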
Tom,

> So on the strength of that, I'm going to go ahead and commit the patch,
> but I'd be interested to see benchmarks from people with access to
> better hardware.

Is the latest version of the patch from -patches the one I should test?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: WIP patch for latestCompletedXid method of computing snapshot xmax
From: Markus Schiltknecht
Hello Tom,

Tom Lane wrote:
> So on the strength of that, I'm going to go ahead and commit the patch,
> but I'd be interested to see benchmarks from people with access to
> better hardware.

I've just completed two dbt2 test runs on a mid-level system with 4GB of
RAM and a 7-disk SATA RAID 1+0 w/ BBU: once with code as of 2007/09/05
18:00 (which is *before* the first lazy-XID commit) and once with CVS
HEAD (2007/09/08) plus latestCompletedXid.patch.

Here are the results from the first test run (test run 33, without lazy XID):

> $ cat 33/driver/results.out
>                          Response Time (s)
>  Transaction      %    Average :    90th %    Total  Rollbacks      %
> ------------  -----  ---------------------  -------  ---------  -----
>     Delivery   3.97      3.745 :     7.844    11844          0   0.00
>    New Order  45.35      3.844 :     7.692   135192       1352   1.01
> Order Status   3.95      2.728 :     6.371    11764          0   0.00
>      Payment  42.74      2.649 :     6.349   127415          0   0.00
>  Stock Level   4.00      2.172 :     5.634    11915          0   0.00
> ------------  -----  ---------------------  -------  ---------  -----
>
> 1103.45 new-order transactions per minute (NOTPM)
> 120.1 minute duration
> 0 total unknown errors
> 1003 second(s) ramping up

And this is with HEAD plus latestCompletedXid.patch (test run 34):

> $ cat 34/driver/results.out
>                          Response Time (s)
>  Transaction      %    Average :    90th %    Total  Rollbacks      %
> ------------  -----  ---------------------  -------  ---------  -----
>     Delivery   3.96      3.843 :     8.223    11760          0   0.00
>    New Order  45.28      4.049 :     8.451   134398       1300   0.98
> Order Status   3.97      2.877 :     6.815    11777          0   0.00
>      Payment  42.80      2.745 :     6.718   127027          0   0.00
>  Stock Level   4.00      2.280 :     6.129    11859          0   0.00
> ------------  -----  ---------------------  -------  ---------  -----
>
> 1097.71 new-order transactions per minute (NOTPM)
> 120.1 minute duration
> 0 total unknown errors
> 1003 second(s) ramping up

Both tests ran for two hours, with 100 warehouses and 50 connections.
shared_buffers was set to 1024MB, effective_cache_size = 3800MB; all
other settings were left at their defaults.

Regards

Markus
Markus,

> I've just completed two dbt2 test runs on a mid-level system, with 4GB
> RAM and a 7 disk SATA RAID 1+0 w/ BBU. Once with code as of 2007/09/05
> 18:00 (which is *before* the first lazy xid commit) and once with cvs
> HEAD (2007/09/08) plus latestCompletedXid.patch.

Hmmm.  Your results are within the margin of error for DBT2, so they
show no real difference.  What we need for this is a heavy-read
workload, though; on a workload like DBT2 (which is mostly writes) I
wouldn't expect lazy-XID to help much.

I'll see if I can find something appropriate.  Maybe Jan's TPC-W
implementation?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
> Hmmm.  Your results are within the margin of error for DBT2, so they
> show no real difference.  What we need for this is a heavy-read
> workload, though; on a workload like DBT2 (which is mostly writes) I
> wouldn't expect lazy-XID to help much.

Lazy XID doesn't help on a write-mostly workload, but I would have
expected to see some benefit from the latestCompletedXid patch.  The
rough tests I just finished suggest that the win to look for is an
improvement in the very tail of the distribution --- well beyond the
90th-percentile point, which is the most we can see in Markus'
printout.  Can we get 95th- and 99th-percentile response times?

For comparison, in my little test I got 0.828546 vs 0.873957 seconds
as the 99th-percentile pgbench transaction times.

			regards, tom lane
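[Editor's note: a minimal sketch of the 95th/99th-percentile computation being asked for here, using the nearest-rank method; the latency values below are synthetic microsecond figures, not data from any run in this thread:]

```python
def percentile(values, pct):
    # Nearest-rank method: the smallest value such that at least pct%
    # of the data lies at or below it.
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # integer ceiling
    return ordered[rank - 1]

# Synthetic latencies: 0.1s, 0.2s, ..., 10.0s in microseconds.
usecs = [i * 100_000 for i in range(1, 101)]
print(percentile(usecs, 95), percentile(usecs, 99))  # 9500000 9900000
```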
Josh Berkus <josh@agliodbs.com> writes:
> Is the latest version of the patch from -patches the one I should test?

Yeah, the posted patch is the same as what I committed.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> My desktop machine has a single consumer-grade IDE drive, and even with
> fsync off and synchronous_commit off, it can barely make 190 tps
> sustained pgbench throughput --- it's just disk write bound all the
> time.  On a run with 8 clients, 10000 transactions per client, DB scale
> factor 25, I get this distribution of transaction times from CVS HEAD:

Wouldn't you expect to see more of an effect in CPU-bound environments?
Is there any i/o going on while these locks are being held?  I suppose
there's a WAL commit record that has to be synced?

Have you tried with a smaller scale factor?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
"Josh Berkus" <josh@agliodbs.com> writes:
> Hmmm.  Your results are within the margin of error for DBT2, so they
> show no real difference.  What we need for this is a heavy-read
> workload, though; on a workload like DBT2 (which is mostly writes) I
> wouldn't expect lazy-XID to help much.

The TPM is definitely within the margin of error, but the 90th-percentile
response times seem worrisome: they're already over 5s, which counts as
a TPC-C failure.  From what we've seen, by the time you push the scale
factor high enough that those numbers even get close to 5s, you get very
unstable results.  The published benchmarks seem to keep the max around
5s, which puts the 90th percentile around half that.  That seems overly
conservative, but I wonder what reason they have for doing it that way.

> I'll see if I can find something appropriate.  Maybe Jan's TPC-W
> implementation?

We could just rejigger the percentages on the transactions.  I think
Stock Level and Order Status are entirely read-only, but I would have to
check.  Stock Level is a very intensive query, though, so I wouldn't
suggest raising the percentage on that one.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com