Thread: v17 vs v16 performance comparison
Hello hackers,

I've repeated the performance measurement for REL_17_STABLE (1e020258e)
and REL_16_STABLE (6f6b0f193) and found several benchmarks where v16 is
significantly better than v17. Please find attached an HTML table with
all the benchmarking results.

In May, I had paid attention to:
Best pg-src-17--.* worse than pg-src-16--.* by 57.9 percents (225.11 > 142.52): pg_tpcds.query15
Average pg-src-17--.* worse than pg-src-16--.* by 55.5 percents (230.57 > 148.29): pg_tpcds.query15
and performed `git bisect` for this degradation, which led me to commit
b7b0f3f27 [1].

This time I bisected the following anomaly:
Best pg-src-17--.* worse than pg-src-16--.* by 23.6 percents (192.25 > 155.58): pg_tpcds.query21
Average pg-src-17--.* worse than pg-src-16--.* by 25.1 percents (196.19 > 156.85): pg_tpcds.query21
and to my surprise I got "b7b0f3f27 is the first bad commit".

Moreover, bisecting another anomaly:
Best pg-src-17--.* worse than pg-src-16--.* by 24.2 percents (24269.21 > 19539.89): pg_tpcds.query72
Average pg-src-17--.* worse than pg-src-16--.* by 24.2 percents (24517.66 > 19740.12): pg_tpcds.query72
pointed at the same commit again.

So it looks like q15 from TPC-DS is not the only query suffering from that
change.

But besides that, I've found a separate regression. Bisecting for this degradation:
Best pg-src-17--.* worse than pg-src-16--.* by 105.0 percents (356.63 > 173.96): s64da_tpcds.query95
Average pg-src-17--.* worse than pg-src-16--.* by 105.2 percents (357.79 > 174.38): s64da_tpcds.query95
pointed at f7816aec2.

Does this deserve more analysis and maybe fixing?

[1] https://www.postgresql.org/message-id/63a63690-dd92-c809-0b47-af05459e95d1%40gmail.com

Best regards,
Alexander
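The percentages in the report lines above are consistent with a plain relative slowdown, (v17 - v16) / v16 * 100. The following is a minimal Python sketch of that arithmetic, intended only as a check against the numbers quoted in this message. The function name `percent_worse` is hypothetical and the code is not taken from the benchmarking scripts.

```python
def percent_worse(new: float, old: float) -> float:
    """Relative slowdown of `new` versus the baseline `old`, in percent."""
    if old <= 0:
        raise ValueError("baseline timing must be positive")
    return (new - old) / old * 100.0


# "Best" timings quoted in the message above, as (v17, v16) pairs.
reported_best = {
    "pg_tpcds.query15": (225.11, 142.52),
    "pg_tpcds.query21": (192.25, 155.58),
    "pg_tpcds.query72": (24269.21, 19539.89),
    "s64da_tpcds.query95": (356.63, 173.96),
}

for name, (v17, v16) in reported_best.items():
    print(f"{name}: v17 worse than v16 by {percent_worse(v17, v16):.1f} percent")
```

Note that a slowdown of about 105 percent, as for s64da_tpcds.query95, means the query takes roughly twice as long, not that it is 5 percent slower.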
Alexander Lakhin <exclusion@gmail.com> writes:
> I've repeated the performance measurement for REL_17_STABLE (1e020258e)
> and REL_16_STABLE (6f6b0f193) and found several benchmarks where v16 is
> significantly better than v17. Please find attached an html table with
> all the benchmarking results.

Thanks for doing that!

I have no opinion about b7b0f3f27, but as far as this goes:

> But beside that, I've found a separate regression. Bisecting for this degradation:
> Best pg-src-17--.* worse than pg-src-16--.* by 105.0 percents (356.63 > 173.96): s64da_tpcds.query95
> Average pg-src-17--.* worse than pg-src-16--.* by 105.2 percents (357.79 > 174.38): s64da_tpcds.query95
> pointed at f7816aec2.

I'm not terribly concerned about that. The nature of planner changes
like that is that some queries will get worse and some better, because
the statistics and cost estimates we're dealing with are not perfect.
It is probably worth drilling down into that test case to understand
where the planner is going wrong, with an eye to future improvements;
but I doubt it's something we need to address for v17.

regards, tom lane
On Thu, Aug 1, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> So it looks like q15 from TPC-DS is not the only query suffering from that
> change.

I'm going to try to set up a local repro to study these new cases. If
you have a write-up somewhere of how exactly you run that, that'd be
useful.
Hello Thomas.

01.08.2024 08:57, Thomas Munro wrote:
> On Thu, Aug 1, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>> So it looks like q15 from TPC-DS is not the only query suffering from that
>> change.
> I'm going to try to set up a local repro to study these new cases. If
> you have a write-up somewhere of how exactly you run that, that'd be
> useful.

I'm using this instrumentation (on my Ubuntu 22.04 workstation):
https://github.com/alexanderlaw/pg-mark.git

README.md can probably serve as such a write-up.

If you install all the prerequisites (some tests, including pg_tpcds,
require downloading additional resources; run-benchmarks.py will ask to
do that), there should be no problems with running the benchmarks.

I just added two instances to config.xml:
    <instance id="pg-src-16" type="src" pg_version="16devel" git_branch="REL_16_STABLE" />
    <instance id="pg-src-17" type="src" pg_version="17devel" git_branch="REL_17_STABLE" />

and ran
1) ./prepare-instances.py -i pg-src-16 pg-src-17

2) time ./run-benchmarks.py -i pg-src-16 pg-src-17 pg-src-16 pg-src-17 pg-src-17 pg-src-16
(it took 1045m55,215s on my machine, so you may prefer to choose a
single benchmark (-b pg_tpcds or maybe s64da_tpcds))

3) ./analyze-benchmarks.py -i 'pg-src-17--.*' 'pg-src-16--.*'

All the upper-level commands to run benchmarks are contained in config.xml,
so you can just execute them separately, but my instrumentation eases
processing of the results by creating one unified benchmark-results.xml.

Please feel free to ask any questions or give your feedback.

Thank you for paying attention to this!

Best regards,
Alexander
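For readers trying to interpret the "Best" and "Average" report lines, here is a rough Python sketch of how repeated runs could be grouped by a regular expression such as 'pg-src-17--.*' and reduced to two numbers. Several things here are assumptions rather than facts from the thread: that run labels look like `<instance>--<n>`, that "Best" means the minimum and "Average" the arithmetic mean, and the helper name `summarize`. The timings are made-up illustration values, not measurements.

```python
import re
from statistics import mean
from typing import Dict, Tuple


def summarize(results: Dict[str, float], pattern: str) -> Tuple[float, float]:
    """Return (best, average) over all runs whose label fully matches pattern.

    Assumption: lower is better, so "best" is the minimum timing.
    """
    rx = re.compile(pattern)
    timings = [t for label, t in results.items() if rx.fullmatch(label)]
    if not timings:
        raise ValueError(f"no runs match {pattern!r}")
    return min(timings), mean(timings)


# Hypothetical timings for one benchmark query, three runs per instance,
# in the interleaved order used in step 2 above (16, 17, 16, 17, 17, 16).
runs = {
    "pg-src-16--1": 10.0,
    "pg-src-17--1": 12.0,
    "pg-src-16--2": 11.0,
    "pg-src-17--2": 14.0,
    "pg-src-17--3": 13.0,
    "pg-src-16--3": 9.0,
}

best17, avg17 = summarize(runs, r"pg-src-17--.*")
best16, avg16 = summarize(runs, r"pg-src-16--.*")
print(f"Best: {best17} vs {best16}; Average: {avg17} vs {avg16}")
```

Interleaving the instances, rather than running all v16 passes and then all v17 passes, helps keep slow drift on the machine (caches, thermal state, background load) from landing on only one side of the comparison.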
01.08.2024 06:41, Tom Lane wrote:
>
>> But beside that, I've found a separate regression. Bisecting for this degradation:
>> Best pg-src-17--.* worse than pg-src-16--.* by 105.0 percents (356.63 > 173.96): s64da_tpcds.query95
>> Average pg-src-17--.* worse than pg-src-16--.* by 105.2 percents (357.79 > 174.38): s64da_tpcds.query95
>> pointed at f7816aec2.
> I'm not terribly concerned about that. The nature of planner changes
> like that is that some queries will get worse and some better, because
> the statistics and cost estimates we're dealing with are not perfect.
> It is probably worth drilling down into that test case to understand
> where the planner is going wrong, with an eye to future improvements;
> but I doubt it's something we need to address for v17.

Please find attached two plans for that query [1].
(I repeated the benchmark for f7816aec2 and f7816aec2~1 five times and
made sure that both plans are stable.)

Meanwhile I've bisected another degradation:
Best pg-src-17--.* worse than pg-src-16--.* by 11.3 percents (7.17 > 6.44): job.query6f
and came to the commit b7b0f3f27 again.

[1] https://github.com/swarm64/s64da-benchmark-toolkit/blob/master/benchmarks/tpcds/queries/queries_10/95.sql

Best regards,
Alexander
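The thread does not describe how the bisections were driven. For anyone wanting to automate one, a timing-based verdict can be turned into the exit codes that `git bisect run` understands: 0 marks the commit good, 1 marks it bad, and 125 asks git to skip it. The Python sketch below shows only that decision step. The names `midpoint_threshold` and `bisect_exit_code` are hypothetical, and placing the threshold halfway between the known-good and known-bad timings is an assumption, not something stated in this thread.

```python
from typing import Optional

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def midpoint_threshold(good_time: float, bad_time: float) -> float:
    """Place the good/bad cut-off halfway between two reference timings."""
    return (good_time + bad_time) / 2.0


def bisect_exit_code(elapsed: Optional[float], threshold: float) -> int:
    """Map one measurement to a `git bisect run` exit code.

    `elapsed` is None when the commit could not be built or measured,
    in which case the commit is skipped rather than judged.
    """
    if elapsed is None:
        return SKIP
    return BAD if elapsed > threshold else GOOD


# Reference timings for pg_tpcds.query21 quoted earlier in the thread.
threshold = midpoint_threshold(good_time=155.58, bad_time=192.25)
print(bisect_exit_code(190.0, threshold))  # a slow measurement is judged bad
```

A wrapper script would build the current commit, measure the query, and finish with `sys.exit(bisect_exit_code(elapsed, threshold))`; those steps are omitted because they depend on the local setup. A midpoint cut-off is only sensible when the gap between good and bad timings is well above run-to-run noise, which is a concern for small differences such as the 11.3 percent on job.query6f.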
On Tue, Sep 3, 2024 at 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> From a bird's eye view, new v17-vs-v16 comparison has only 87 "worse",
> while the previous one had 115 (it requires deeper analysis, of course, but
> still...).

Any chance you could share that whole pgdata dir with me, assuming it
compresses to a manageable size? Perhaps we could discuss that
off-list?