Re: pgsql: Add parallel-aware hash joins. - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: pgsql: Add parallel-aware hash joins.
Msg-id: 14488.1516822307@sss.pgh.pa.us
In response to: Re: pgsql: Add parallel-aware hash joins. (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2018-01-24 13:11:22 -0500, Robert Haas wrote:
>> Now, how much should we care about the performance of software with a
>> planned release date of 2018 on hardware discontinued in 2001,
>> hardware that is apparently about 20 times slower than a modern
>> laptop?  Some, perhaps, but maybe not a whole lot.  Removing tests
>> that have found actual bugs because they cost runtime on ancient
>> systems that nobody uses for serious work doesn't make sense to me.

> I again agree with the sentiment.

I find that to be a completely bogus straw-man argument.  The point of
looking at the prairiedog time series is just to see a data series in
which the noise level is small enough to discern the signal.  If anyone's
got years' worth of data off a more modern machine, and they can extract
a signal from that, by all means let's consider that data instead.  But
there's no clear argument (or at least you have not made one) that says
that prairiedog's relative timings don't match what we'd get on more
modern machines.

Now, what *is* a relevant objection is that most of us care more about
the runtime of the parallelized regression tests than the serial tests.
I did not use prairiedog's "make check" timings in these graphs because
that would include "make install" and initdb timings, adding noise and
overhead and probably making it harder to see what's going on.  But it's
perfectly fair to want to optimize that case, not the serial case.

However ... if you spend any time looking at the behavior of that, the
hashjoin tests are still problematic.  I instrumented the parallel tests
by turning on log_disconnections so as to get per-test-script timings,
and what I find to be the slowest steps on my development workstation are

[pg_regress/rowsecurity] 0.517
[pg_regress/partition_join] 0.535
[pg_regress/updatable_views] 0.546
[pg_regress/stats] 0.566
[pg_regress/triggers] 0.618
[pg_regress/foreign_data] 0.642
[pg_regress/stats_ext] 0.670
[pg_regress/select_parallel] 0.828
[pg_regress/create_index] 0.916
[pg_regress/alter_table] 1.187
[pg_regress/gist] 1.283
[pg_regress/join] 1.923
[pg_regress/plpgsql] 3.100

(The overall runtime for "make installcheck-parallel" on this machine is
about 17.3 seconds right now.)  The next slowest test script in the join
test's group is "update", at 0.373 seconds; so over 1.5 sec of the total
17.3 sec runtime is being spent solely in the join script.

Running the same test on the v10 branch, the slowest steps are

[pg_regress/join] 0.521
[pg_regress/rowsecurity] 0.521
[pg_regress/updatable_views] 0.554
[pg_regress/triggers] 0.624
[pg_regress/foreign_data] 0.647
[pg_regress/stats_ext] 0.675
[pg_regress/select_parallel] 0.690
[pg_regress/create_index] 0.928
[pg_regress/gist] 1.020
[pg_regress/alter_table] 1.120
[pg_regress/plpgsql] 3.217

so join has gotten about 1 second slower since v10, and that time is
coming entirely out of developers' hides despite parallelism, because it
was already the slowest script in its group.  So I continue to maintain
that an unreasonable fraction of the total resources devoted to the
regular regression tests is going into these new hashjoin tests.

Based on these numbers, it seems like one easy thing we could do to
reduce parallel check time is to split the plpgsql test into several
scripts that could run in parallel.  But independently of that, I think
we need to make an effort to push hashjoin's time back down.
> One caveat is that old machines also somewhat approximate testing with
> more instrumentation / debugging enabled (say valgrind,
> CLOBBER_CACHE_ALWAYS, etc).  So removing excessive test overhead has
> still quite some benefits.  But I definitely do not want to lower
> coverage to achieve it.

I don't want to lower coverage either.  I do want some effort to be
spent on achieving test coverage intelligently, rather than just
throwing large test cases at the code without consideration of the
costs.

			regards, tom lane