Re: pgsql: Add parallel-aware hash joins. - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: pgsql: Add parallel-aware hash joins.
Msg-id: CA+TgmoYn8avuxg=dS8mbppjLn0X7AXMduU+dopeN73eZmP2u6w@mail.gmail.com
In response to: Re: pgsql: Add parallel-aware hash joins. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pgsql: Add parallel-aware hash joins.
List: pgsql-hackers
On Wed, Jan 24, 2018 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I find that to be a completely bogus straw-man argument.  The point of
> looking at the prairiedog time series is just to see a data series in
> which the noise level is small enough to discern the signal.  If anyone's
> got years worth of data off a more modern machine, and they can extract
> a signal from that, by all means let's consider that data instead.  But
> there's no clear argument (or at least you have not made one) that says
> that prairiedog's relative timings don't match what we'd get on more
> modern machines.

There is no need to collect years of data to tell whether the time to
run the tests has increased as much on developer machines as it has on
prairiedog.  You showed the time going from 3:36 to 8:09 between 2014
and the present, a 2.26x increase.  It is obvious from the numbers I
posted before that no such increase has taken place in the time it
takes to run 'make check' on my relatively modern laptop.  Whatever
difference exists is measured in milliseconds.

> so join has gotten about 1 second slower since v10, and that time is
> coming entirely out of developers' hides despite parallelism because
> it was already the slowest in its group.
>
> So I continue to maintain that an unreasonable fraction of the total
> resources devoted to the regular regression tests is going into these
> new hashjoin tests.

I think there is an affirmative desire on the part of many contributors
to have newer features tested more thoroughly than old ones were.  That
will tend to mean that recently added features have test suites that
run longer, relative to the value of the feature they test, than what
we had in the past.  When this has been discussed at developer
meetings, everyone except you (and to a lesser extent me) has been in
favor of it.  Even if that meant you had to wait 1 extra second every
time you run 'make check', I would judge it worthwhile.  But it
probably doesn't, because there are a lot of things that can be done to
improve this situation, such as...

> Based on these numbers, it seems like one easy thing we could do to
> reduce parallel check time is to split the plpgsql test into several
> scripts that could run in parallel.  But independently of that,
> I think we need to make an effort to push hashjoin's time back down.

...this.  Also, the same technique could probably be applied to the
join test itself.  I think Thomas just added the tests to that file
because it already existed, but there's nothing to say that the file
couldn't be split into several chunks (see the sketch below for roughly
what I mean).  On a quick look, that file tests a lot of pretty
different things, and it's one of the largest test case files,
accounting for ~3% of the total test suite by itself.

Another thing you could do is consider applying the patch Thomas
already posted to reduce the size of the tables involved.  The problem
is that, for you and the buildfarm to be happy, the tests have to (1)
run near-instantaneously even on thoroughly obsolete hardware, (2) give
exactly the same answers on 32-bit systems, 64-bit systems, Linux,
Windows, AIX, HP-UX, etc., and (3) give those same exact answers 100%
deterministically on all of those platforms.  Parallel query is
inherently non-deterministic about things like how much work goes to
each worker, and I think that really small tests will tend to hit more
edge cases, like one worker not doing anything.  So it might be that if
we cut down the sizes of the test cases we'll spend more time
troubleshooting the resulting instability than any developer time we
would've saved by reducing the runtime.  But we can try it.
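To make the splitting idea concrete, here is a rough sketch of what a
change to src/test/regress/parallel_schedule might look like.  The
replacement file names are invented for illustration; the schedule
format needs nothing new, since tests listed on the same "test:" line
already run concurrently:

    # before: the whole join suite, new hash join tests included,
    # is a single test entry
    test: join

    # after (hypothetical file names): the same coverage, split into
    # chunks that can run concurrently within one parallel group
    test: join_basic join_outer join_hash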
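And to illustrate the determinism problem, here is a minimal sketch of
the effect I mean -- the table name, row count, and settings are
invented for illustration, not taken from the actual tests:

    -- force a parallel plan over a deliberately tiny table
    create table tiny_t as select generate_series(1, 1000) as i;
    analyze tiny_t;
    set parallel_setup_cost = 0;
    set parallel_tuple_cost = 0;
    set min_parallel_table_scan_size = 0;
    set max_parallel_workers_per_gather = 2;

    -- the plan shape is stable, but the per-worker row counts in the
    -- ANALYZE output are not: with so few pages, one worker can easily
    -- end up scanning no rows at all
    explain (analyze, costs off, timing off)
      select count(*) from tiny_t;

So any output a cut-down test emits would have to be independent of how
the rows happen to be divided among the workers.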
>> One caveat is that old machines also
>> somewhat approximate testing with more instrumentation / debugging
>> enabled (say valgrind, CLOBBER_CACHE_ALWAYS, etc).  So removing excessive
>> test overhead has still quite some benefits.  But I definitely do not
>> want to lower coverage to achieve it.
>
> I don't want to lower coverage either.  I do want some effort to be
> spent on achieving test coverage intelligently, rather than just throwing
> large test cases at the code without consideration of the costs.

I don't believe that any such thing is occurring, and I think it's
wrong of you to imply that these test cases were added unintelligently.
To me, that reads as an ad hominem attack on both Thomas (who spent a
year or more developing the feature those test cases exercise) and
Andres (who committed them).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company