Re: 2018-03 Commitfest Summary (Andres #1) - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: 2018-03 Commitfest Summary (Andres #1)
Date
Msg-id alpine.DEB.2.20.1803020918140.12500@lancre
Whole thread Raw
In response to Re: 2018-03 Commitfest Summary (Andres #1)  (Andres Freund <andres@anarazel.de>)
Responses Re: 2018-03 Commitfest Summary (Andres #1)  (Andres Freund <andres@anarazel.de>)
Re: 2018-03 Commitfest Summary (Andres #1)  (Peter Geoghegan <pg@bowt.ie>)
Re: 2018-03 Commitfest Summary (Andres #1)  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Hello andres & Tom,

>>>  A bit concerned that we're turning pgbench into a kitchen sink.
>>
>> I do not understand "kitchen sink" expression in this context, and your
>> general concerns about pgbench in various comments in your message.
>
> We're adding a lot of stuff to pgbench that only a few people
> use. There's a lot of duplication with similar parts of code in other
> parts of the codebase. pgbench in my opinion is a tool to facilitate
> postgres development, not a goal in itself.

I disagree.

I think that pgbench should *also* allow to test postgres performance in 
realistic scenarii that allow to communicate about performance, and 
reassure users about their use case, not just a simplified tpc-b.

Even if you would want to restrict it to internal postgres development, 
which I would see as a shame, recently added features are still useful.

For instance, I used extensively tps throttling, latencies and timeouts 
measures when developping and testing the checkpointer sorting & 
throttling patch.

Some people are just proposing a new storage engine which changes the cost 
of basic operations (improves committed transaction, makes rolling-back 
more expensive). What is the actual impact depending on the rollback rate? 
How do you plan to measure that? Pgbench needs capacities to be useful 
there, and the good news is that some recently added ones would come 
handy.

> It's a bad measure, but the code growth shows my concerns somewhat:
> master:        5660 +817
> REL_10_STABLE: 4843 +266
> REL9_6_STABLE: 4577 +424
> REL9_5_STABLE: 4153 +464
> REL9_4_STABLE: 3689 +562
> REL9_3_STABLE: 3127 +338
> REL9_2_STABLE: 2789 +96
> REL9_1_STABLE: 2693

A significant part of this growth is the expression engine, which is 
mostly trivial code, although alas not necessarily devout of bugs. If 
moved to fe-utils, pgbench code footprint will be reduced by about 2000 
lines.

Also, code has been removed (eg the fork-based implementation) and 
significant restructuring which has greatly improved code maintenance, 
even if the number of lines has possibly increased in passing.

>> So this setting-variable-from-query patch goes with having boolean
>> expressions (already committed), having conditions (\if in the queue),
>> improving the available functions (eg hashes, in the queue)... so that
>> existing, data-dependent, realistic benchmarks can be implemented, and
>> benefit for the great performance data collection provided by the tool.
>
> I agree that they're useful in a few cases, but they have to consider
> that they need to be reviewed and maintained an the project is quite
> resource constrained in that regard.

Currently I do most of the reviewing & maintenance of pgbench, apart from 
the patch I submit.

I can stop doing both if the project decides that improving pgbench 
capabilities is against its interest.

Tom said:

> FWIW, I share Andres' concern that pgbench is being extended far past 
> what anyone has shown a need for.  If we had infinite resources
> this wouldn't be a big problem, but it's eating into limited
> committer hours and I'm not really convinced that we're getting 
> adequate return.

As pgbench patches can stay ready-to-committers for half a dozen CF, I'm 
not sure the strain on the committer time is that heavy:-) There are not 
so many of them, most of them are trivial. If you drop them on the ground 
that the you do not want them, it will not change anything to the lack of 
reviewing resources and incapacity of the project to process the submitted 
patches, which in my opinion is a wider issue, not related to the few 
pgbench-related submissions.

On the "adequate return" point, my opinion is that currently pgbench is 
just below the feature set needed to be generally usable, so not improving 
it is a self-fullfilling ensurance that it will not be used further. Once 
the "right" feature set is reached (for me, at least extracting query 
output into variables, having conditionals, possibly a few more functions 
if some benches use them), whether it would be actually more widely used 
by both developers and users is an open question.

Now, as I said, if pgbench improvements are not seen as desirable, I can 
mark submissions as "rejected" and do other things with my little 
available time than try to contribute to postgres.

>>> - pgbench - test whether a variable exists
>>
>> As already said, the motivation is that it is a preparation for a (much)
>> larger patch which would move pgbench expressions to fe utils and use them
>> in "psql".
>
> You could submit it together with that.

Sure, I could. My previous experience is that maintaining a set of 
dependent patches is tiresome, and it does not help much with testing and 
reviewing either. So I'm doing things one (slow) step at a time, 
especially as each time I've submitted patches which were doing more than 
one thing I was asked to disentangle features and restructuring.

> But I don't see in the first place why we need to add the feature with 
> duplicate code, just so we can unify.

It is not duplicate code. In psql the variable-exists-test is currently 
performed on the fly by the lexer. With the expression engine, it needs to 
be lexed, parsed and finally evaluated, so this is necessarily new code.

> We can gain it via the unification, no?

Well, this would be a re-implementation anyway. I'm not sure the old one 
would disappear completly, because it depends on backslash commands which 
have different lexing assumptions (eg currently the variable-exists-test 
is performed from both "psqlscan.l" and "psqlscanslash.l" independently).

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Yuqi Gu
Date:
Subject: RE: Optimize Arm64 crc32c implementation in Postgresql
Next
From: Thomas Munro
Date:
Subject: Re: Suspicious call of initial_cost_hashjoin()