Thread: Refactoring pgbench.c

Refactoring pgbench.c

From
Tatsuo Ishii
Date:
Main pgbench logic consists of single file pgbench.c which is 4036
lines of code as of today. This is not a small number and I think it
would be nice if it is divided into smaller files because it will make
it easier to maintain, add or change features of pgbench.  I will come
up with an idea how to split pgbench.c later. In the mean time I
attached a call graph of pgbench.c generated by egypt, which we could
get a basic idea how to split and modularize pgbench.c.

BTW, while looking at pgbench.c I noticed subtle coding problems:

1) strtoint64() should be decalred as static

2) 
static
void
agg_vals_init(AggVals *aggs, instr_time start)

"static" and "void" should be one line, rather than separate 2 lines
according to our coding style.

I will commit fix for these if there's no objection.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Refactoring pgbench.c

From
Fabien COELHO
Date:
Hello Tatsuo,

> Main pgbench logic consists of single file pgbench.c which is 4036 lines 
> of code as of today. This is not a small number and I think it would be 
> nice if it is divided into smaller files because it will make it easier 
> to maintain, add or change features of pgbench.

I do not think that this large file is a so big a problem (good editors 
help navigation in the code), and I'm not sure that splitting it would 
achieve much: there are not that many functions, some of them are maybe 
long (main, threadRun, doCustom) but mostly for good reasons.

I've submitted a patch to remove "fork-emulation", which I think would 
really help simplify the code (maybe -10% source in "pgbench.c", less 
#ifs, avoid double implementations or more-complex-than-necessary 
implementations or not-implemented features).

-- 
Fabien.



Re: Refactoring pgbench.c

From
Tomas Vondra
Date:
Hi,

On 06/28/2015 08:10 AM, Fabien COELHO wrote:
>
> Hello Tatsuo,
>
>> Main pgbench logic consists of single file pgbench.c which is 4036
>> lines of code as of today. This is not a small number and I think
>> it would be nice if it is divided into smaller files because it
>> will make it easier to maintain, add or change features of pgbench.
>
> I do not think that this large file is a so big a problem (good
> editors help navigation in the code), and I'm not sure that splitting
> it would achieve much: there are not that many functions, some of
> them are maybe long (main, threadRun, doCustom) but mostly for good
> reasons.

My thoughts, exactly. I don't think just splitting the file into 
multiple pieces will achieve anything - the problem is that we've 
extended the original pgbench code in rather hackish way, without any 
major design changes, so IMHO what should be done is refactoring ...

> I've submitted a patch to remove "fork-emulation", which I think
> would really help simplify the code (maybe -10% source in
> "pgbench.c", less #ifs, avoid double implementations or
> more-complex-than-necessary implementations or not-implemented
> features).

... and cleanup of dead code.

--
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Refactoring pgbench.c

From
Tatsuo Ishii
Date:
>> I do not think that this large file is a so big a problem (good
>> editors help navigation in the code), and I'm not sure that splitting
>> it would achieve much: there are not that many functions, some of
>> them are maybe long (main, threadRun, doCustom) but mostly for good
>> reasons.
> 
> My thoughts, exactly. I don't think just splitting the file into
> multiple pieces will achieve anything - the problem is that we've
> extended the original pgbench code in rather hackish way, without any
> major design changes, so IMHO what should be done is refactoring ...

This is kind of surprising to me that two people are against
refactoring proposal (I understand that Fabien has pending patches for
pgbench and that could be a motivation for this though).

Splitting single large file into smaller files in which functions
contain strong relation will be itself a benefit since there will be
more chance for people to be needed to look into smaller places than
before. This is just a basic idea of modular programming and will be a
long term benefit of PostgreSQL project, which was my thought.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Refactoring pgbench.c

From
Tomas Vondra
Date:
On 06/28/15 18:56, Tatsuo Ishii wrote:
>>> I do not think that this large file is a so big a problem (good
>>> editors help navigation in the code), and I'm not sure that splitting
>>> it would achieve much: there are not that many functions, some of
>>> them are maybe long (main, threadRun, doCustom) but mostly for good
>>> reasons.
>>
>> My thoughts, exactly. I don't think just splitting the file into
>> multiple pieces will achieve anything - the problem is that we've
>> extended the original pgbench code in rather hackish way, without any
>> major design changes, so IMHO what should be done is refactoring ...
>
> This is kind of surprising to me that two people are against
> refactoring proposal (I understand that Fabien has pending patches
> for pgbench and that could be a motivation for this though).

I think that's a misunderstanding. I'm not against refactoring - not at 
all, and neither is Fabien I believe. However the first paragraph of 
your post seems to suggest that we get better code by merely splitting 
the a large file into several smaller ones.

I don't think that alone helps - I think we first need to refactor the 
code first, and then possibly split the file into several modules.

> Splitting single large file into smaller files in which functions
> contain strong relation will be itself a benefit since there will be
> more chance for people to be needed to look into smaller places than
> before. This is just a basic idea of modular programming and will be
> a long term benefit of PostgreSQL project, which was my thought.

I respectfully disagree. I think that for this to be true, the files 
need to be somehow logically related (which I'd expect to be the output 
of the refactoring). But maybe you have some particular refactoring and 
split in mind?

best regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Refactoring pgbench.c

From
Fabien COELHO
Date:
>> This is kind of surprising to me that two people are against 
>> refactoring proposal (I understand that Fabien has pending patches for 
>> pgbench and that could be a motivation for this though).
>
> I think that's a misunderstanding. I'm not against refactoring - not at all, 
> and neither is Fabien I believe.

Indeed I'm not, I was writing about splitting "pgbench.c" into several 
files. Although, as Tatsuo-san noted, I'm probably against a refactoring 
before the patches I already submitted get considered:-)

Really refactoring would mean breaking apart the long functions, but these 
are long for some reasons (say global goto to handle errors...), so how to 
do something very useful on that line is not clear to me.

-- 
Fabien.



Re: Refactoring pgbench.c

From
Jeff Janes
Date:
On Sat, Jun 27, 2015 at 5:10 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
Main pgbench logic consists of single file pgbench.c which is 4036
lines of code as of today. This is not a small number and I think it
would be nice if it is divided into smaller files because it will make
it easier to maintain, add or change features of pgbench.  I will come
up with an idea how to split pgbench.c later. In the mean time I
attached a call graph of pgbench.c generated by egypt, which we could
get a basic idea how to split and modularize pgbench.c.

I've never found the raw line count of a file to be a problem, at least not at the 4000 line mark.  

If some functions could be moved to existing libraries (like the functions strtoint64 or doConnect or read_line_from_file or the statistical distribution functions) that would be great, but I assume that would have been done already if there was already a logical place to hold them.  I don't think inventing new libraries just to hold these would be an improvement.

If you provided a suggested dissection, maybe I would find it compelling.  But without seeing a concrete proposal I think the process of refactoring is going to impede other improvements more than the result of refactoring it will promote them.

Cheers,

Jeff