Re: [EXTERNAL] Re: Allow declaration after statement and reformat code to use it - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: [EXTERNAL] Re: Allow declaration after statement and reformat code to use it
Date
Msg-id CAEudQApJn4h9iw9sL6WaO597k+Q73_0Yyxa1dBbrnyFuqnW6Jg@mail.gmail.com
Whole thread Raw
In response to Re: [EXTERNAL] Re: Allow declaration after statement and reformat code to use it  (Jelte Fennema <Jelte.Fennema@microsoft.com>)
Responses Re: [EXTERNAL] Re: Allow declaration after statement and reformat code to use it
List pgsql-hackers
Em sex., 20 de ago. de 2021 às 12:29, Jelte Fennema <Jelte.Fennema@microsoft.com> escreveu:
However, even if such an idea were to get the green light, I think I would
take the obligatory regex jokes seriously, and instead use something like
srcML [0] and do the analysis and modification on proper parse trees.
@Chapman I think that's a reasonable ask. This is the first time I heard of srcML. It indeed looks like it would be a much better tool than a regex to do this refactor. The refactor logic would still be roughly the same though because the regex essentially extracts and modifies a parse tree too. Just one that only has the bare minimum requirements for this specific refactor. I would like to state, that we've used this regex to refactor both Citus and pg_auto_failover, and no bugs turned up because of it so far. This is probably, because a regex is such a horrible way of writing a parser in the first place. So, any bugs in the regex will very likely create code that does not compile at all, instead of code that can be compiled but has some obscure bug .
I think effectively it would need to be run on all supported versions,
which means the risk of introducing errors would have to be very low.
@Bruce @Tom @Chapman I agree that doing this refactor only on the current master branch would cause way more pain due to backpatching, than the amount of happiness it provides by having easier to reason about code. I like the idea of avoiding this backpatching problem by running it . Luckily, running the automatic refactor on all branches should be pretty easy. However, it seems like PG11 and below only require a C89 compatible compiler. Am I correct in assuming increasing that requirement to C99 is not really an option? If so, I think it might be best if I revive this thread in ~27 months when PG11 is not supported anymore.
The rule against declaration-after-statement was kept
after significant discussion, which you can find in the archives if you
look. 
@Tom I guess my search skills in the archives are not very good, because I cannot find any big discussion on about declaration-after-statement. The only discussion I can find is this: https://www.postgresql.org/message-id/877efaol77.fsf@news-spur.riddles.org.uk If you're able to find that thread, could you share it?
C needs readability, not fewer lines.
Aside from horrible code, it doesn't improve 0.1% on anything.
@Ranier I tried to explain in my initial email that it not only reduces lines, but more importantly improves readability quite a bit IMHO:
2. Declarations are closer to the actual usage. This is advised by the "Code Complete" book [2] and has the following advantages:
   a. This limits variable scope to what is necessary. Which in turn makes the mental model you have to keep of a function when reading the code simpler.
   b. In most cases it allows you to see the the type of a variable without needing to go to the top of the function.
3. You can do input checking and assertions at the top of the function, instead of having to put declarations in front of it. This makes it clear which invariants hold for the function.  (as seen in the example above and the changes for pg_file_rename[3])
If you disagree with those statements, could you explain why?
2. Declarations are closer to the actual usage. 
a. This limits variable scope to what is necessary
I don't see this as an advantage, even for someone learning C today, which is not the case for 99% of hackers here.
With all declarations first, im my mental model, I can see all variables together and
and all your relationships and dependencies.
Spreading out the declarations, I would have to read and reread the code to
understand the interdependencies of all the variables and their types.

b. In most cases it allows you to see the the type of a variable without needing to go to the top of the function.
For one variable, perhaps, but I can see all of them and because of one variable I have the opportunity to find some fault in the other variables.
I agree that the Postgres code does not help much, because there is no clear organization.
As I do in my codes, where I organize the variables by type and use, aligned, which greatly facilitates the complete view.
What in my opinion would be the ideal path for Postgres, but it involves changing the habits of people who code for years.
What are you proposing to do now, with a high chance of increasing bugs.

3. You can do input checking and assertions at the top of the function, instead of having to put declarations in front of it. This makes it clear which invariants hold for the function.
And you can easily forget to validate some variable you're not seeing is several lines down.

There is a reason why GMs Brian Kernighan and Dennis Ritchie made the C89, less buggy.
IMHO C99 makes it easy to make more mistakes.
One more step and we won't even need to declare a variable.

regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: The Free Space Map: Problems and Opportunities
Next
From: "Bossart, Nathan"
Date:
Subject: Re: archive status ".ready" files may be created too early