Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)
Date
Msg-id 15819.1495145176@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventivemaintenance in advance of pgindent run.)  (Piotr Stefaniak <postgres@piotr-stefaniak.me>)
Responses Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)
Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventivemaintenance in advance of pgindent run.)
List pgsql-hackers
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> On 2017-05-17 23:46, Tom Lane wrote:
>> ... Much of what
>> I'm seeing with this version is randomly different decisions about
>> how far to indent comments

> pgindent doesn't set the -c indent option ("The column in which comments
> on code start."), so indent uses the default value of 33 (meaning column
> 32). If the code pushes the comment further than column 32, indent only
> places a single tab between the two just to separate them.

Well, actually what it does is to push the comment to what it thinks is
the next tab stop.  So the core issue here is that the comments get
formatted differently for -ts4 than -ts8.  I think that's arguably a bug;
the width of tabs should only affect how whitespace is stored, not
formatting decisions.  Don't suppose you'd like to introduce a separate
parameter that defines the extra-indentation step for comments?

> This, given 4-column tabs, should result in placing the comment on
> bitSize[INDEX_MAX_KEYS]; from your example onto column 44 - which the
> newer version of indent does (if you tell it -ts4), unlike the older
> one. I think that this is an improvement.

It may or may not be an improvement, but right now what I want is to see
what this version of indent does differently, with as few extraneous
changes as possible.  We can argue later about whether we're willing to
accept gratuitous comment reformatting, but when one can't even find the
positive changes in amongst the noise, the chances of getting this accepted
are not good.

> I don't know how to avoid the improvement. Try removing -ts4 as well as
> putting back detab+entab.

I tried that but it did not produce as good a match to the old results as
what I'd previously arrived at by trial and error, which was to hack
pr_comment() like this:

@@ -148,7 +151,9 @@ pr_comment(void)        else        ps.com_col = ps.decl_on_line || ps.ind_level == 0            ?
ps.decl_com_ind: ps.com_ind;
 
-        if (ps.com_col <= target_col)
+        if (ps.com_col < target_col)
+        ps.com_col = 8 * (1 + (target_col - 1) / 8) + 1;
+        else if (ps.com_col == target_col)        ps.com_col = tabsize * (1 + (target_col - 1) / tabsize) + 1;
if(ps.com_col + 24 > adj_max_col)        adj_max_col = ps.com_col + 24;
 

I'm not really sure why the old behavior seems to be to move only 4 spaces
when right at the boundary, but there you have it.

I also found that there was extra spacing getting inserted for some cases
like
case afairlylonglabel:    /* comment */

which I eventually tracked down to the fact that this bit:
    /*     * turn everything so far into a label     */    {    int len = e_code - s_code;
    CHECK_SIZE_LAB(len + 3);    memcpy(e_lab, s_code, len);    e_lab += len;    *e_lab++ = ':';    *e_lab++ = ' ';
*e_lab= '\0';    e_code = s_code;    }
 

is inserting an extra space into the "lab" string, causing pr_comment()
to think that the label extends one character to the right of where
it really does, so that it moves the comment over when it need not.
I am not sure why it's like that, but compensating for it in pr_comment()
like this improved matters:

@@ -137,8 +136,12 @@ pr_comment(void)        target_col = count_spaces(compute_code_target(), s_code);        else {
   target_col = 1;
 
-        if (s_lab != e_lab)
+        if (s_lab != e_lab) {            target_col = count_spaces(compute_label_target(), s_lab);
+            /* ignore any trailing space in lab for this purpose */
+            if (e_lab[-1] == ' ')
+            target_col--;
+        }        }        if (s_lab != e_lab && s_lab[1] == 'e' &&        (strncmp(s_lab, "#endif", 6) == 0 ||

(I see that the extra space after colon is inserted by the old version
of indent too, which makes it even less clear why the boundary-case
behavior is like this.  I have a feeling that this is hacking things
at the wrong place.)

That got me to a point where there was little enough noise that I could
start to see the real changes, and I soon noticed that there was a fair
amount of apparently buggy behavior, like this change:

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 78d71ab..630bacc 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -33,8 +33,8 @@ PG_FUNCTION_INFO_V1(pg_prewarm);typedef enum{    PREWARM_PREFETCH,
-    PREWARM_READ,
-    PREWARM_BUFFER
+        PREWARM_READ,
+        PREWARM_BUFFER} PrewarmType;static char blockbuffer[BLCKSZ];

Curiously, there are other enum declarations that don't get the phony
extra indentation.  I traced through it a bit and eventually found that
the difference between OK and not OK is that the declarations that don't
get messed up look like "typedef enum enumlabel ...", ie the problem
with this one is there's no extra identifier after "enum".  The proximate
cause of that is this line in indent.c:
        ps.in_decl = ps.decl_on_line = ps.last_token != type_def;

which I see used to be just
    ps.in_decl = ps.decl_on_line = true;

in old indent.  This results in leaving ps.in_decl in the wrong state
when there's no enum label.  I suspect that this change is also accounting
for a lot of noise changes I see in struct decls that lack a label after
"typedef struct".  I don't know what it is you had in mind to fix with
this change, but surely there are a lot of other cases where the distance
back to the "typedef" keyword is variable?  (I tried reverting this
change, and it fixed the enum problem but introduced other issues.)

I'm also noticing some random-looking changes like this one:

diff --git a/contrib/isn/isn.c b/contrib/isn/isn.c
index c3c10e1..c60369e 100644
--- a/contrib/isn/isn.c
+++ b/contrib/isn/isn.c
@@ -531,7 +531,7 @@ static boolean2string(ean13 ean, bool errorOK, char *result, bool shortType){    const char
*(*TABLE)[2];
-    const unsigned (*TABLE_index)[2];
+    const unsigned(*TABLE_index)[2];    enum isn_type type = INVALID;    char       *aux;

which is surely not an improvement; but I ran out of energy before
investigating that.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] different column orders in regression test database