Thread: Emacs vs pg_indent's weird indentation for function declarations
Hello, Using either the .dir-locals.el settings or the "more complete" src/tools/editors/emacs.samples, I have never convinced Emacs to produce multi-line function declarations in .h files that satisfy pg_indent. For example, latch.c has the following definition: int WaitEventSetWait(WaitEventSet *set, long timeout, WaitEvent *occurred_events, int nevents, uint32 wait_event_info) And now let's see the declaration in latch.h: extern int WaitEventSetWait(WaitEventSet *set, long timeout, WaitEvent *occurred_events, int nevents, uint32 wait_event_info); (I replaced all tabs with four space here; if you're reading in a monospace font, you'll see that the first arguments off all lines begin in the same column in the definition by not in the declaration.) For a while I've been baffled by that: the first arguments of later lines don't line up with that of the first line, but they're also not in a constant column (it varies from function to function), and it's also not caused by 8-space vs 4-space confusion. It was only when I put those two things next to each other just now in this email that I finally spotted the logic it's using: if you remove "extern int " then the later lines line up with the first argument of the top line. This works for other examples I looked at too. Huh. That's ... annoying. I wish indent wouldn't do that, because it means that my declarations get moved around every time I write code. But if it's not possible to change that for whatever technical or political reason, I wonder if it's possible to teach Emacs to understand that weird rule... -- Thomas Munro http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes: > For a while I've been baffled by that: the first arguments of later > lines don't line up with that of the first line, but they're also not > in a constant column (it varies from function to function), and it's > also not caused by 8-space vs 4-space confusion. It was only when I > put those two things next to each other just now in this email that I > finally spotted the logic it's using: if you remove "extern int " then > the later lines line up with the first argument of the top line. This > works for other examples I looked at too. Huh. Yeah. I suspect that the underlying cause is that pgindent doesn't really distinguish function declarations from definitions, at least not when it comes time to indent the lines after the first one. > That's ... annoying. I wish indent wouldn't do that, because it means > that my declarations get moved around every time I write code. If you can fix it, I'd vote for accepting the patch. I don't personally have the desire to dig into the indent code that much ... regards, tom lane
On Mon, Jan 28, 2019 at 12:28:30AM -0500, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> That's ... annoying. I wish indent wouldn't do that, because it means >> that my declarations get moved around every time I write code. > > If you can fix it, I'd vote for accepting the patch. I don't personally > have the desire to dig into the indent code that much ... If you could get pgindent smarter in this area, it would be really nice.. -- Michael
Attachment
On Mon, Jan 28, 2019 at 8:08 PM Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Jan 28, 2019 at 12:28:30AM -0500, Tom Lane wrote: > > Thomas Munro <thomas.munro@enterprisedb.com> writes: > >> That's ... annoying. I wish indent wouldn't do that, because it means > >> that my declarations get moved around every time I write code. > > > > If you can fix it, I'd vote for accepting the patch. I don't personally > > have the desire to dig into the indent code that much ... > > If you could get pgindent smarter in this area, it would be really > nice.. Ah, it's not indent doing it, it's pgindent's post_indent subroutine trying to correct the effects of the (implied) -psl option, but not doing a complete job of it (it should adjust the indentation lines of later lines if it changes the first line). One idea I had was to tell indent not to do that by using -npsl when processing headers, like in the attached. That fixes all the headers I looked at, though of course it doesn't fix the static function declarations that appear in .c files, so it's not quite the right answer. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Mon, Jan 28, 2019 at 9:48 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Mon, Jan 28, 2019 at 8:08 PM Michael Paquier <michael@paquier.xyz> wrote: > > If you could get pgindent smarter in this area, it would be really > > nice.. > > Ah, it's not indent doing it, it's pgindent's post_indent subroutine > trying to correct the effects of the (implied) -psl option, but not > doing a complete job of it (it should adjust the indentation lines of > later lines if it changes the first line). > > One idea I had was to tell indent not to do that by using -npsl when > processing headers, like in the attached. That fixes all the headers > I looked at, though of course it doesn't fix the static function > declarations that appear in .c files, so it's not quite the right > answer. I tried teaching pgindent's post_indent subroutine to unmangle the multi-line declarations it mangles. That produces correct indentation! But can also produce lines that exceed the column limit we would normally wrap at (of course, because pg_bsd_indent had less whitespace on the left when it made wrapping decisions). Doh. Attached for posterity, but it's useless. So I think pg_bsd_indent itself needs to be fixed. I think I know where the problem is. lexi.c isn't looking far ahead enough to recognise multi-line function declarations: if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 && state->in_parameter_declaration == 0 && state->block_init == 0) { char *tp = buf_ptr; while (tp < buf_end) if (*tp++ == ')' && (*tp == ';' || *tp == ',')) goto not_proc; strncpy(state->procname, token, sizeof state->procname - 1); if (state->in_decl) state->in_parameter_declaration = 1; return (funcname); not_proc:; } That loop that looks for ')' followed by ';' is what causes the lexer to conclude that the "foo" is an "ident" rather than a "funcname", given the following input: extern void foo(int i); But if because buf_end ends at the newline, it can't see the semi-colon and concludes that "foo" is a "funcname" here: extern void foo(int i, int j); That determination causes indent.c's procnames_start_line (-psl) mode to put "extern void" on its own line here and stop thinking of it as a declaration: case funcname: case ident: /* got an identifier or constant */ if (ps.in_decl) { if (type_code == funcname) { ps.in_decl = false; if (procnames_start_line && s_code != e_code) { *e_code = '\0'; dump_line(); } I guess it'd need something smarter than fill_buffer() to see far enough, but simply loading N lines at once wouldn't be enough because you could still happen to be looking at the final line in the buffer; you'd probably need a sliding window. I'm not planning on trying to fix that myself in the short term, but since it annoys me every time I commit anything, I couldn't resist figuring out where it's coming from at least... -- Thomas Munro https://enterprisedb.com
Attachment
Thomas Munro <thomas.munro@gmail.com> writes: > I tried teaching pgindent's post_indent subroutine to unmangle the > multi-line declarations it mangles. That produces correct > indentation! But can also produce lines that exceed the column limit > we would normally wrap at (of course, because pg_bsd_indent had less > whitespace on the left when it made wrapping decisions). Doh. > Attached for posterity, but it's useless. > So I think pg_bsd_indent itself needs to be fixed. I think I know > where the problem is. lexi.c isn't looking far ahead enough to > recognise multi-line function declarations: I experimented with fixing this. I was able to get pg_bsd_indent to distinguish multi-line function declarations from definitions, but it turns out that it doesn't help your concern about the lines being too long after re-indenting. Contrary to what you imagine above, it seems pg_bsd_indent will not reflow argument lists, regardless of whether it thinks there needs to be more or less leading whitespace. I'm a bit surprised that -bc doesn't cause that to happen, but it doesn't (and I'm not sure we'd really want to force one-parameter-per-line, anyway). Anyway, the attached hasty-and-undercommented change to pg_bsd_indent allows removal of the "Move prototype names to the same line as return type" hack in pgindent, and we then get prototypes with properly lined-up arguments, but we'll have a lot of places with over-length lines needing manual fixing. Unless somebody wants to find where to fix that in pg_bsd_indent, but I've had my fill of looking at that spaghetti code for today. regards, tom lane diff --git a/indent.h b/indent.h index 0fffd89..1708dbc 100644 --- a/indent.h +++ b/indent.h @@ -41,6 +41,8 @@ void diag2(int, const char *); void diag3(int, const char *, int); void diag4(int, const char *, int, int); void dump_line(void); +int lookahead(void); +void lookahead_reset(void); void fill_buffer(void); void parse(int); void pr_comment(void); diff --git a/io.c b/io.c index df11094..8d13a52 100644 --- a/io.c +++ b/io.c @@ -51,6 +51,13 @@ static char sccsid[] = "@(#)io.c 8.1 (Berkeley) 6/6/93"; int comment_open; static int paren_target; + +static char *lookahead_buf; /* malloc'd buffer, or NULL initially */ +static char *lookahead_buf_end; /* end+1 of allocated space */ +static char *lookahead_start; /* => next char for fill_buffer() to fetch */ +static char *lookahead_ptr; /* => next char for lookahead() to fetch */ +static char *lookahead_end; /* last+1 valid char in lookahead_buf */ + static int pad_output(int current, int target); void @@ -252,6 +259,58 @@ compute_label_target(void) : ps.ind_size * (ps.ind_level - label_offset) + 1; } +/* + * Read data ahead of what has been collected into in_buffer. + * + * Successive calls get further and further ahead, until we hit EOF. + * Call lookahead_reset to rescan from just beyond in_buffer. + */ +int +lookahead(void) +{ + while (lookahead_ptr >= lookahead_end) { + int i = getc(input); + + if (i == EOF) + return i; + if (i == '\0') + continue; /* fill_buffer drops nulls, so do we */ + + if (lookahead_end >= lookahead_buf_end) { + /* Need to allocate or enlarge lookahead_buf */ + char *new_buf; + size_t req; + + if (lookahead_buf == NULL) { + req = 64; + new_buf = malloc(req); + } else { + req = (lookahead_buf_end - lookahead_buf) * 2; + new_buf = realloc(lookahead_buf, req); + } + if (new_buf == NULL) + errx(1, "too much lookahead required"); + lookahead_start = new_buf + (lookahead_start - lookahead_buf); + lookahead_ptr = new_buf + (lookahead_ptr - lookahead_buf); + lookahead_end = new_buf + (lookahead_end - lookahead_buf); + lookahead_buf = new_buf; + lookahead_buf_end = new_buf + req; + } + + *lookahead_end++ = i; + } + return (unsigned char) *lookahead_ptr++; +} + +/* + * Reset so that lookahead() will again scan from just beyond what's in + * in_buffer. + */ +void +lookahead_reset(void) +{ + lookahead_ptr = lookahead_start; +} /* * Copyright (C) 1976 by the Board of Trustees of the University of Illinois @@ -293,11 +352,16 @@ fill_buffer(void) p = in_buffer + offset; in_buffer_limit = in_buffer + size - 2; } - if ((i = getc(f)) == EOF) { - *p++ = ' '; - *p++ = '\n'; - had_eof = true; - break; + if (lookahead_start < lookahead_end) { + i = (unsigned char) *lookahead_start++; + } else { + lookahead_start = lookahead_ptr = lookahead_end = lookahead_buf; + if ((i = getc(f)) == EOF) { + *p++ = ' '; + *p++ = '\n'; + had_eof = true; + break; + } } if (i != '\0') *p++ = i; diff --git a/lexi.c b/lexi.c index 3c7bfef..e637e1a 100644 --- a/lexi.c +++ b/lexi.c @@ -148,6 +148,39 @@ strcmp_type(const void *e1, const void *e2) return (strcmp(e1, *(const char * const *)e2)); } +/* + * Scan over a function argument declaration list, then see if it is + * followed by ';' or ',' indicating that it's just a prototype. + * + * We do not detect comments, so you can fool this by putting unbalanced + * parens inside a comment within the argument list. So don't do that. + */ +static int +is_prototype(char *tp) +{ + int paren_depth = 0; + + lookahead_reset(); + for (;;) { + int c; + + if (tp < buf_end) + c = *tp++; + else { + c = lookahead(); + if (c == EOF) + break; + } + if (c == '(') + paren_depth++; + else if (c == ')') + paren_depth--; + else if (paren_depth == 0 && !isspace((unsigned char) c)) + return (c == ';' || c == ','); + } + return false; +} + int lexi(struct parser_state *state) { @@ -348,15 +381,12 @@ lexi(struct parser_state *state) } /* end of if (found_it) */ if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 && state->in_parameter_declaration == 0 && state->block_init == 0) { - char *tp = buf_ptr; - while (tp < buf_end) - if (*tp++ == ')' && (*tp == ';' || *tp == ',')) - goto not_proc; + if (!is_prototype(buf_ptr)) { strncpy(state->procname, token, sizeof state->procname - 1); if (state->in_decl) state->in_parameter_declaration = 1; return (funcname); - not_proc:; + } } /* * The following hack attempts to guess whether or not the current
I wrote: > I experimented with fixing this. I was able to get pg_bsd_indent to > distinguish multi-line function declarations from definitions, but it > turns out that it doesn't help your concern about the lines being too > long after re-indenting. Contrary to what you imagine above, it seems > pg_bsd_indent will not reflow argument lists, regardless of whether it > thinks there needs to be more or less leading whitespace. Actually, now that I think about it, pgindent seldom revisits line-break decisions in code anyway; it's only aggressive about reflowing comments. So maybe we shouldn't be expecting it to fix this. Also, looking at sample results (attached), it seems like we don't have such a large problem as one might guess. It looks like people have mostly formatted declarations to work with this indentation already, either because their editors did it automatically, or because they were thinking that this might get fixed someday. (I know I've often had that in the back of my mind.) There are some places in the attached diff that I might take the trouble to clean up manually, but not that many. I found out that my initial draft of a multi-line lookahead function was not nearly smart enough to survive contact with the PG source corpus, but after rejiggering it to cope with comments and attributes, it seems to do quite well. A small problem with the "rejiggering" is that it now makes the wrong choice for K&R-style function definitions, causing them to be weirdly indented. For our purposes, that's a non-problem so I'm not excited about trying to make it smart enough to recognize those. We do have a couple of amazingly old and crufty K&R-style functions in src/port/, though, so probably we'd wish to fix those. Attached is working draft of pg_bsd_indent changes (still sans comments) as well as a patch showing the difference between current pgindent results on HEAD and the results of this version. I think there's no question that this is an improvement. regards, tom lane diff --git a/indent.h b/indent.h index 0fffd89..1708dbc 100644 --- a/indent.h +++ b/indent.h @@ -41,6 +41,8 @@ void diag2(int, const char *); void diag3(int, const char *, int); void diag4(int, const char *, int, int); void dump_line(void); +int lookahead(void); +void lookahead_reset(void); void fill_buffer(void); void parse(int); void pr_comment(void); diff --git a/io.c b/io.c index df11094..8d13a52 100644 --- a/io.c +++ b/io.c @@ -51,6 +51,13 @@ static char sccsid[] = "@(#)io.c 8.1 (Berkeley) 6/6/93"; int comment_open; static int paren_target; + +static char *lookahead_buf; /* malloc'd buffer, or NULL initially */ +static char *lookahead_buf_end; /* end+1 of allocated space */ +static char *lookahead_start; /* => next char for fill_buffer() to fetch */ +static char *lookahead_ptr; /* => next char for lookahead() to fetch */ +static char *lookahead_end; /* last+1 valid char in lookahead_buf */ + static int pad_output(int current, int target); void @@ -252,6 +259,58 @@ compute_label_target(void) : ps.ind_size * (ps.ind_level - label_offset) + 1; } +/* + * Read data ahead of what has been collected into in_buffer. + * + * Successive calls get further and further ahead, until we hit EOF. + * Call lookahead_reset to rescan from just beyond in_buffer. + */ +int +lookahead(void) +{ + while (lookahead_ptr >= lookahead_end) { + int i = getc(input); + + if (i == EOF) + return i; + if (i == '\0') + continue; /* fill_buffer drops nulls, so do we */ + + if (lookahead_end >= lookahead_buf_end) { + /* Need to allocate or enlarge lookahead_buf */ + char *new_buf; + size_t req; + + if (lookahead_buf == NULL) { + req = 64; + new_buf = malloc(req); + } else { + req = (lookahead_buf_end - lookahead_buf) * 2; + new_buf = realloc(lookahead_buf, req); + } + if (new_buf == NULL) + errx(1, "too much lookahead required"); + lookahead_start = new_buf + (lookahead_start - lookahead_buf); + lookahead_ptr = new_buf + (lookahead_ptr - lookahead_buf); + lookahead_end = new_buf + (lookahead_end - lookahead_buf); + lookahead_buf = new_buf; + lookahead_buf_end = new_buf + req; + } + + *lookahead_end++ = i; + } + return (unsigned char) *lookahead_ptr++; +} + +/* + * Reset so that lookahead() will again scan from just beyond what's in + * in_buffer. + */ +void +lookahead_reset(void) +{ + lookahead_ptr = lookahead_start; +} /* * Copyright (C) 1976 by the Board of Trustees of the University of Illinois @@ -293,11 +352,16 @@ fill_buffer(void) p = in_buffer + offset; in_buffer_limit = in_buffer + size - 2; } - if ((i = getc(f)) == EOF) { - *p++ = ' '; - *p++ = '\n'; - had_eof = true; - break; + if (lookahead_start < lookahead_end) { + i = (unsigned char) *lookahead_start++; + } else { + lookahead_start = lookahead_ptr = lookahead_end = lookahead_buf; + if ((i = getc(f)) == EOF) { + *p++ = ' '; + *p++ = '\n'; + had_eof = true; + break; + } } if (i != '\0') *p++ = i; diff --git a/lexi.c b/lexi.c index 3c7bfef..df71a20 100644 --- a/lexi.c +++ b/lexi.c @@ -148,6 +148,54 @@ strcmp_type(const void *e1, const void *e2) return (strcmp(e1, *(const char * const *)e2)); } +/* + * Scan over a function argument declaration list, then see if it is + * followed by '{', indicating that it's a function definition. + * If it's followed by ';' or ',', it's a prototype. Otherwise keep + * scanning, because there might be whitespace, comments, or attribute + * declarations before we get to the telltale punctuation. + * + * Note that this will make the wrong decision for a K&R-style function + * definition. Too bad. + */ +static int +is_func_definition(char *tp) +{ + int paren_depth = 0; + int in_comment = false; + int lastc = 0; + + lookahead_reset(); + for (;;) { + int c; + + if (tp < buf_end) + c = *tp++; + else { + c = lookahead(); + if (c == EOF) + break; + } + if (in_comment) { + if (lastc == '*' && c == '/') + in_comment = false; + } else if (lastc == '/' && c == '*') + in_comment = true; + else if (c == '(') + paren_depth++; + else if (c == ')') { + paren_depth--; + if (paren_depth < 0) + return false; + } else if (paren_depth == 0 && c == '{') + return true; + else if (paren_depth == 0 && (c == ';' || c == ',')) + return false; + lastc = c; + } + return false; +} + int lexi(struct parser_state *state) { @@ -348,15 +396,12 @@ lexi(struct parser_state *state) } /* end of if (found_it) */ if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 && state->in_parameter_declaration == 0 && state->block_init == 0) { - char *tp = buf_ptr; - while (tp < buf_end) - if (*tp++ == ')' && (*tp == ';' || *tp == ',')) - goto not_proc; + if (is_func_definition(buf_ptr)) { strncpy(state->procname, token, sizeof state->procname - 1); if (state->in_decl) state->in_parameter_declaration = 1; return (funcname); - not_proc:; + } } /* * The following hack attempts to guess whether or not the current diff --git a/tests/declarations.0.stdout b/tests/declarations.0.stdout index e97e447..ec63596 100644 --- a/tests/declarations.0.stdout +++ b/tests/declarations.0.stdout @@ -64,10 +64,10 @@ static int ald_shutingdown = 0; struct thread *ald_thread; static int -do_execve(td, args, mac_p) - struct thread *td; - struct image_args *args; - struct mac *mac_p; + do_execve(td, args, mac_p) +struct thread *td; +struct image_args *args; +struct mac *mac_p; { } diff --git a/tests/list_head.0.stdout b/tests/list_head.0.stdout index b6f0762..a8f985f 100644 --- a/tests/list_head.0.stdout +++ b/tests/list_head.0.stdout @@ -1,10 +1,10 @@ /* $FreeBSD$ */ /* See r309380 */ static int -do_execve(td, args, mac_p) - struct thread *td; - struct image_args *args; - struct mac *mac_p; + do_execve(td, args, mac_p) +struct thread *td; +struct image_args *args; +struct mac *mac_p; { }
Attachment
On Thu, May 16, 2019 at 9:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > I experimented with fixing this. I was able to get pg_bsd_indent to > > distinguish multi-line function declarations from definitions, but it > > turns out that it doesn't help your concern about the lines being too > > long after re-indenting. Contrary to what you imagine above, it seems > > pg_bsd_indent will not reflow argument lists, regardless of whether it > > thinks there needs to be more or less leading whitespace. > > Actually, now that I think about it, pgindent seldom revisits line-break > decisions in code anyway; it's only aggressive about reflowing comments. > So maybe we shouldn't be expecting it to fix this. Ahh, so my quick and dirty perl change was probably OK then. Sorry I didn't realise that before, but this certainly a better way, fixing the right thing. > A small problem with the "rejiggering" is that it now makes the wrong > choice for K&R-style function definitions, causing them to be weirdly > indented. For our purposes, that's a non-problem so I'm not excited > about trying to make it smart enough to recognize those. We do have > a couple of amazingly old and crufty K&R-style functions in src/port/, > though, so probably we'd wish to fix those. What kid of fork is pg_bsd_indent... do we care about upstreaming changes? I guess someone would need to deal with that case eventually if so. > Attached is working draft of pg_bsd_indent changes (still sans comments) > as well as a patch showing the difference between current pgindent > results on HEAD and the results of this version. I think there's no > question that this is an improvement. +1 Thanks for looking into it! -- Thomas Munro https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes: > On Thu, May 16, 2019 at 9:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A small problem with the "rejiggering" is that it now makes the wrong >> choice for K&R-style function definitions, causing them to be weirdly >> indented. For our purposes, that's a non-problem so I'm not excited >> about trying to make it smart enough to recognize those. We do have >> a couple of amazingly old and crufty K&R-style functions in src/port/, >> though, so probably we'd wish to fix those. > What kid of fork is pg_bsd_indent... do we care about upstreaming > changes? I guess someone would need to deal with that case eventually > if so. We regard the FreeBSD copy as upstream, and I think we're mostly in sync with that but only mostly. So it would come down to whether the FreeBSD maintainer is worried about K&R mode and what he wants to do about that. Piotr, that's still you isn't it? regards, tom lane
I wrote: >>> A small problem with the "rejiggering" is that it now makes the wrong >>> choice for K&R-style function definitions, causing them to be weirdly >>> indented. For our purposes, that's a non-problem so I'm not excited >>> about trying to make it smart enough to recognize those. We do have >>> a couple of amazingly old and crufty K&R-style functions in src/port/, >>> though, so probably we'd wish to fix those. To be concrete about that: the existing behavior when trying to decide whether "foo(" starts a function declaration or a function definition is to scan to the matching right paren[1] and then, if the immediately next character is ';' or ',', it's a declaration; else it's a definition. This fails on decls with attributes, such as extern void pg_re_throw(void) __attribute__((noreturn)); My patch changes the rule to be "scan until we see a ';' ',' or '{' that's not within parens; then it's a definition if that character is '{', otherwise a declaration". So it gets the right answer for the above, but the wrong answer for int isinf(x) double x; { It doesn't really seem practical to me to make the lookahead function smart enough to tell the difference between attributes and K&R-style parameter declarations. What I'm thinking of doing to have an upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr', to indicate whether the user is more worried about K&R function declarations than she is about function attributes. The default for this switch could be debated perhaps, but I'd just stick an explicit -nkr into pgindent's invocation, so we wouldn't care too much. (This would also ensure that pgindent would fail if you try to use a too-old copy of pg_bsd_indent...) regards, tom lane [1] Um, well, actually the existing behavior is to scan to the *first* right paren. But upgrading it to count parens correctly seems like an unobjectionable improvement in any case.
On 17/05/2019 16.48, Tom Lane wrote: > I wrote: >>>> A small problem with the "rejiggering" is that it now makes the wrong >>>> choice for K&R-style function definitions, causing them to be weirdly >>>> indented. For our purposes, that's a non-problem so I'm not excited >>>> about trying to make it smart enough to recognize those. We do have >>>> a couple of amazingly old and crufty K&R-style functions in src/port/, >>>> though, so probably we'd wish to fix those. > It doesn't really seem practical to me to make the lookahead function > smart enough to tell the difference between attributes and K&R-style > parameter declarations. What I'm thinking of doing to have an > upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr', > to indicate whether the user is more worried about K&R function > declarations than she is about function attributes. I think it's safe to assume that upstream can drop support for K&R-style parameters altogether.
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: > On 17/05/2019 16.48, Tom Lane wrote: >> It doesn't really seem practical to me to make the lookahead function >> smart enough to tell the difference between attributes and K&R-style >> parameter declarations. What I'm thinking of doing to have an >> upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr', >> to indicate whether the user is more worried about K&R function >> declarations than she is about function attributes. > I think it's safe to assume that upstream can drop support for K&R-style > parameters altogether. Cool. I already created the switch, but maybe we could have it default to -nkr? regards, tom lane
On 19/05/2019 19.27, Tom Lane wrote: > Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: >> On 17/05/2019 16.48, Tom Lane wrote: >>> It doesn't really seem practical to me to make the lookahead function >>> smart enough to tell the difference between attributes and K&R-style >>> parameter declarations. What I'm thinking of doing to have an >>> upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr', >>> to indicate whether the user is more worried about K&R function >>> declarations than she is about function attributes. > >> I think it's safe to assume that upstream can drop support for K&R-style >> parameters altogether. > > Cool. I already created the switch, but maybe we could have it > default to -nkr? Sorry, but GNU indent already uses -kr for something else and I would like FreeBSD indent have something like that under the same name. Besides, indent has too many options and this one doesn't look like particularly desired by anyone. It's possible that someone will complain some day, but I don't think we should assume that they'll do or that they're more important than the other users who benefit from your change being the default behavior and no additional options.
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: > On 19/05/2019 19.27, Tom Lane wrote: >> Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: >>> I think it's safe to assume that upstream can drop support for K&R-style >>> parameters altogether. >> Cool. I already created the switch, but maybe we could have it >> default to -nkr? > Sorry, but GNU indent already uses -kr for something else and I would > like FreeBSD indent have something like that under the same name. > Besides, indent has too many options and this one doesn't look like > particularly desired by anyone. It's possible that someone will complain > some day, but I don't think we should assume that they'll do or that > they're more important than the other users who benefit from your change > being the default behavior and no additional options. Huh. OK, I'll rip the switch back out again. regards, tom lane
I wrote: > Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: >> Sorry, but GNU indent already uses -kr for something else and I would >> like FreeBSD indent have something like that under the same name. >> Besides, indent has too many options and this one doesn't look like >> particularly desired by anyone. It's possible that someone will complain >> some day, but I don't think we should assume that they'll do or that >> they're more important than the other users who benefit from your change >> being the default behavior and no additional options. > Huh. OK, I'll rip the switch back out again. Here's a proposed patch for you. regards, tom lane diff --git a/args.c b/args.c index f79de75..1850542 100644 --- a/args.c +++ b/args.c @@ -55,7 +55,7 @@ static char sccsid[] = "@(#)args.c 8.1 (Berkeley) 6/6/93"; #include "indent_globs.h" #include "indent.h" -#define INDENT_VERSION "2.0" +#define INDENT_VERSION "2.1" /* profile types */ #define PRO_SPECIAL 1 /* special case */ diff --git a/indent.h b/indent.h index 0fffd89..1708dbc 100644 --- a/indent.h +++ b/indent.h @@ -41,6 +41,8 @@ void diag2(int, const char *); void diag3(int, const char *, int); void diag4(int, const char *, int, int); void dump_line(void); +int lookahead(void); +void lookahead_reset(void); void fill_buffer(void); void parse(int); void pr_comment(void); diff --git a/io.c b/io.c index df11094..fbaa5dd 100644 --- a/io.c +++ b/io.c @@ -51,6 +51,14 @@ static char sccsid[] = "@(#)io.c 8.1 (Berkeley) 6/6/93"; int comment_open; static int paren_target; + +static char *lookahead_buf; /* malloc'd buffer, or NULL initially */ +static char *lookahead_buf_end; /* end+1 of allocated space */ +static char *lookahead_start; /* => next char for fill_buffer() to fetch */ +static char *lookahead_ptr; /* => next char for lookahead() to fetch */ +static char *lookahead_end; /* last+1 valid char in lookahead_buf */ +static char *lookahead_bp_save; /* lookahead position in bp_save, if any */ + static int pad_output(int current, int target); void @@ -252,6 +260,73 @@ compute_label_target(void) : ps.ind_size * (ps.ind_level - label_offset) + 1; } +/* + * Read data ahead of what has been collected into in_buffer. + * + * Successive calls get further and further ahead, until we hit EOF. + * Call lookahead_reset() to rescan from just beyond in_buffer. + * + * Lookahead is automatically reset whenever fill_buffer() reads beyond + * the lookahead buffer, i.e., you can't use this for "look behind". + * + * The standard pattern for potentially multi-line lookahead is to call + * lookahead_reset(), then enter a loop that scans forward from buf_ptr + * to buf_end, then (if necessary) calls lookahead() to read additional + * characters from beyond the end of the current line. + */ +int +lookahead(void) +{ + /* First read whatever's in bp_save area */ + if (lookahead_bp_save != NULL && lookahead_bp_save < be_save) + return (unsigned char) *lookahead_bp_save++; + /* Else, we have to examine and probably fill the main lookahead buffer */ + while (lookahead_ptr >= lookahead_end) { + int i = getc(input); + + if (i == EOF) + return i; + if (i == '\0') + continue; /* fill_buffer drops nulls, and so do we */ + + if (lookahead_end >= lookahead_buf_end) { + /* Need to allocate or enlarge lookahead_buf */ + char *new_buf; + size_t req; + + if (lookahead_buf == NULL) { + req = 64; + new_buf = malloc(req); + } else { + req = (lookahead_buf_end - lookahead_buf) * 2; + new_buf = realloc(lookahead_buf, req); + } + if (new_buf == NULL) + errx(1, "too much lookahead required"); + lookahead_start = new_buf + (lookahead_start - lookahead_buf); + lookahead_ptr = new_buf + (lookahead_ptr - lookahead_buf); + lookahead_end = new_buf + (lookahead_end - lookahead_buf); + lookahead_buf = new_buf; + lookahead_buf_end = new_buf + req; + } + + *lookahead_end++ = i; + } + return (unsigned char) *lookahead_ptr++; +} + +/* + * Reset so that lookahead() will again scan from just beyond what's in + * in_buffer. + */ +void +lookahead_reset(void) +{ + /* Reset the main lookahead buffer */ + lookahead_ptr = lookahead_start; + /* If bp_save isn't NULL, we need to scan that first */ + lookahead_bp_save = bp_save; +} /* * Copyright (C) 1976 by the Board of Trustees of the University of Illinois @@ -261,7 +336,9 @@ compute_label_target(void) * * NAME: fill_buffer * - * FUNCTION: Reads one block of input into input_buffer + * FUNCTION: Reads one line of input into in_buffer, + * sets up buf_ptr and buf_end to point to the line's start and end+1. + * (Note that the buffer does not get null-terminated.) * * HISTORY: initial coding November 1976 D A Willcox of CAC 1/7/77 A * Willcox of CAC Added check for switch back to partly full input @@ -279,6 +356,7 @@ fill_buffer(void) buf_ptr = bp_save; /* do not read anything, just switch buffers */ buf_end = be_save; bp_save = be_save = NULL; + lookahead_bp_save = NULL; if (buf_ptr < buf_end) return; /* only return if there is really something in * this buffer */ @@ -293,16 +371,21 @@ fill_buffer(void) p = in_buffer + offset; in_buffer_limit = in_buffer + size - 2; } - if ((i = getc(f)) == EOF) { + if (lookahead_start < lookahead_end) { + i = (unsigned char) *lookahead_start++; + } else { + lookahead_start = lookahead_ptr = lookahead_end = lookahead_buf; + if ((i = getc(f)) == EOF) { *p++ = ' '; *p++ = '\n'; had_eof = true; break; + } } if (i != '\0') *p++ = i; if (i == '\n') - break; + break; } buf_ptr = in_buffer; buf_end = p; diff --git a/lexi.c b/lexi.c index 3c7bfef..d43723c 100644 --- a/lexi.c +++ b/lexi.c @@ -148,6 +148,74 @@ strcmp_type(const void *e1, const void *e2) return (strcmp(e1, *(const char * const *)e2)); } +/* + * Decide whether "foo(..." is a function definition or declaration. + * + * At call, we are looking at the '('. Look ahead to find the first + * '{', ';' or ',' that is not within parentheses or comments; then + * it's a definition if we found '{', otherwise a declaration. + * Note that this rule is fooled by K&R-style parameter declarations, + * but telling the difference between those and function attributes + * seems like more trouble than it's worth. This code could also be + * fooled by mismatched parens or apparent comment starts within string + * literals, but that seems unlikely in the context it's used in. + */ +static int +is_func_definition(char *tp) +{ + int paren_depth = 0; + int in_comment = false; + int in_slash_comment = false; + int lastc = 0; + + /* We may need to look past the end of the current buffer. */ + lookahead_reset(); + for (;;) { + int c; + + /* Fetch next character. */ + if (tp < buf_end) + c = *tp++; + else { + c = lookahead(); + if (c == EOF) + break; + } + /* Handle comments. */ + if (in_comment) { + if (lastc == '*' && c == '/') + in_comment = false; + } else if (lastc == '/' && c == '*' && !in_slash_comment) + in_comment = true; + else if (in_slash_comment) { + if (c == '\n') + in_slash_comment = false; + } else if (lastc == '/' && c == '/') + in_slash_comment = true; + /* Count nested parens properly. */ + else if (c == '(') + paren_depth++; + else if (c == ')') { + paren_depth--; + /* + * If we find unbalanced parens, we must have started inside a + * declaration. + */ + if (paren_depth < 0) + return false; + } else if (paren_depth == 0) { + /* We are outside any parentheses or comments. */ + if (c == '{') + return true; + else if (c == ';' || c == ',') + return false; + } + lastc = c; + } + /* Hit EOF --- for lack of anything better, assume "not a definition". */ + return false; +} + int lexi(struct parser_state *state) { @@ -348,15 +416,12 @@ lexi(struct parser_state *state) } /* end of if (found_it) */ if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 && state->in_parameter_declaration == 0 && state->block_init == 0) { - char *tp = buf_ptr; - while (tp < buf_end) - if (*tp++ == ')' && (*tp == ';' || *tp == ',')) - goto not_proc; - strncpy(state->procname, token, sizeof state->procname - 1); - if (state->in_decl) - state->in_parameter_declaration = 1; - return (funcname); - not_proc:; + if (is_func_definition(buf_ptr)) { + strncpy(state->procname, token, sizeof state->procname - 1); + if (state->in_decl) + state->in_parameter_declaration = 1; + return (funcname); + } } /* * The following hack attempts to guess whether or not the current diff --git a/tests/declarations.0 b/tests/declarations.0 index 6d668b1..8419494 100644 --- a/tests/declarations.0 +++ b/tests/declarations.0 @@ -70,10 +70,10 @@ static int ald_shutingdown = 0; struct thread *ald_thread; static int -do_execve(td, args, mac_p) - struct thread *td; - struct image_args *args; - struct mac *mac_p; +do_execve( +struct thread *td, +struct image_args *args, +struct mac *mac_p) { } diff --git a/tests/declarations.0.stdout b/tests/declarations.0.stdout index e97e447..ab5a447 100644 --- a/tests/declarations.0.stdout +++ b/tests/declarations.0.stdout @@ -64,10 +64,10 @@ static int ald_shutingdown = 0; struct thread *ald_thread; static int -do_execve(td, args, mac_p) - struct thread *td; - struct image_args *args; - struct mac *mac_p; +do_execve( + struct thread *td, + struct image_args *args, + struct mac *mac_p) { } diff --git a/tests/list_head.0 b/tests/list_head.0 index 3a186ca..35874eb 100644 --- a/tests/list_head.0 +++ b/tests/list_head.0 @@ -1,10 +1,9 @@ /* $FreeBSD$ */ /* See r309380 */ static int -do_execve(td, args, mac_p) - struct thread *td; - struct image_args *args; - struct mac *mac_p; +do_execve(struct thread *td, +struct image_args *args, +struct mac *mac_p) { } diff --git a/tests/list_head.0.stdout b/tests/list_head.0.stdout index b6f0762..2ebcca5 100644 --- a/tests/list_head.0.stdout +++ b/tests/list_head.0.stdout @@ -1,10 +1,9 @@ /* $FreeBSD$ */ /* See r309380 */ static int -do_execve(td, args, mac_p) - struct thread *td; - struct image_args *args; - struct mac *mac_p; +do_execve(struct thread *td, + struct image_args *args, + struct mac *mac_p) { }