Thread: A little report on informal commit tag usage

A little report on informal commit tag usage

From
Thomas Munro
Date:
On Fri, Jul 12, 2019 at 1:25 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Jul 11, 2019 at 09:44:07AM -0400, Tom Lane wrote:
> > I thought we *did* have an agreement, to wit using
> >
> > Discussion: https://postgr.es/m/<message-id>
> >
> > to link to relevant mail thread(s).  Some people use more tags
> > but that seems inessential to me.
>
> Hehe.  I actually was thinking about advocating for having more of
> them in the commit logs.  I'll just start a new thread about what I
> had in mind.  Perhaps that will lead us nowhere, but let's see.

[Moving to -hackers]

Here are the tags that people have used in the past year, in commit messages:

 763     Author
   9     Authors
 144     Backpatch-through
  55     Backpatch
  14     Bug
  14     Co-authored-by
  27     Diagnosed-By
1593     Discussion
  42     Doc
 284     Reported-By
   5     Review
   8     Reviewed by
 456     Reviewed-By
   7     Security
   9     Tested-By

Other things I've noticed:

* a few people list authors and reviewers in prose in a fairly
mechanical paragraph
* some people put back-patch and bug number information in prose
* a few people list authors and reviewers with full email addresses
* some people repeat tags for multiple values, others make comma separated lists
* some people break long lines of meta-data with newlines
* authors "X and Y" may be an alternative to "X, Y", or imply greater
collaboration

The counts above were produced by case-insensitively sorting and
counting unique stuff that precedes a colon, and then throwing out
those used fewer than three times (these are false matches and typos),
and then throwing out a couple of obvious false matches by hand.
Starting from here:

git log --since 2018-07-14 | \
  grep -E '^ *[A-Z].*: ' | \
  sort -i | \
  sed 's/:.*//' | \
  uniq -ic | \
  grep -v -E '^ *[12] '

-- 
Thomas Munro
https://enterprisedb.com



Re: A little report on informal commit tag usage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Here are the tags that people have used in the past year, in commit messages:

>  763     Author
>    9     Authors
>  144     Backpatch-through
>   55     Backpatch
>   14     Bug
>   14     Co-authored-by
>   27     Diagnosed-By
> 1593     Discussion
>   42     Doc
>  284     Reported-By
>    5     Review
>    8     Reviewed by
>  456     Reviewed-By
>    7     Security
>    9     Tested-By

One small comment on that --- I'm not sure what you meant to count
in respect to the "Doc" item, but I believe there's a fairly widespread
convention to write "doc:" or some variant in the initial summary line
of commits that touch only documentation.  The point here is to let
release-note writers quickly ignore such commits, since we never list
them as release note items.  Bruce and I, being the usual suspects for
release-note writing, are pretty religious about this but other people
do it too.  I see a lot more than 42 such commit messages in the past
year, so not sure what you were counting?

Anyway, that's not a "tag" in the sense I understand you to be using
(otherwise the entries would look something like "Doc: yes" and be at
the end, which is unhelpful for the purpose).  But it's a related sort
of commit-message convention.

            regards, tom lane



Re: A little report on informal commit tag usage

From
Thomas Munro
Date:
On Mon, Jul 15, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> >   42     Doc

> [...]  I see a lot more than 42 such commit messages in the past
> year, so not sure what you were counting?

I would have tried to exclude the first line messages if I'd thought
of that.  But anyway, the reason for the low Doc number is case
sensitivity. I ran that on a Mac and its lame collation support failed
me in the "sort" step (also -i didn't do what I wanted, but that
wasn't the issue).  Trying again on FreeBSD box and explicitly setting
LANG for the benefit of anyone else wanting to run this (see end), and
then removing a few obvious false matches, I now get similar numbers
in most fields but a higher "doc" number:

 767     Author
   9     Authors
 144     Backpatch-through
  55     Backpatch
  14     Bug
  14     Co-authored-by
  27     Diagnosed-by
1599     Discussion
 119     doc
  36     docs
 284     Reported-by
   5     Review
   8     Reviewed by
 460     Reviewed-by
   7     Security
   9     Tested-by

git log --since 2018-07-14 | \
  grep -E '^ +[a-zA-Z].*: ' | \
  LANG=en_US.UTF-8 sort | \
  sed 's/:.*//' | \
  LANG=en_US.UTF-8 uniq -ic | \
  grep -v -E '^ *[12] '

-- 
Thomas Munro
https://enterprisedb.com



Re: A little report on informal commit tag usage

From
Michael Paquier
Date:
On Mon, Jul 15, 2019 at 05:49:26PM +1200, Thomas Munro wrote:
> I would have tried to exclude the first line messages if I'd thought
> of that.  But anyway, the reason for the low Doc number is case
> sensitivity. I ran that on a Mac and its lame collation support failed
> me in the "sort" step (also -i didn't do what I wanted, but that
> wasn't the issue).  Trying again on FreeBSD box and explicitly setting
> LANG for the benefit of anyone else wanting to run this (see end), and
> then removing a few obvious false matches, I now get similar numbers
> in most fields but a higher "doc" number:
>
>  767     Author
>    9     Authors
>  144     Backpatch-through
>   55     Backpatch
>   14     Bug
>   14     Co-authored-by
>   27     Diagnosed-by
> 1599     Discussion
>  119     doc
>   36     docs
>  284     Reported-by
>    5     Review
>    8     Reviewed by
>  460     Reviewed-by
>    7     Security
>    9     Tested-by

Thanks for those numbers.  I am wondering if we could do a bit of
consolidation here and write a page about this stuff on the wiki.
Getting the "Discussion" field most of the time is really cool.

I think that we could get some improvements on the following things.
Here is a set of ideas:
- Avoid "Authors" and replace it with "Author" even if there are
multiple authors.
- Avoid having multiple entries for each one of them?  For example we
have a couple of commits listed listing one "Reviewed-by" field with
one single name.
- Most commit entries to not use the email address with the name of
the author, reviewer, tester or reporter.  Perhaps we should give up
on that?
- Keep "Backpatch-through", not "Backpatch".
- Keep "Reviewed-by", not "Reviewed by" nor "Review".

"Security" is a special case, we append it to all the CVE-related
commits.

That is mainly a matter of taste, but I tend to prefer the following
format, protecting usually the order:
- Diagnosed-by
- Author
- Reviewed-by
- Discussion
- Backpatch-through
- I tend to have only one "Reviewed-by" entry with a list of names,
same for "Author" and "Reported-by".
- Only names, no emails.

As mentioned on different threads, "Discussion" is the only one we had
a strong agreement with.  Could it be possible to consider things like
Author, Reported-by, Reviewed-by or Backpatch-through for example and
extend to that?  The first three ones are useful for parsing the
commit logs.  The fourth one is handy so as there is no need to look
at a full log tree with git log --graph or such, which is something I
do from time to time to guess down to where a fix has been applied (I
tend to avoid git_changelog).
--
Michael

Attachment

Re: A little report on informal commit tag usage

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> As mentioned on different threads, "Discussion" is the only one we had
> a strong agreement with.  Could it be possible to consider things like
> Author, Reported-by, Reviewed-by or Backpatch-through for example and
> extend to that?  The first three ones are useful for parsing the
> commit logs.  The fourth one is handy so as there is no need to look
> at a full log tree with git log --graph or such, which is something I
> do from time to time to guess down to where a fix has been applied (I
> tend to avoid git_changelog).

FWIW, I'm one of the people who prefer prose for this.  The backpatching
bit is a good example of why, because my log messages typically don't
just say "backpatch to 9.6" but something about why that was the cutoff.
For instance in 0ec3e13c6,

    Per gripe from Ken Tanzer.  Back-patch to 9.6.  The issue exists
    further back, but before 9.6 the code looks very different and it
    doesn't actually know whether the "var" name matches anything,
    so I desisted from trying to fix it.

I am in favor of trying to consistently mention that a patch is being
back-patched, rather than expecting people to rely on git metadata
to find that out.  But I don't see that a rigid "Backpatch" tag format
makes anything easier there.  If you need to know that mechanically,
git_changelog is way more reliable.

I'm also skeptical of the argument that machine-parseable Reported-by
and so forth are useful to anybody.  Who'd use them, and for what?
Also, it's not always clear how to apply such a format to a real
situation --- eg, what do you do if the reporter is also the patch
author, or a co-author?  I'm not excited about redundantly entering
somebody's name several times.

            regards, tom lane



Re: A little report on informal commit tag usage

From
Daniel Gustafsson
Date:
> On 16 Jul 2019, at 16:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Michael Paquier <michael@paquier.xyz> writes:
>> As mentioned on different threads, "Discussion" is the only one we had
>> a strong agreement with.  Could it be possible to consider things like
>> Author, Reported-by, Reviewed-by or Backpatch-through for example and
>> extend to that?  The first three ones are useful for parsing the
>> commit logs.  The fourth one is handy so as there is no need to look
>> at a full log tree with git log --graph or such, which is something I
>> do from time to time to guess down to where a fix has been applied (I
>> tend to avoid git_changelog).
>
> FWIW, I'm one of the people who prefer prose for this.  The backpatching
> bit is a good example of why, because my log messages typically don't
> just say "backpatch to 9.6" but something about why that was the cutoff.

Wearing my $work-hat where I regularly perform interesting merges of postgres
releases as an upstream, these detailed commit messages are very valuable and
much appreciated.  The wealth of (human readable) information stored in the
commit logs makes tracking postgres as an upstream quite a lot easier.

> I'm also skeptical of the argument that machine-parseable Reported-by
> and so forth are useful to anybody.  Who'd use them, and for what?

The green gamification dot on people’s Github profiles might light up if the
machine readable format with email address was used (and the user has that
specific email connected to their Github account unless it’s a primary email).
Looking at commit 1c9bb02d8ec1d5b1b319e4fed70439a403c245b1 I can see that for
August 2018 Amit’s Github profile lists “Created 1 commit in 1 repository
postgres/postgres 1 commit”, which is likely from this commit message being
parsed in the mirror.

cheers ./daniel


Re: A little report on informal commit tag usage

From
Andres Freund
Date:
Hi,

On 2019-07-16 10:33:06 -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > As mentioned on different threads, "Discussion" is the only one we had
> > a strong agreement with.  Could it be possible to consider things like
> > Author, Reported-by, Reviewed-by or Backpatch-through for example and
> > extend to that?  The first three ones are useful for parsing the
> > commit logs.  The fourth one is handy so as there is no need to look
> > at a full log tree with git log --graph or such, which is something I
> > do from time to time to guess down to where a fix has been applied (I
> > tend to avoid git_changelog).
> 
> FWIW, I'm one of the people who prefer prose for this.  The backpatching
> bit is a good example of why, because my log messages typically don't
> just say "backpatch to 9.6" but something about why that was the cutoff.

They don't preclude each other though. E.g. it'd be sensible to have both

>     Per gripe from Ken Tanzer.  Back-patch to 9.6.  The issue exists
>     further back, but before 9.6 the code looks very different and it
>     doesn't actually know whether the "var" name matches anything,
>     so I desisted from trying to fix it.

and "Backpatch: 9.6-" or such.


> I am in favor of trying to consistently mention that a patch is being
> back-patched, rather than expecting people to rely on git metadata
> to find that out.  But I don't see that a rigid "Backpatch" tag format
> makes anything easier there.  If you need to know that mechanically,
> git_changelog is way more reliable.

I find it useful to have a quick place to scan in a commit message. It's
a lot quicker to focus on the last few lines with tags, and see a
'Backpatch: 9.6-' than to parse a potentially long commit message. If
I'm then still interested in the commit, I'll then read the commit.

Greetings,

Andres Freund



Re: A little report on informal commit tag usage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> They don't preclude each other though. E.g. it'd be sensible to have both

>> Per gripe from Ken Tanzer.  Back-patch to 9.6.  The issue exists
>> further back, but before 9.6 the code looks very different and it
>> doesn't actually know whether the "var" name matches anything,
>> so I desisted from trying to fix it.

> and "Backpatch: 9.6-" or such.

I've wondered for some time what you think the "-" means in this.

            regards, tom lane



Re: A little report on informal commit tag usage

From
Andres Freund
Date:
Hi,

On 2019-07-16 19:26:59 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > They don't preclude each other though. E.g. it'd be sensible to have both
> 
> >> Per gripe from Ken Tanzer.  Back-patch to 9.6.  The issue exists
> >> further back, but before 9.6 the code looks very different and it
> >> doesn't actually know whether the "var" name matches anything,
> >> so I desisted from trying to fix it.
> 
> > and "Backpatch: 9.6-" or such.
> 
> I've wondered for some time what you think the "-" means in this.

Up to master. Occasionally there's bugs that only need to be fixed in
some back branches etc.

Greetings,

Andres Freund



Re: A little report on informal commit tag usage

From
Alvaro Herrera
Date:
On 2019-Jul-16, Daniel Gustafsson wrote:

> The green gamification dot on people’s Github profiles might light up if the
> machine readable format with email address was used (and the user has that
> specific email connected to their Github account unless it’s a primary email).
> Looking at commit 1c9bb02d8ec1d5b1b319e4fed70439a403c245b1 I can see that for
> August 2018 Amit’s Github profile lists “Created 1 commit in 1 repository
> postgres/postgres 1 commit”, which is likely from this commit message being
> parsed in the mirror.

I specifically use "co-authored-by" (and scanning the grep results, I'm
the only person doing it) because github recognizes it in this way.
However I only feel entitled to use it when the patch has been developed
by me plus some other person(s), which has a bit of a contradictory result:
when I don't touch some submitted patch, I use "Author" since I (the
committer) am not a co-author.  That means github attributes such
patches solely to me :-(

I realize now, however, that in order for this to work I have to include
the email address, not just the name.  I failed to do that at least
once.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A little report on informal commit tag usage

From
Michael Paquier
Date:
On Tue, Jul 16, 2019 at 04:33:07PM -0700, Andres Freund wrote:
> On 2019-07-16 19:26:59 -0400, Tom Lane wrote:
>> I've wondered for some time what you think the "-" means in this.
>
> Up to master. Occasionally there's bugs that only need to be fixed in
> some back branches etc.

Is "-" most common to define a range of branches?  I would have
imagined that "~" makes more sense here.
--
Michael

Attachment