Thread: Maintaining a list of pgindent commits for "git blame" to ignore
Recent versions of git are capable of maintaining a list of commits for "git blame" to ignore: https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame I tried this out myself, using my own list of pgindent commits. It works very well -- much better than what you get when you ask git to heuristically ignore commits based on whitespace-only line changes. This is not surprising, in a way; I don't actually want to avoid whitespace. I just want to ignore pgindent commits. Note that there are still a small number of pgindent line changes, even with this. That's because sometimes it's unavoidable -- some "substantively distinct lines of code" are actually created by pgindent. But these all seem to be <CR> lines that are only shown because there is legitimately no more appropriate commit to attribute the line to. This seems like the ideal behavior to me. I propose that we (I suppose I actually mean Bruce) start maintaining our own file for this in git. It can be configured to run without any extra steps via a once-off "git config blame.ignoreRevsFile .git-blame-ignore-revs". It would only need to be updated whenever Bruce or Tom runs pgindent. It doesn't matter if this misses one or two smaller pgindent runs, it seems. Provided the really huge ones are in the file, everything works very well. -- Peter Geoghegan
On Thu, Mar 18, 2021 at 01:46:49PM -0700, Peter Geoghegan wrote: > Recent versions of git are capable of maintaining a list of commits > for "git blame" to ignore: > > https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame > > I tried this out myself, using my own list of pgindent commits. It > works very well -- much better than what you get when you ask git to > heuristically ignore commits based on whitespace-only line changes. > This is not surprising, in a way; I don't actually want to avoid > whitespace. I just want to ignore pgindent commits. > > Note that there are still a small number of pgindent line changes, > even with this. That's because sometimes it's unavoidable -- some > "substantively distinct lines of code" are actually created by > pgindent. But these all seem to be <CR> lines that are only shown > because there is legitimately no more appropriate commit to attribute > the line to. This seems like the ideal behavior to me. > > I propose that we (I suppose I actually mean Bruce) start maintaining Actually, Tom Lane runs pgindent usually now. I do the copyright change, but I don't think we would ignore those since the single-line change is probably something we would want to blame. > our own file for this in git. It can be configured to run without any > extra steps via a once-off "git config blame.ignoreRevsFile > .git-blame-ignore-revs". It would only need to be updated whenever > Bruce or Tom runs pgindent. > > It doesn't matter if this misses one or two smaller pgindent runs, it > seems. Provided the really huge ones are in the file, everything works > very well. It would certainly be very easy to pull pgindent commits out of git log and add them. I do wish we could cause everyone to honor that file, but it seems each user has to configure their repository to honor it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 2:10 PM Bruce Momjian <bruce@momjian.us> wrote: > Actually, Tom Lane runs pgindent usually now. I do the copyright > change, but I don't think we would ignore those since the single-line > change is probably something we would want to blame. The copyright change commits don't need to be considered here. In practice they're just not a problem because nobody wants or expects "git blame" to do anything more than attribute an affected line to a copyright commit. > It would certainly be very easy to pull pgindent commits out of git log > and add them. I do wish we could cause everyone to honor that file, but > it seems each user has to configure their repository to honor it. That doesn't seem like a huge problem. There is no reason why this shouldn't be easy to use and to maintain going forward. There just aren't very many commits involved. Attached is my .git-blame-ignore-revs file, which has pgindent commits that I'd like to exclude from git blame. The file is helpful on its own. But what we really ought to do is commit the file (perhaps with some revisions) and require that it be maintained by the official project workflow documented at src/tools/pgindent/README. -- Peter Geoghegan
Attachment
On Thu, Mar 18, 2021 at 03:03:41PM -0700, Peter Geoghegan wrote: > Attached is my .git-blame-ignore-revs file, which has pgindent commits > that I'd like to exclude from git blame. The file is helpful on its > own. But what we really ought to do is commit the file (perhaps with > some revisions) and require that it be maintained by the official > project workflow documented at src/tools/pgindent/README. It would be kind of nice if the file can be generated automatically. I have you checked if 'pgindent' being on the first line of the commit is sufficient? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 3:12 PM Bruce Momjian <bruce@momjian.us> wrote: > It would be kind of nice if the file can be generated automatically. I > have you checked if 'pgindent' being on the first line of the commit is > sufficient? I generated the file by looking for commits that: 1) Mentioned "pgindent" or "PGINDENT" in the entire commit message. 2) Had more than 20 or 30 files changed. This left me with fewer than 50 commits that cover over 20 years of history since the first pgindent commit. I also added one or two others that I somehow missed (maybe you happened to spell it "pg indent" that year) through trial and error. The file that I sent to the list works really well for me. I don't think that it's a good idea to automate this process, because we certainly don't want to let incorrect entries slip in. And because there just isn't a lot left to automate -- running pgindent on the tree is something that happens no more than 2 or 3 times a year. It could easily be added to the checklist in the README. It should take less than 5 minutes a year. -- Peter Geoghegan
On Thu, Mar 18, 2021 at 03:20:37PM -0700, Peter Geoghegan wrote: > On Thu, Mar 18, 2021 at 3:12 PM Bruce Momjian <bruce@momjian.us> wrote: > > It would be kind of nice if the file can be generated automatically. I > > have you checked if 'pgindent' being on the first line of the commit is > > sufficient? > > I generated the file by looking for commits that: > > 1) Mentioned "pgindent" or "PGINDENT" in the entire commit message. > > 2) Had more than 20 or 30 files changed. > > This left me with fewer than 50 commits that cover over 20 years of > history since the first pgindent commit. I also added one or two > others that I somehow missed (maybe you happened to spell it "pg > indent" that year) through trial and error. The file that I sent to > the list works really well for me. > > I don't think that it's a good idea to automate this process, because > we certainly don't want to let incorrect entries slip in. And because > there just isn't a lot left to automate -- running pgindent on the > tree is something that happens no more than 2 or 3 times a year. It > could easily be added to the checklist in the README. It should take > less than 5 minutes a year. Sounds like a plan. We should mention adding to this file somewhere in our pgindent README. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Peter Geoghegan <pg@bowt.ie> writes: > Attached is my .git-blame-ignore-revs file, which has pgindent commits > that I'd like to exclude from git blame. The file is helpful on its > own. But what we really ought to do is commit the file (perhaps with > some revisions) and require that it be maintained by the official > project workflow documented at src/tools/pgindent/README. I don't object to maintaining such a file; if it makes "git blame" work better, that's a huge win. However, the file as you have it seems rather unreadable. I'd much rather have something that includes the commit date and/or first line of commit message. Is there any flexibility in the format, or does git blame insist it be just like this? regards, tom lane
On Thu, Mar 18, 2021 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't object to maintaining such a file; if it makes "git blame" > work better, that's a huge win. However, the file as you have it > seems rather unreadable. I'd much rather have something that includes > the commit date and/or first line of commit message. Is there any > flexibility in the format, or does git blame insist it be just like this? I ended up doing it that way because I was in a hurry to see how much it helped. I can fix it up. We could require (but not automatically enforce) that the first line of the commit message be included above each hash, as a comment. You could also require reverse chronological ordering of commits. That would make everything easy to follow. It's worth noting that git insists that you provide the full hash of commits here. This is not something I remember it insisting upon in any other area. There is probably a very good practical reason for that. -- Peter Geoghegan
On Thu, Mar 18, 2021 at 03:46:10PM -0700, Peter Geoghegan wrote: > It's worth noting that git insists that you provide the full hash of > commits here. This is not something I remember it insisting upon in > any other area. There is probably a very good practical reason for > that. Probably because later commits might collide with shorter hashes. When you are reporting a hash that only looks _backward_, this is not an issue. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 4:00 PM Bruce Momjian <bruce@momjian.us> wrote: > Probably because later commits might collide with shorter hashes. When > you are reporting a hash that only looks _backward_, this is not an > issue. Right, but it's extremely unlikely to happen by accident. I was suggesting that there might be a security issue. I could fairly easily make my git commit match a prefix intended to uniquely identify your git commit if I set out to do so. There are projects that might have to consider that possibility, though perhaps we're not one of them. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Thu, Mar 18, 2021 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't object to maintaining such a file; if it makes "git blame" >> work better, that's a huge win. However, the file as you have it >> seems rather unreadable. I'd much rather have something that includes >> the commit date and/or first line of commit message. Is there any >> flexibility in the format, or does git blame insist it be just like this? > I ended up doing it that way because I was in a hurry to see how much > it helped. I can fix it up. > We could require (but not automatically enforce) that the first line > of the commit message be included above each hash, as a comment. You > could also require reverse chronological ordering of commits. That > would make everything easy to follow. Given that the file will be added to manually, I think just having an existing format to follow will be easy enough. I'd suggest something like b5d69b7c22ee4c44b8bb99cfa0466ffaf3b5fab9 # Sun Jun 7 16:57:08 2020 -0400 # pgindent run prior to branching v13. which is easy to make from "git log" or "git show" output. (Because of the precedent of those tools, I'd rather write the commit hash before the rest of the entry.) The date is important IMO because otherwise it's quite unclear whether to add a new entry at the top or the bottom. Other points: the file should be kept in src/tools/pgindent/, and it should definitely NOT have a name beginning with ".". regards, tom lane
On Thu, Mar 18, 2021 at 07:05:03PM -0400, Tom Lane wrote: > Other points: the file should be kept in src/tools/pgindent/, and > it should definitely NOT have a name beginning with ".". Well, if we want github and others to eventually honor a file, it seems .git-blame-ignore-revs at the top of the tree is the common location for this. Of course, I don't know if they will ever do that, and can move it later if needed. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 4:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > b5d69b7c22ee4c44b8bb99cfa0466ffaf3b5fab9 # Sun Jun 7 16:57:08 2020 -0400 > # pgindent run prior to branching v13. > > which is easy to make from "git log" or "git show" output. (Because > of the precedent of those tools, I'd rather write the commit hash > before the rest of the entry.) WFM. What about reformat-dat-files and perltidy runs? It seems that you have sometimes used all three reformatting tools to produce one commit -- but not always. ISTM that I should get any of those that I missed. And that the pgindent README (which already mentions these other tools) ought to be updated to be explicit about the policy applying equally to commits that apply any of the two other tools in bulk. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > What about reformat-dat-files and perltidy runs? It seems that you > have sometimes used all three reformatting tools to produce one commit > -- but not always. ISTM that I should get any of those that I missed. > And that the pgindent README (which already mentions these other > tools) ought to be updated to be explicit about the policy applying > equally to commits that apply any of the two other tools in bulk. Good question. We don't have a standard about that (whether to do those in separate or the same commits), but we could establish one if it seems helpful. regards, tom lane
On Thu, Mar 18, 2021 at 4:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Good question. We don't have a standard about that (whether to > do those in separate or the same commits), but we could establish one > if it seems helpful. I don't think that it matters too much, but it will necessitate updating the file multiple times. It might become natural to just do everything together in a way that it wasn't before. The really big wins come from excluding the enormous pgindent run commits, especially for the few historic pgindent runs where the rules changed -- there are no more than a handful of those. They tend to generate an enormous amount of churn that touches almost everything. So it probably isn't necessary to worry about smaller things. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Thu, Mar 18, 2021 at 4:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Good question. We don't have a standard about that (whether to >> do those in separate or the same commits), but we could establish one >> if it seems helpful. > I don't think that it matters too much, but it will necessitate > updating the file multiple times. It might become natural to just do > everything together in a way that it wasn't before. Doubt that it matters. The workflow would have to be "commit and push the mechanical updates, then edit the tracking file, commit and push that". You don't have the commit hash nailed down till you've pushed. So if we decided to do the mechanical updates in several commits, not just one, I'd still be inclined to do them all and then edit the tracking file once. regards, tom lane
On Thu, Mar 18, 2021 at 5:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Doubt that it matters. The workflow would have to be "commit and push > the mechanical updates, then edit the tracking file, commit and push > that". You don't have the commit hash nailed down till you've pushed. Okay. I have made a personal TODO list item for this. I'll pick this up again in April, once the final CF is over. -- Peter Geoghegan
On Thu, Mar 18, 2021 at 4:32 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Mar 18, 2021 at 4:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > b5d69b7c22ee4c44b8bb99cfa0466ffaf3b5fab9 # Sun Jun 7 16:57:08 2020 -0400 > > # pgindent run prior to branching v13. > > > > which is easy to make from "git log" or "git show" output. (Because > > of the precedent of those tools, I'd rather write the commit hash > > before the rest of the entry.) > > WFM. What do you think of the attached? I prefer the ISO date style myself, so I went with that. Note that I have included "Modify BufferGetPage() to prepare for "snapshot too old" feature", as well as "Revert no-op changes to BufferGetPage()". I've noticed that those two particular commits cause unhelpful noise when I run "git blame" on access method code. I see problems with these commits often enough to matter. The latter commit cleanly reverted the former after only 12 days, so ignoring both seems okay to me. Everything else should be either pgindent/perltidy related or reformat-dat-files related. -- Peter Geoghegan
Attachment
Peter Geoghegan <pg@bowt.ie> writes: > What do you think of the attached? I prefer the ISO date style myself, > so I went with that. I grepped the git logs for "indent" and found a bunch more commits that look like they should be included: db6e2b4c5 d84213909 1e9c85809 f04d4ac91 9ef2dbefc 651902deb ce5548103 b5bce6c1e de94e2af1 d0cd7bda9 befa3e648 7584649a1 84288a86a d74714027 b6b71b85b 46785776c 089003fb4 ea08e6cd5 59f6a57e5 It's possible that some of these touch few enough lines that they are not worth listing; I did not check the commit delta sizes. > Note that I have included "Modify BufferGetPage() to prepare for > "snapshot too old" feature", as well as "Revert no-op changes to > BufferGetPage()". I've noticed that those two particular commits cause > unhelpful noise when I run "git blame" on access method code. Meh. I can get on board with the idea of adding commit+revert pairs to this list, but I'd like a more principled selection filter than "which ones bother Peter". Maybe the size of the reverted patch should factor into it. Do we have an idea of how much adding entries to this list slows down "git blame"? If we include commit+revert pairs more than very sparingly, I'm afraid we'll soon have an enormous list, and I wonder what that will cost us. regards, tom lane
On Mon, Jun 21, 2021 at 8:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's possible that some of these touch few enough lines that they > are not worth listing; I did not check the commit delta sizes. Commits that touch very few lines weren't included originally, just because it didn't seem necessary. Even still, I've looked through the extra commits now, and everything that you picked out looks in scope. I'm just going to include these extra commits. Attached is a new version of the same file, based on your feedback (with those extra commits, and some commits from the last version removed). I'll produce a conventional patch file in the next revision, most likely. > Meh. I can get on board with the idea of adding commit+revert pairs > to this list, but I'd like a more principled selection filter than > "which ones bother Peter". Maybe the size of the reverted patch > should factor into it I have to admit that you're right. That was why I picked those two out. Of course I can defend this choice in detail, but in the interest of not setting a terrible precedent I won't do that. The commits in question have been removed from this revised version. I think it's important that we not get into the business of adding stuff to this willy-nilly. Inevitably everybody will have their own pet peeve noisy commit, and will want to add to the list -- just like I did. Naturally nobody will be interested in arguing against including whatever individual pet peeve commit each time this comes up. Regardless of the merits of the case. Let's just not go there (unless perhaps it's truly awful for almost everybody). > Do we have an idea of how much adding entries to this list slows > down "git blame"? If we include commit+revert pairs more than > very sparingly, I'm afraid we'll soon have an enormous list, and > I wonder what that will cost us. I doubt it costs us much, at least in any way that has a very noticeable relationship as new commits are added. I've now settled on 68 commits, and expect that this won't need to grow very quickly, so that seems fine. From my point of view it makes "git blame" far more useful. LLVM uses a file with fewer entries, and have had such a file since last year: https://github.com/llvm/llvm-project/blob/main/.git-blame-ignore-revs The list of commit hashes in the file that the Blender project uses is about the same size: https://github.com/blender/blender/blob/master/.git-blame-ignore-revs We're using more commits than either project here, but it's within an order of magnitude. Note that this is going to be opt-in, not opt-out. It won't do anything unless an individual hacker decides to enable it locally. The convention seems to be that it is located in the top-level directory. ISTM that we should follow that convention, since following the convention is good, and does not in itself force anybody to ignore any of the listed commits. Any thoughts on that? -- Peter Geoghegan
Attachment
Peter Geoghegan <pg@bowt.ie> writes: > The convention seems to be that it is located in the top-level > directory. ISTM that we should follow that convention, since following > the convention is good, and does not in itself force anybody to ignore > any of the listed commits. Any thoughts on that? Agreed. I think I'd previously suggested something under src/tools, but we might as well do like others are doing; especially since we have .gitattributes and the like there already. regards, tom lane
On Mon, Jun 21, 2021 at 5:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Agreed. I think I'd previously suggested something under src/tools, > but we might as well do like others are doing; especially since > we have .gitattributes and the like there already. Great. Attached is a patch file that puts it all together. I would like to commit this in the next couple of days. -- Peter Geoghegan
Attachment
On Mon, Jun 21, 2021 at 07:43:59PM -0700, Peter Geoghegan wrote: > On Mon, Jun 21, 2021 at 5:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Agreed. I think I'd previously suggested something under src/tools, > > but we might as well do like others are doing; especially since > > we have .gitattributes and the like there already. > > Great. > > Attached is a patch file that puts it all together. I would like to > commit this in the next couple of days. +1 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Peter Geoghegan <pg@bowt.ie> writes: > Attached is a patch file that puts it all together. I would like to > commit this in the next couple of days. Hmm, is the "git config blame.ignoreRevsFile" setting actually repo-relative? I'm a bit confused as to whether an absolute file path would be needed to ensure correct behavior. regards, tom lane
On Tue, Jun 22, 2021 at 8:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm, is the "git config blame.ignoreRevsFile" setting actually > repo-relative? I'm a bit confused as to whether an absolute > file path would be needed to ensure correct behavior. That seems to be standard practice, and certainly works for me. If any of the hashes are not well formed, or even appear in abbreviated form, "git blame" breaks in a very obvious and visible way. So there is zero chance of it breaking without somebody noticing immediately. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Tue, Jun 22, 2021 at 8:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hmm, is the "git config blame.ignoreRevsFile" setting actually >> repo-relative? I'm a bit confused as to whether an absolute >> file path would be needed to ensure correct behavior. > That seems to be standard practice, and certainly works for me. OK, no objections then. regards, tom lane
On Tue, Jun 22, 2021 at 8:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > OK, no objections then. Pushed. Thanks. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Tue, Jun 22, 2021 at 8:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> OK, no objections then. > Pushed. Thanks. Um. You probably should have waited for beta2 to be tagged. No harm done likely, but it's not per release process. regards, tom lane
On Tue, Jun 22, 2021 at 9:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Pushed. Thanks. > > Um. You probably should have waited for beta2 to be tagged. > No harm done likely, but it's not per release process. Sorry about that. I was aware of the policy, but somehow overlooked that we were in the window between stamping and tagging. I'll be more careful about this in the future. -- Peter Geoghegan
On 2021-Mar-18, Peter Geoghegan wrote: > On Thu, Mar 18, 2021 at 3:12 PM Bruce Momjian <bruce@momjian.us> wrote: > > It would be kind of nice if the file can be generated automatically. I > > have you checked if 'pgindent' being on the first line of the commit is > > sufficient? > > I generated the file by looking for commits that: > > 1) Mentioned "pgindent" or "PGINDENT" in the entire commit message. > > 2) Had more than 20 or 30 files changed. Is there a minimum git version for this to work? It doesn't seem to work for me. ... ah, apparently one needs git 2.23: https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame I have 2.20. [ apt install -t buster-backports git ] I have 2.30. It works better. To be clear: some lines still appear as originating in some pgindent commit, when they are created by such a commit. But as far as I've seen, they're mostly uninteresting ones (whitespace, only braces, only "else", only "for (;;)" and similar). The git blame experience seems much better. Thanks! -- Álvaro Herrera 39°49'30"S 73°17'W "El que vive para el futuro es un iluso, y el que vive para el pasado, un imbécil" (Luis Adler, "Los tripulantes de la noche")
On Tue, Jun 22, 2021 at 5:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I have 2.30. It works better. To be clear: some lines still appear as > originating in some pgindent commit, when they are created by such a > commit. But as far as I've seen, they're mostly uninteresting ones > (whitespace, only braces, only "else", only "for (;;)" and similar). As I understand it there are a small number of remaining lines that are fundamentally impossible to attribute to any commit but a pgindent commit. These are lines that a pgindent commit created, typically when it adds a new single line of whitespace (carriage return). I think that these remaining lines of whitespace probably *should* be attributed to a pgindent commit -- it's actually a good thing. In any case they're unlikely to be called up because they're just whitespace. > The git blame experience seems much better. Thanks! I'm very pleased with the results myself. It's particularly nice when you "git blame" an old file that has been through multiple huge pgindent changes. You can actually see reasonable attributions for commits that go back to the 1990s now. -- Peter Geoghegan