Thread: code contributions for 2024, WIP version
Hi,

As many of you are probably aware, I have been doing an annual blog post on who contributes to PostgreSQL development for some years now. It includes information on lines of code committed to PostgreSQL, and also emails sent to the list. This year, I got a jump on analyzing the commit log, and a draft of the data covering January-November of 2024 has been uploaded in pg_dump format to here:

https://sites.google.com/site/robertmhaas/contributions

I'm sending this message to invite anyone who is interested to review the data in the commits2024 table and send me corrections. For example, it's possible that there are cases where I've failed to pick out the correct primary author for a commit; or where somebody's name is spelled in two different ways; or where somebody's name is not spelled the way that they prefer.

You'll notice that the table has columns "lines" and "xlines". I have set xlines=0 in cases where (a) I considered the commit to be a large, mechanical commit such as a pgindent run or translation updates; or (b) the commit was reverting some other commit that occurred earlier in 2024; or (c) the commit was subsequently reverted. When I run the final statistics, those commits will still count for the statistics that count the number of commits, but the lines they inserted will not be counted as lines of code contributed in 2024.

Also for clarity, please be aware that the "ncauthor" column is not used in the final reporting; that is just there so that I can set author=coalesce(ncauthor,committer) at a certain phase of the data preparation. Corrections should be made to the author column, not ncauthor.

If you would like to correct the data, please send me your corrections off-list, as a reply to this email, ideally in the form of one or more UPDATE statements.
If you would like to complain about the methodology, I can't stop you, but please bear in mind that (1) this is already a lot of work and (2) I've always been upfront in my blog post about what the limitations of the methodology are and I do my best not to suggest that this method is somehow perfect or unimpeachable and (3) you're welcome to publish your own blog post where you compute things differently. I'm open to reasonable suggestions for improvement, but if your overall view is that this sucks or that I suck for doing it, I'm sorry that you feel that way but giving me that feedback probably will not induce me to do anything differently.

Donning my asbestos underwear, I remain yours faithfully,

--
Robert Haas
EDB: http://www.enterprisedb.com
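For readers following the methodology, the bookkeeping described in the first message can be sketched roughly as follows. This is a minimal illustration, not Robert's actual tooling: the `Commit` record reuses the column names mentioned in the message (`committer`, `ncauthor`, `author`, `lines`, `xlines`), while the `prepare` and `tally` helpers and the sample rows are hypothetical. It assumes that `xlines` equals `lines` unless it was set to 0, and that line totals are summed from `xlines` while every commit still counts toward the commit totals.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    commitid: str
    committer: str
    ncauthor: Optional[str]       # used only during preparation
    lines: int                    # lines inserted, per git log --shortstat
    xlines: int                   # assumed: equal to lines, or 0 if excluded
    author: Optional[str] = None  # filled by prepare(), then hand-corrected


def prepare(commits):
    # Rough equivalent of: SET author = coalesce(ncauthor, committer)
    for c in commits:
        if c.author is None:
            c.author = c.ncauthor if c.ncauthor is not None else c.committer


def tally(commits):
    """Return {author: (number_of_commits, counted_lines)}."""
    ncommits = defaultdict(int)
    nlines = defaultdict(int)
    for c in commits:
        ncommits[c.author] += 1       # excluded commits still count as commits
        nlines[c.author] += c.xlines  # only xlines count toward line totals
    return {a: (ncommits[a], nlines[a]) for a in ncommits}


# Hypothetical rows; the names and IDs are made up.
commits = [
    Commit("c1", "Committer A", None, 120, 120),
    Commit("c2", "Committer A", "Author B", 300, 300),
    Commit("c3", "Committer A", None, 5000, 0),  # e.g. a pgindent run
]
prepare(commits)
print(tally(commits))  # {'Committer A': (2, 120), 'Author B': (1, 300)}
```

Because `prepare` only fills `author` when it is empty, a hand correction made to `author` (as the message asks) would survive a re-run.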
On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> Donning my asbestos underwear, I remain yours faithfully,
>
> Thanks for taking the time to compile all that. That's really nice.

+1, I always look forward to the blog post.

--
nathan
On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
> >> Donning my asbestos underwear, I remain yours faithfully,
> >
> > Thanks for taking the time to compile all that. That's really nice.
>
> +1, I always look forward to the blog post.

Thanks, glad it's appreciated.

--
Robert Haas
EDB: http://www.enterprisedb.com
On 12/3/24 10:44, Robert Haas wrote:
> On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
>> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> >> Donning my asbestos underwear, I remain yours faithfully,
>> >
>> > Thanks for taking the time to compile all that. That's really nice.
>>
>> +1, I always look forward to the blog post.
>
> Thanks, glad it's appreciated.

It is definitely appreciated.

While I know you said "you will do you" when it comes to your annual
blog, there are a number of similar efforts -- top of mind is the
analysis done (as I understand it) by Daniel Gustafsson and Claire
Giordano [1], as well as ongoing/recurring analysis done by the
contributor committee. And there is the adjacent related discussion
around commit messages/authors. It makes me wonder if there isn't a way
to make all of our lives easier going forward.

[1]
https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Yes, I'm game to try to figure out how to combine our efforts. I don't
think it's a bad thing that different people have different takes;
this is complicated and looking at it through just one lens is
limiting. But people duplicating work is, well, not so good.

--
Robert Haas
EDB: http://www.enterprisedb.com
> On 3 Dec 2024, at 17:41, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
>> While I know you said "you will do you" when it comes to your annual
>> blog, there are a number of similar efforts -- top of mind is the
>> analysis done (as I understand it) by Daniel Gustafsson and Claire
>> Giordano [1], as well as ongoing/recurring analysis done by the
>> contributor committee. And there is the adjacent related discussion
>> around commit messages/authors. It makes me wonder if there isn't a way
>> to make all of our lives easier going forward.
>
> Yes, I'm game to try to figure out how to combine our efforts. I don't
> think it's a bad thing that different people have different takes;
> this is complicated and looking at it through just one lens is
> limiting. But people duplicating work is, well, not so good.

If we settled on a meta-data standard for how to identify authors, reviewers, backpatches etc I think that would go a very long way to lower the complexity of getting to the data and keep folks focused on doing interesting analysis.

--
Daniel Gustafsson
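Daniel's point can be made concrete with a small sketch. The thread does not define any metadata standard, so the format here is an assumption for illustration only: suppose each commit message ended with a block of `Key: value` trailer lines such as `Author:`, `Reviewed-by:` and `Backpatch-through:`, with one `Author:` line per author in order of contribution. Extracting the data is then a short, naive parsing job (the `parse_trailers` helper and the sample message are hypothetical):

```python
import re
from collections import defaultdict

# Hypothetical trailer syntax: "Key: value", key made of letters and hyphens.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")


def parse_trailers(message: str) -> dict:
    """Collect 'Key: value' lines from the last paragraph of a commit message.

    Keys are lowercased; repeated keys keep their order, so several
    Author: lines preserve the listed author order.
    """
    paragraphs = message.strip().split("\n\n")
    trailers = defaultdict(list)
    for line in paragraphs[-1].splitlines():
        m = TRAILER_RE.match(line.strip())
        if m:
            trailers[m.group(1).lower()].append(m.group(2).strip())
    return dict(trailers)


msg = """Improve something

Longer explanation of the change.

Author: First Author
Author: Second Author
Reviewed-by: Some Reviewer
Backpatch-through: 13
"""
print(parse_trailers(msg))
```

Repeated `Author:` lines would also cover the multi-author case raised later in the thread, though a real standard would still need to settle details such as name spelling and how to mark relative contribution.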
Hello Robert,

On 2024-Dec-02, Robert Haas wrote:

> As many of you are probably aware, I have been doing an annual blog
> post on who contributes to PostgreSQL development for some years now.
> It includes information on lines of code committed to PostgreSQL, and
> also emails sent to the list. This year, I got a jump on analyzing the
> commit log, and a draft of the data covering January-November of 2024
> has been uploaded in pg_dump format to here:
>
> https://sites.google.com/site/robertmhaas/contributions
>
> I'm sending this message to invite anyone who is interested to review
> the data in the commits2024 table and send me corrections.

No corrections here -- I noticed nothing wrong with the commits I am
involved with, in a quick read. I did notice that for patches with
multiple authors, only the first one is listed. For instance,
53c2a97a926's author ("Improve performance of subsystems on top of
SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out. I realize
that addressing this would complicate the schema and queries, but maybe
it's worth thinking about for next time. We have plenty of patches with
multiple authors, after all.

Hmm, maybe

  UPDATE commits2024 SET xlines = 0 WHERE commitid in
  ('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
  '21ef4d4d897', '592a2283721');

How did you come up with the 'lines' number for each commit anyway?
Judging by 592a2283721 it's not just the number of lines added, since
that commit added 3 lines and you have lines=2.
An unrelated (and possibly useless) thing is that some committers seem firmly in the camp of ending commit titles with a period, others are firmly in the other camp; only two people seem not to have made up their minds about that:

     committer      │ with end period │ without end period │ fraction with end period
────────────────────┼─────────────────┼────────────────────┼──────────────────────────
 Etsuro Fujita      │               6 │                  0 │                   100.00
 Peter Geoghegan    │              39 │                  0 │                   100.00
 Tatsuo Ishii       │               8 │                  0 │                   100.00
 Amit Kapila        │              87 │                  0 │                   100.00
 Fujii Masao        │              35 │                  0 │                   100.00
 Tom Lane           │             296 │                  1 │                    99.66
 Nathan Bossart     │             131 │                  1 │                    99.24
 Jeff Davis         │              88 │                  1 │                    98.88
 Noah Misch         │              61 │                  1 │                    98.39
 Thomas Munro       │              59 │                  1 │                    98.33
 Masahiko Sawada    │              39 │                  1 │                    97.50
 Dean Rasheed       │              23 │                  1 │                    95.83
 Robert Haas        │              77 │                 10 │                    88.51
 Joe Conway         │               1 │                  2 │                    33.33
 Alexander Korotkov │               4 │                153 │                     2.55
 Andrew Dunstan     │               1 │                 40 │                     2.44
 Bruce Momjian      │               2 │                 82 │                     2.38
 Heikki Linnakangas │               4 │                174 │                     2.25
 Peter Eisentraut   │               6 │                309 │                     1.90
 Amit Langote       │               1 │                 54 │                     1.82
 Álvaro Herrera     │               1 │                118 │                     0.84
 Michael Paquier    │               1 │                275 │                     0.36
 Andres Freund      │               0 │                 26 │                     0.00
 Richard Guo        │               0 │                 27 │                     0.00
 Daniel Gustafsson  │               0 │                 99 │                     0.00
 Magnus Hagander    │               0 │                  4 │                     0.00
 John Naylor        │               0 │                 33 │                     0.00
 Melanie Plageman   │               0 │                  6 │                     0.00
 David Rowley       │               0 │                106 │                     0.00
 Tomas Vondra       │               0 │                 33 │                     0.00

Query was:

  select committer,
         count(*) filter (where subject like '%.') as "with end period",
         count(*) filter (where subject not like '%.') "without end period",
         ((count(*) filter (where subject like '%.'))::numeric / count(*) * 100)::numeric(5,2) as "fraction with end period"
    from commits2024
   group by committer
   order by 4 desc, split_part(committer, ' ', 2);

Thanks!

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but that the people pretending to work interrupt the ones actually working."
-- Paul Graham, http://www.paulgraham.com/opensource.html
> On 5 Dec 2024, at 17:46, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> We have plenty of patches with
> multiple authors, after all.

+1, thanks for raising this.
A lot of stuff is actually joint work. It’s much more fun to develop something in a group of co-authors.

Best regards, Andrey Borodin.
On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> No corrections here -- I noticed nothing wrong with the commits I am
> involved with, in a quick read. I did notice that for patches with
> multiple authors, only the first one is listed. For instance,
> 53c2a97a926's author ("Improve performance of subsystems on top of
> SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out. I realize
> that addressing this would complicate the schema and queries, but maybe
> it's worth thinking about for next time. We have plenty of patches with
> multiple authors, after all.

I agree, but I don't know how to apportion the work between the
authors. I think dividing credit equally between two or three authors
would often be very unfair to the first author. If we want to annotate
commit messages in a way that allows me to apportion credit more
fairly, I'm totally game to do that, but otherwise I think that giving
the credit to the first author is probably more fair on average.

> Hmm, maybe
> UPDATE commits2024 SET xlines = 0 WHERE commitid in
> ('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
> '21ef4d4d897', '592a2283721');

Thanks.

> How did you come up with the 'lines' number for each commit anyway?
> Judging by 592a2283721 it's not just the number of lines added, since
> that commit added 3 lines and you have lines=2.

git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M

--
Robert Haas
EDB: http://www.enterprisedb.com
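The command above prints, after each commit, a summary line such as ` 2 files changed, 2 insertions(+), 1 deletion(-)`. Assuming the `lines` column is the insertions figure from that line (the exchange about 592a2283721 suggests so, with -w lowering the count, while -M makes git treat renamed files as renames rather than whole-file deletions plus additions), a per-commit number could be extracted roughly like this. The `insertions_per_commit` parser and the sample log are illustrative only, not Robert's script:

```python
import re

# Matches e.g. " 2 files changed, 2 insertions(+), 1 deletion(-)";
# git omits the insertions or deletions part when that count is zero.
SHORTSTAT_RE = re.compile(
    r"^\s*(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?\s*$"
)


def insertions_per_commit(log_text: str) -> dict:
    """Map commit hash -> insertions, from default `git log --shortstat` output."""
    result = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith("commit "):       # header lines are not indented
            current = line.split()[1]
            result[current] = 0              # stays 0 if no insertions reported
        elif current is not None:
            m = SHORTSTAT_RE.match(line)
            if m:
                result[current] = int(m.group(2) or 0)
    return result


# Made-up log excerpt in git's default format (hashes shortened).
sample = """\
commit 1111111
Author: Some Author <author@example.com>
Date:   Mon Jan 1 12:00:00 2024 +0000

    First example commit

 2 files changed, 2 insertions(+), 1 deletion(-)

commit 2222222
Author: Some Author <author@example.com>
Date:   Tue Jan 2 12:00:00 2024 +0000

    Second example commit

 1 file changed, 5 deletions(-)
"""
print(insertions_per_commit(sample))  # {'1111111': 2, '2222222': 0}
```

Commit message lines are indented in git's default output, so they cannot be mistaken for `commit` header lines.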
> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Perhaps slightly off topic, so how does one provide input to the contributor committee?

> [1]
> https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
Thomas John Kincaid
On 2024-Dec-05, Robert Haas wrote:

> On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > No corrections here -- I noticed nothing wrong with the commits I am
> > involved with, in a quick read. I did notice that for patches with
> > multiple authors, only the first one is listed. For instance,
> > 53c2a97a926's author ("Improve performance of subsystems on top of
> > SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out. I realize
> > that addressing this would complicate the schema and queries, but maybe
> > it's worth thinking about for next time. We have plenty of patches with
> > multiple authors, after all.
>
> I agree, but I don't know how to apportion the work between the
> authors. I think dividing credit equally between two or three authors
> would often be very unfair to the first author. If we want to annotate
> commit messages in a way that allows me to apportion credit more
> fairly, I'm totally game to do that, but otherwise I think that giving
> the credit to the first author is probably more fair on average.

Just give credit to all lines for all authors, would be my approach. Is
that unfair? Perhaps, but I'd rather err on the side of giving too much
credit, than on not giving enough.

> > How did you come up with the 'lines' number for each commit anyway?
> > Judging by 592a2283721 it's not just the number of lines added, since
> > that commit added 3 lines and you have lines=2.
>
> git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M

Ah, it's -w that makes the difference, got it.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that you can squeeze 0.2% more throughput by running them counterclockwise. It's worth the effort. Recommended." (Gerry Pourwelle)
On Thu, Dec 5, 2024 at 10:39:38AM -0500, Tom Kincaid wrote:
> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.
>
> Perhaps slightly off topic, so how does one provide input to the contributor
> committee?

The committee is responsible for updating the contributors list web page:

  https://www.postgresql.org/community/contributors/

and does analysis of contributions to the Postgres community to help update the list. Their email address at the bottom:

  contributors@postgresql.org

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.
On Thu, Dec 5, 2024 at 11:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Just give credit to all lines for all authors, would be my approach. Is
> that unfair? Perhaps, but I'd rather err on the side of giving too much
> credit, than on not giving enough.

I'm not against somebody putting that together, but I don't think it would be useful for me. I think it would inflate the numbers for committers by quite a lot more than what is fair, because if I commit a 1000 line patch and I add 50 lines of code, I'm going to get an awful lot more credit than I deserve. It would probably also inflate or distort the numbers for some other people as well. But what I would say is -- if you think it's a useful thing, try doing it.

--
Robert Haas
EDB: http://www.enterprisedb.com
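The disagreement is easier to see with numbers. The sketch below compares three ways of crediting a multi-author commit: first listed author only (the approach Robert describes using), every author receiving all lines (Álvaro's suggestion), and an equal split (which Robert considers often unfair to the first author). The data is invented to match Robert's hypothetical of a 1000-line patch to which he, as committer, adds 50 lines, assuming he is then listed as second author; the `credit` function is likewise hypothetical.

```python
from collections import Counter


def credit(commits, scheme):
    """Sum credited lines per author.

    commits: list of (authors_in_listed_order, lines) pairs.
    scheme: "first", "all" or "split".
    """
    totals = Counter()
    for authors, lines in commits:
        if scheme == "first":      # only the first listed author gets credit
            totals[authors[0]] += lines
        elif scheme == "all":      # every listed author gets every line
            for a in authors:
                totals[a] += lines
        elif scheme == "split":    # lines divided equally among authors
            for a in authors:
                totals[a] += lines / len(authors)
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
    return totals


# Invented example: a 1000-line patch, about 950 lines by "Patch Author"
# and 50 by "Committer", who is assumed to be listed second.
commits = [(["Patch Author", "Committer"], 1000)]

for scheme in ("first", "all", "split"):
    print(scheme, dict(credit(commits, scheme)))
# first {'Patch Author': 1000}
# all {'Patch Author': 1000, 'Committer': 1000}
# split {'Patch Author': 500.0, 'Committer': 500.0}
```

None of the three schemes recovers the actual 950/50 division; that would require per-author figures, for example from the kind of commit-message annotation Robert says he is open to.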