Thread: code contributions for 2024, WIP version

From

Robert Haas

Date:

03 December 2024, 00:10:22

Hi,

As many of you are probably aware, I have been doing an annual blog
post on who contributes to PostgreSQL development for some years now.
It includes information on lines of code committed to PostgreSQL, and
also emails sent to the list. This year, I got a jump on analyzing the
commit log, and a draft of the data covering January-November of 2024
has been uploaded in pg_dump format to here:

https://sites.google.com/site/robertmhaas/contributions

I'm sending this message to invite anyone who is interested to review
the data in the commits2024 table and send me corrections. For
example, it's possible that there are cases where I've failed to pick
out the correct primary author for a commit; or where somebody's name
is spelled in two different ways; or where somebody's name is not
spelled the way that they prefer.

You'll notice that the table has columns "lines" and "xlines". I have
set xlines=0 in cases where (a) I considered the commit to be a large,
mechanical commit such as a pgindent run or translation updates; or
(b) the commit was reverting some other commit that occurred earlier
in 2024; or (c) the commit was subsequently reverted. When I run the
final statistics, those commits will still count for the statistics
that count the number of commits, but the lines they inserted will not
be counted as lines of code contributed in 2024. Also for clarity,
please be aware that the "ncauthor" column is not used in the final
reporting; that is just there so that I can set
author=coalesce(ncauthor,committer) at a certain phase of the data
preparation. Corrections should be made to the author column, not
ncauthor.
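
The counting rule described above can be sketched roughly as follows. This is an illustration only, run from Python against an in-memory SQLite database rather than the actual PostgreSQL dump. The sample rows, the commit IDs, and the exact column set are assumptions; only the names commits2024, lines, xlines, author, ncauthor, and committer come from the message.

```python
import sqlite3

# Hypothetical miniature of the commits2024 table; the real dump may have
# more columns and different types.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE commits2024 (
        commitid  TEXT PRIMARY KEY,
        committer TEXT NOT NULL,
        ncauthor  TEXT,            -- NULL when no separate author was identified
        author    TEXT,
        lines     INTEGER NOT NULL,
        xlines    INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO commits2024 VALUES (?, ?, ?, NULL, ?, ?)",
    [
        ("aaa0001", "Committer A", "Author X", 120, 120),  # ordinary commit
        ("aaa0002", "Committer A", None, 40, 40),          # committer's own work
        ("aaa0003", "Committer B", None, 9000, 0),         # e.g. a pgindent run
    ],
)

# Data-preparation step named in the message: fall back to the committer
# when no separate author was recorded.
conn.execute("UPDATE commits2024 SET author = COALESCE(ncauthor, committer)")

# Every commit counts toward the commit total, but only xlines counts
# toward lines contributed.
rows = conn.execute(
    """
    SELECT author, COUNT(*) AS commits, SUM(xlines) AS lines_credited
    FROM commits2024
    GROUP BY author
    ORDER BY author
    """
).fetchall()
for row in rows:
    print(row)
```

In this sketch the mechanical commit still shows up as one commit for its committer, while contributing zero credited lines.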

If you would like to correct the data, please send me your corrections
off-list, as a reply to this email, ideally in the form of one or more
UPDATE statements. If you would like to complain about the
methodology, I can't stop you, but please bear in mind that (1) this
is already a lot of work; (2) I've always been upfront in my blog
post about what the limitations of the methodology are, and I do my
best not to suggest that this method is somehow perfect or
unimpeachable; and (3) you're welcome to publish your own blog post
where you compute things differently. I'm open to reasonable
suggestions for improvement, but if your overall view is that this
sucks or that I suck for doing it, I'm sorry that you feel that way
but giving me that feedback probably will not induce me to do anything
differently.

Donning my asbestos underwear, I remain yours faithfully,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: code contributions for 2024, WIP version

From

Nathan Bossart

Date:

03 December 2024, 18:37:41

On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> Donning my asbestos underwear, I remain yours faithfully,
> 
> Thanks for taking the time to compile all that.  That's really nice.

+1, I always look forward to the blog post.

-- 
nathan

Re: code contributions for 2024, WIP version

From

Robert Haas

Date:

03 December 2024, 18:44:31

On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
> >> Donning my asbestos underwear, I remain yours faithfully,
> >
> > Thanks for taking the time to compile all that.  That's really nice.
>
> +1, I always look forward to the blog post.

Thanks, glad it's appreciated.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: code contributions for 2024, WIP version

From

Joe Conway

Date:

03 December 2024, 19:19:19

On 12/3/24 10:44, Robert Haas wrote:
> On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
>> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> >> Donning my asbestos underwear, I remain yours faithfully,
>> >
>> > Thanks for taking the time to compile all that.  That's really nice.
>>
>> +1, I always look forward to the blog post.
> 
> Thanks, glad it's appreciated.

It is definitely appreciated.

While I know you said "you will do you" when it comes to your annual 
blog, there are a number of similar efforts -- top of mind is the 
analysis done (as I understand it) by Daniel Gustafsson and Claire 
Giordano [1], as well as ongoing/recurring analysis done by the 
contributor committee. And there is the adjacent related discussion 
around commit messages/authors. It makes me wonder if there isn't a way 
to make all of our lives easier going forward.

[1] https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: code contributions for 2024, WIP version

From

Robert Haas

Date:

03 December 2024, 19:41:22

On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Yes, I'm game to try to figure out how to combine our efforts. I don't
think it's a bad thing that different people have different takes;
this is complicated and looking at it through just one lens is
limiting. But people duplicating work is, well, not so good.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: code contributions for 2024, WIP version

From

Daniel Gustafsson

Date:

04 December 2024, 00:07:21

> On 3 Dec 2024, at 17:41, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
>> While I know you said "you will do you" when it comes to your annual
>> blog, there are a number of similar efforts -- top of mind is the
>> analysis done (as I understand it) by Daniel Gustafsson and Claire
>> Giordano [1], as well as ongoing/recurring analysis done by the
>> contributor committee. And there is the adjacent related discussion
>> around commit messages/authors. It makes me wonder if there isn't a way
>> to make all of our lives easier going forward.
>
> Yes, I'm game to try to figure out how to combine our efforts. I don't
> think it's a bad thing that different people have different takes;
> this is complicated and looking at it through just one lens is
> limiting. But people duplicating work is, well, not so good.

If we settled on a metadata standard for how to identify authors, reviewers,
backpatches, etc., I think that would go a very long way toward lowering the
complexity of getting to the data and keep folks focused on doing interesting
analysis.

--
Daniel Gustafsson

Re: code contributions for 2024, WIP version

From

Alvaro Herrera

Date:

05 December 2024, 15:46:48

Hello Robert,

On 2024-Dec-02, Robert Haas wrote:

> As many of you are probably aware, I have been doing an annual blog
> post on who contributes to PostgreSQL development for some years now.
> It includes information on lines of code committed to PostgreSQL, and
> also emails sent to the list. This year, I got a jump on analyzing the
> commit log, and a draft of the data covering January-November of 2024
> has been uploaded in pg_dump format to here:
> 
> https://sites.google.com/site/robertmhaas/contributions
> 
> I'm sending this message to invite anyone who is interested to review
> the data in the commits2024 table and send me corrections.

No corrections here -- I noticed nothing wrong with the commits I am
involved with, in a quick read.  I did notice that for patches with
multiple authors, only the first one is listed.  For instance,
53c2a97a926's author ("Improve performance of subsystems on top of
SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out.  I realize
that addressing this would complicate the schema and queries, but maybe
it's worth thinking about for next time.  We have plenty of patches with
multiple authors, after all.

Hmm, maybe
UPDATE commits2024 SET xlines = 0 WHERE commitid in
  ('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
  '21ef4d4d897', '592a2283721');

How did you come up with the 'lines' number for each commit anyway?
Judging by 592a2283721 it's not just the number of lines added, since
that commit added 3 lines and you have lines=2.


An unrelated (and possibly useless) thing is that some committers seem
firmly in the camp of ending commit titles with a period, others are
firmly in the other camp; only two people seem not to have made up their
minds about that:

     committer      │ with end period │ without end period │ fraction with end period 
────────────────────┼─────────────────┼────────────────────┼──────────────────────────
 Etsuro Fujita      │               6 │                  0 │                   100.00
 Peter Geoghegan    │              39 │                  0 │                   100.00
 Tatsuo Ishii       │               8 │                  0 │                   100.00
 Amit Kapila        │              87 │                  0 │                   100.00
 Fujii Masao        │              35 │                  0 │                   100.00
 Tom Lane           │             296 │                  1 │                    99.66
 Nathan Bossart     │             131 │                  1 │                    99.24
 Jeff Davis         │              88 │                  1 │                    98.88
 Noah Misch         │              61 │                  1 │                    98.39
 Thomas Munro       │              59 │                  1 │                    98.33
 Masahiko Sawada    │              39 │                  1 │                    97.50
 Dean Rasheed       │              23 │                  1 │                    95.83
 Robert Haas        │              77 │                 10 │                    88.51
 Joe Conway         │               1 │                  2 │                    33.33
 Alexander Korotkov │               4 │                153 │                     2.55
 Andrew Dunstan     │               1 │                 40 │                     2.44
 Bruce Momjian      │               2 │                 82 │                     2.38
 Heikki Linnakangas │               4 │                174 │                     2.25
 Peter Eisentraut   │               6 │                309 │                     1.90
 Amit Langote       │               1 │                 54 │                     1.82
 Álvaro Herrera     │               1 │                118 │                     0.84
 Michael Paquier    │               1 │                275 │                     0.36
 Andres Freund      │               0 │                 26 │                     0.00
 Richard Guo        │               0 │                 27 │                     0.00
 Daniel Gustafsson  │               0 │                 99 │                     0.00
 Magnus Hagander    │               0 │                  4 │                     0.00
 John Naylor        │               0 │                 33 │                     0.00
 Melanie Plageman   │               0 │                  6 │                     0.00
 David Rowley       │               0 │                106 │                     0.00
 Tomas Vondra       │               0 │                 33 │                     0.00

Query was:
select committer,
  count(*) filter (where subject     like '%.') as "with end period",
  count(*) filter (where subject not like '%.') as "without end period",
  ((count(*) filter (where subject like '%.'))::numeric / count(*) * 100)::numeric(5,2) as "fraction with end period"
from commits2024
group by committer
order by 4 desc, split_part(committer, ' ', 2);


Thanks!

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
                  -- Paul Graham, http://www.paulgraham.com/opensource.html

Re: code contributions for 2024, WIP version

From

"Andrey M. Borodin"

Date:

05 December 2024, 16:43:06


> On 5 Dec 2024, at 17:46, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> We have plenty of patches with
> multiple authors, after all.

+1, thanks for raising this. A lot of stuff is actually joint work.
It’s much more fun to develop something in a group of co-authors.


Best regards, Andrey Borodin.

Re: code contributions for 2024, WIP version

From

Robert Haas

Date:

05 December 2024, 18:23:23

On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> No corrections here -- I noticed nothing wrong with the commits I am
> involved with, in a quick read.  I did notice that for patches with
> multiple authors, only the first one is listed.  For instance,
> 53c2a97a926's author ("Improve performance of subsystems on top of
> SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out.  I realize
> that addressing this would complicate the schema and queries, but maybe
> it's worth thinking about for next time.  We have plenty of patches with
> multiple authors, after all.

I agree, but I don't know how to apportion the work between the
authors. I think dividing credit equally between two or three authors
would often be very unfair to the first author. If we want to annotate
commit messages in a way that allows me to apportion credit more
fairly, I'm totally game to do that, but otherwise I think that giving
the credit to the first author is probably more fair on average.

> Hmm, maybe
> UPDATE commits2024 SET xlines = 0 WHERE commitid in
>   ('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
>   '21ef4d4d897', '592a2283721');

Thanks.

> How did you come up with the 'lines' number for each commit anyway?
> Judging by 592a2283721 it's not just the number of lines added, since
> that commit added 3 lines and you have lines=2.

git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M
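
In that command, -w tells git to ignore whitespace when comparing lines and -M turns on rename detection, so whitespace-only changes and moved files do not add to the totals. As a rough sketch of how a per-commit figure might be pulled out of the --shortstat summary lines, assuming the "lines" column is taken from the insertions count: the helper name and the sample lines below are made up for illustration and are not the script actually used.

```python
import re

# Matches the "N insertion(s)(+)" part of a git --shortstat summary line.
INSERTIONS = re.compile(r"(\d+) insertions?\(\+\)")


def insertions_from_shortstat(summary: str) -> int:
    """Return the insertion count from one summary line, or 0 if none is listed."""
    match = INSERTIONS.search(summary)
    return int(match.group(1)) if match else 0


# Sample summary lines in the shape git prints; the numbers are invented.
samples = [
    " 1 file changed, 2 insertions(+), 1 deletion(-)",
    " 3 files changed, 45 insertions(+)",
    " 2 files changed, 7 deletions(-)",
]
counts = [insertions_from_shortstat(s) for s in samples]
print(counts)
```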

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: code contributions for 2024, WIP version

From

Tom Kincaid

Date:

05 December 2024, 18:39:38

> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Perhaps slightly off topic, so how does one provide input to the contributor committee?

> [1]
> https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com

Thomas John Kincaid

Re: code contributions for 2024, WIP version

From

Alvaro Herrera

Date:

05 December 2024, 19:19:18

On 2024-Dec-05, Robert Haas wrote:

> On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > No corrections here -- I noticed nothing wrong with the commits I am
> > involved with, in a quick read.  I did notice that for patches with
> > multiple authors, only the first one is listed.  For instance,
> > 53c2a97a926's author ("Improve performance of subsystems on top of
> > SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out.  I realize
> > that addressing this would complicate the schema and queries, but maybe
> > it's worth thinking about for next time.  We have plenty of patches with
> > multiple authors, after all.
> 
> I agree, but I don't know how to apportion the work between the
> authors. I think dividing credit equally between two or three authors
> would often be very unfair to the first author. If we want to annotate
> commit messages in a way that allows me to apportion credit more
> fairly, I'm totally game to do that, but otherwise I think that giving
> the credit to the first author is probably more fair on average.

Just give credit to all lines for all authors, would be my approach.  Is
that unfair?  Perhaps, but I'd rather err on the side of giving too much
credit, than on not giving enough.

> > How did you come up with the 'lines' number for each commit anyway?
> > Judging by 592a2283721 it's not just the number of lines added, since
> > that commit added 3 lines and you have lines=2.
> 
> git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M

Ah, it's -w that makes the difference, got it.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended."  (Gerry Pourwelle)

Re: code contributions for 2024, WIP version

From

Bruce Momjian

Date:

05 December 2024, 19:42:36

On Thu, Dec  5, 2024 at 10:39:38AM -0500, Tom Kincaid wrote:
>     While I know you said "you will do you" when it comes to your annual
>     blog, there are a number of similar efforts -- top of mind is the
>     analysis done (as I understand it) by Daniel Gustafsson and Claire
>     Giordano [1], as well as ongoing/recurring analysis done by the
>     contributor committee. And there is the adjacent related discussion
>     around commit messages/authors. It makes me wonder if there isn't a way
>     to make all of our lives easier going forward.
>
> Perhaps slightly off topic, so how does one provide input to the contributor
> committee?

The committee is responsible for updating the contributors list web page:

    https://www.postgresql.org/community/contributors/

and does analysis of contributions to the Postgres community to help
update the list.

Their email address is listed at the bottom of that page:

    contributors@postgresql.org

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.

Re: code contributions for 2024, WIP version

From

Robert Haas

Date:

05 December 2024, 19:43:49

On Thu, Dec 5, 2024 at 11:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Just give credit to all lines for all authors, would be my approach.  Is
> that unfair?  Perhaps, but I'd rather err on the side of giving too much
> credit, than on not giving enough.

I'm not against somebody putting that together, but I don't think it
would be useful for me. I think it would inflate the numbers for
committers by quite a lot more than what is fair, because if I commit
a 1000 line patch and I add 50 lines of code, I'm going to get an
awful lot more credit than I deserve. It would probably also inflate
or distort the numbers for some other people as well. But what I would
say is -- if you think it's a useful thing, try doing it.
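
To make the difference between the two schemes concrete, here is a small sketch comparing them on a single invented commit. The function names, the author names, and the (lines, authors) representation are all hypothetical; neither scheme is implemented this way in the actual data set.

```python
from collections import defaultdict


def first_author_credit(commits):
    """Credit every line of a commit to its first listed author only."""
    totals = defaultdict(int)
    for lines, authors in commits:
        totals[authors[0]] += lines
    return dict(totals)


def all_authors_credit(commits):
    """Credit every line of a commit to each of its listed authors."""
    totals = defaultdict(int)
    for lines, authors in commits:
        for name in authors:
            totals[name] += lines
    return dict(totals)


# Hypothetical commit: a 1000-line patch where the committer, listed as the
# second author, wrote only a small share of the lines.
commits = [(1000, ["Patch Author", "Committer"])]
first_only = first_author_credit(commits)
everyone = all_authors_credit(commits)
print(first_only)
print(everyone)
```

Under the first scheme the committer gets no line credit for this commit; under the second, the committer is credited with all 1000 lines, which is the inflation described above.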

--
Robert Haas
EDB: http://www.enterprisedb.com