Re: Experimental tool to explore commitfest patches - Mailing list pgsql-hackers

From Jacob Brazeal
Subject Re: Experimental tool to explore commitfest patches
Date
Msg-id CA+COZaDtJ-fa0Lu1zDW7W8op+k+y-77rhABqLz8U0MVAJ9w70g@mail.gmail.com
In response to Re: Experimental tool to explore commitfest patches  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I wanted to provide a quick update on the app [0]. Here are the main issues I've seen flagged so far:

1. The ranking system needs improvement. Ideally it should promote relevant, important, ready-for-review patches.
2. We should display contributor names as they appear in the commitfest app (this matters because we have to correlate names from several different systems).

I will be working on all of these, but tonight I want to provide an update on the ranking system. The app now predicts which committers might be a good fit for a patch and displays that prediction alongside the patch. If you are a committer and select your name in the queue, the patches predicted for you will float to the top. As a quick sanity check, most of the cases I've seen flagged so far are correctly handled by the new system. Here are some more details on how it works and how 

The new recommendation system is based on keywords. I used an LLM to extract technical keywords from the mailing list threads associated with the last 10,000 git commits, and then trained a logistic regression model to match the keywords to committers. I'm no expert at this, but I did some basic statistical validation of the result on a training/test split and got decent results: around 44% of the model's top choices were correct, and just to be safe, I show the top 3 predicted committers for each patch in the UX. Looking at specific folks like Robert, in our test dataset about 77% of the threads the model matched to him were ones he actually committed (precision), and overall we identify about 45% of his commits (recall). So, not perfect, but actually pretty likely to tag a mailing list thread to the person who will commit it.
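To make the three numbers above concrete, here is a minimal sketch of how top-k accuracy, precision, and recall can be computed from ranked predictions. This is not the app's code; the function names, the placeholder committer names, and the toy data are all made up for illustration.

```python
def top_k_accuracy(ranked_predictions, actual, k):
    """Fraction of threads whose true committer is among the first k predictions."""
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, actual) if truth in ranked[:k]
    )
    return hits / len(actual)


def precision_recall(ranked_predictions, actual, committer):
    """Precision and recall for one committer, based on the top-1 prediction only."""
    predicted = [ranked[0] for ranked in ranked_predictions]
    true_pos = sum(
        1 for p, t in zip(predicted, actual) if p == committer and t == committer
    )
    predicted_pos = sum(1 for p in predicted if p == committer)
    actual_pos = sum(1 for t in actual if t == committer)
    # Precision: of the threads matched to this committer, how many did they commit?
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    # Recall: of the threads this committer committed, how many did we match to them?
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical test split: each row is the model's ranking for one thread,
# best guess first; `actual` holds who really committed it.
ranked = [
    ["committer_a", "committer_b", "committer_c"],
    ["committer_b", "committer_a", "committer_c"],
    ["committer_a", "committer_c", "committer_b"],
    ["committer_c", "committer_a", "committer_b"],
]
actual = ["committer_a", "committer_a", "committer_b", "committer_c"]

top1 = top_k_accuracy(ranked, actual, 1)
top3 = top_k_accuracy(ranked, actual, 3)
precision_a, recall_a = precision_recall(ranked, actual, "committer_a")
```

Showing the top 3 rather than only the top choice trades a little noise for a much better chance that the right committer appears somewhere in the list, which is the reason for displaying three names.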

In the UX, if you are one of the top 3 identified committers, you will also see a list of the top keywords from the mailing list thread that were associated with you.
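One plausible way to pick those keywords, given a linear model like logistic regression, is to rank the thread's keywords by the weight the model learned for that committer. The sketch below assumes that approach; the weights, keyword names, and the `top_keywords` helper are hypothetical and may not match what the app actually does.

```python
def top_keywords(thread_keywords, weights, committer, n=3):
    """Return up to n thread keywords with the largest positive weight for a committer."""
    committer_weights = weights.get(committer, {})
    scored = [
        (keyword, committer_weights[keyword])
        for keyword in set(thread_keywords)
        if committer_weights.get(keyword, 0.0) > 0
    ]
    # Highest weight first; break ties alphabetically so the output is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in scored[:n]]


# Hypothetical per-committer coefficients from a trained linear model.
weights = {
    "committer_a": {"wal": 1.2, "replication": 0.8, "planner": -0.5, "vacuum": 0.3},
}
thread = ["planner", "wal", "vacuum", "btree"]

shown = top_keywords(thread, weights, "committer_a")
```

Keywords with negative or missing weights are dropped, since they do not explain why the thread was matched to that committer.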

[0] https://patchwork-three.vercel.app/
