Thread: use AV worker items infrastructure for GIN pending list's cleanup
Hi, When AV worker items where introduced 4 years ago, i was suggested that it could be used for other things like cleaning the pending list of GIN index when it reaches gin_pending_list_limit instead of making user visible operation pay the price. That never happened though. So, here is a little patch for that. Should I add an entry for this on next commitfest? -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Attachment
On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:
When AV worker items where introduced 4 years ago, i was suggested thatit could be used for other things like cleaning the pending list of GINindex when it reaches gin_pending_list_limit instead of making uservisible operation pay the price.That never happened though. So, here is a little patch for that.Should I add an entry for this on next commitfest?
+1. It slipped through the cracks along the years. It is even suggested in the
current docs since the fast update support.
> To avoid fluctuations in observed response time, it's desirable to have
> pending-list cleanup occur in the background (i.e., via autovacuum).
Could you provide a link from the previous discussion?
On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote: > On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote: > > When AV worker items where introduced 4 years ago, i was suggested that > > it could be used for other things like cleaning the pending list of GIN > > index when it reaches gin_pending_list_limit instead of making user > > visible operation pay the price. > > > > That never happened though. So, here is a little patch for that. > > > > Should I add an entry for this on next commitfest? > +1. It slipped through the cracks along the years. It is even suggested in the > current docs since the fast update support. > > https://www.postgresql.org/docs/current/gin-tips.html > Interesting, that comment maybe needs to be rewritten. I would go for remove completely the first paragraph under gin_pending_list_limit entry > > Could you provide a link from the previous discussion? > It happened here: https://www.postgresql.org/message-id/flat/20170301045823.vneqdqkmsd4as4ds%40alvherre.pgsql -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On Mon, Apr 5, 2021, at 16:47, Jaime Casanova wrote:
On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:> On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:> > When AV worker items where introduced 4 years ago, i was suggested that> > it could be used for other things like cleaning the pending list of GIN> > index when it reaches gin_pending_list_limit instead of making user> > visible operation pay the price.> >> > That never happened though. So, here is a little patch for that.> >> > Should I add an entry for this on next commitfest?> +1. It slipped through the cracks along the years. It is even suggested in the> current docs since the fast update support.>>Interesting, that comment maybe needs to be rewritten. I would go forremove completely the first paragraph under gin_pending_list_limit entry
Thanks for working on this patch.
I found this thread searching for "gin_pending_list_limit" in pg hackers after reading an interesting article found via the front page of Hacker News: "Debugging random slow writes in PostgreSQL" (https://iamsafts.com/posts/postgres-gin-performance/).
I thought it could be interesting to read about a real user story where this patch would be helpful.
Hacker News discussion: https://news.ycombinator.com/item?id=27152507
/Joel
On Sat, May 15, 2021 at 08:12:51AM +0200, Joel Jacobson wrote: > On Mon, Apr 5, 2021, at 16:47, Jaime Casanova wrote: > > On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote: > > > On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote: > > > > When AV worker items where introduced 4 years ago, i was suggested that > > > > it could be used for other things like cleaning the pending list of GIN > > > > index when it reaches gin_pending_list_limit instead of making user > > > > visible operation pay the price. > > > > > > > > That never happened though. So, here is a little patch for that. > > > > > > > > Should I add an entry for this on next commitfest? > > > +1. It slipped through the cracks along the years. It is even suggested in the > > > current docs since the fast update support. > > > > > > https://www.postgresql.org/docs/current/gin-tips.html > > > > > > > Interesting, that comment maybe needs to be rewritten. I would go for > > remove completely the first paragraph under gin_pending_list_limit entry > > Thanks for working on this patch. > > I found this thread searching for "gin_pending_list_limit" in pg hackers after reading an interesting article found viathe front page of Hacker News: "Debugging random slow writes in PostgreSQL" (https://iamsafts.com/posts/postgres-gin-performance/). > > I thought it could be interesting to read about a real user story where this patch would be helpful. > A customer here has 20+ GIN indexes in a big heavily used table and every time one of the indexes reaches gin_pending_list_limit (because of an insert or update) a user feels the impact. So, currently we have a cronjob running periodically and checking pending list sizes to process the index before the limit get fired by an user operation. While the index still is processed and locked the fact that doesn't happen in the user face make the process less notorious and in the mind of users faster. This will provide the same facility, the process will happen "in the background". -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On Sat, May 15, 2021, at 08:42, Jaime Casanova wrote:
A customer here has 20+ GIN indexes in a big heavily used table andevery time one of the indexes reaches gin_pending_list_limit (because ofan insert or update) a user feels the impact.So, currently we have a cronjob running periodically and checkingpending list sizes to process the index before the limit get fired by anuser operation. While the index still is processed and locked the factthat doesn't happen in the user face make the process less notorious andin the mind of users faster.This will provide the same facility, the process will happen "in thebackground".
Sounds like a great improvement, many thanks.
/Joel
On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > Hi, > > When AV worker items where introduced 4 years ago, i was suggested that > it could be used for other things like cleaning the pending list of GIN > index when it reaches gin_pending_list_limit instead of making user > visible operation pay the price. > > That never happened though. So, here is a little patch for that. Thank you for working on this. I like the idea of cleaning the GIN pending list using by autovacuum work item. But with the patch, we request and skip the pending list cleanup if the pending list size exceeds gin_pending_list_limit during insertion. But autovacuum work items are executed after an autovacuum runs. So if many insertions happen before executing the autovacuum work item, we will end up greatly exceeding the threshold (gin_pending_list_limit) and registering the same work item again and again. Maybe we need something like a soft limit and a hard limit? That is, if the pending list size exceeds the soft limit, we request the work item. OTOH, if it exceeds the hard limit (gin_pending_list_limit) we cleanup the pending list before insertion. We might also need to have autovacuum work items ignore the work item if the same work item with the same arguments is already registered. In addition to that, I think we should avoid the work item for cleaning the pending list from being executed if an autovacuum runs on the gin index before executing the work item. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Mon, May 17, 2021 at 01:46:37PM +0900, Masahiko Sawada wrote: > On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova > <jcasanov@systemguards.com.ec> wrote: > > > > Hi, > > > > When AV worker items where introduced 4 years ago, i was suggested that > > it could be used for other things like cleaning the pending list of GIN > > index when it reaches gin_pending_list_limit instead of making user > > visible operation pay the price. > > > > That never happened though. So, here is a little patch for that. > > Thank you for working on this. > > I like the idea of cleaning the GIN pending list using by autovacuum > work item. But with the patch, we request and skip the pending list > cleanup if the pending list size exceeds gin_pending_list_limit during > insertion. But autovacuum work items are executed after an autovacuum > runs. So if many insertions happen before executing the autovacuum > work item, we will end up greatly exceeding the threshold > (gin_pending_list_limit) and registering the same work item again and > again. Maybe we need something like a soft limit and a hard limit? > That is, if the pending list size exceeds the soft limit, we request > the work item. OTOH, if it exceeds the hard limit > (gin_pending_list_limit) we cleanup the pending list before insertion. > We might also need to have autovacuum work items ignore the work item > if the same work item with the same arguments is already registered. > In addition to that, I think we should avoid the work item for > cleaning the pending list from being executed if an autovacuum runs on > the gin index before executing the work item. > Thanks for your comments on this. I have been working on a rebased version, but ENOTIME right now. Will mark this one as "Returned with feedback" and resubmit for november. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL