Thread: use AV worker items infrastructure for GIN pending list's cleanup

use AV worker items infrastructure for GIN pending list's cleanup

From
Jaime Casanova
Date:
Hi,

When AV worker items were introduced 4 years ago, it was suggested that
they could be used for other things, like cleaning the pending list of a GIN
index when it reaches gin_pending_list_limit, instead of making a
user-visible operation pay the price.

That never happened though. So, here is a little patch for that.

Should I add an entry for this in the next commitfest?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment

Re: use AV worker items infrastructure for GIN pending list's cleanup

From
"Euler Taveira"
Date:
On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:
> When AV worker items were introduced 4 years ago, it was suggested that
> they could be used for other things, like cleaning the pending list of a GIN
> index when it reaches gin_pending_list_limit, instead of making a
> user-visible operation pay the price.
>
> That never happened though. So, here is a little patch for that.
>
> Should I add an entry for this in the next commitfest?
+1. It slipped through the cracks over the years. It has even been suggested
in the docs ever since fast update support was added.

https://www.postgresql.org/docs/current/gin-tips.html

> To avoid fluctuations in observed response time, it's desirable to have
> pending-list cleanup occur in the background (i.e., via autovacuum).

Could you provide a link to the previous discussion?


--
Euler Taveira

Re: use AV worker items infrastructure for GIN pending list's cleanup

From
Jaime Casanova
Date:
On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:
> On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:
> > When AV worker items were introduced 4 years ago, it was suggested that
> > they could be used for other things, like cleaning the pending list of a GIN
> > index when it reaches gin_pending_list_limit, instead of making a
> > user-visible operation pay the price.
> > 
> > That never happened though. So, here is a little patch for that.
> > 
> > Should I add an entry for this in the next commitfest?
> +1. It slipped through the cracks over the years. It has even been suggested
> in the docs ever since fast update support was added.
> 
> https://www.postgresql.org/docs/current/gin-tips.html
> 

Interesting, that comment may need to be rewritten. I would go for
completely removing the first paragraph under the gin_pending_list_limit entry.

> 
> Could you provide a link to the previous discussion?
> 

It happened here:
https://www.postgresql.org/message-id/flat/20170301045823.vneqdqkmsd4as4ds%40alvherre.pgsql

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: use AV worker items infrastructure for GIN pending list's cleanup

From
"Joel Jacobson"
Date:
On Mon, Apr 5, 2021, at 16:47, Jaime Casanova wrote:
> On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:
> > On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:
> > > When AV worker items were introduced 4 years ago, it was suggested that
> > > they could be used for other things, like cleaning the pending list of a GIN
> > > index when it reaches gin_pending_list_limit, instead of making a
> > > user-visible operation pay the price.
> > > 
> > > That never happened though. So, here is a little patch for that.
> > > 
> > > Should I add an entry for this in the next commitfest?
> > +1. It slipped through the cracks over the years. It has even been suggested
> > in the docs ever since fast update support was added.
> > 
> > https://www.postgresql.org/docs/current/gin-tips.html
> > 
> 
> Interesting, that comment may need to be rewritten. I would go for
> completely removing the first paragraph under the gin_pending_list_limit entry.

Thanks for working on this patch.

I found this thread searching for "gin_pending_list_limit" in pg hackers after reading an interesting article found via the front page of Hacker News: "Debugging random slow writes in PostgreSQL" (https://iamsafts.com/posts/postgres-gin-performance/).

I thought it could be interesting to read about a real user story where this patch would be helpful.


/Joel

Re: use AV worker items infrastructure for GIN pending list's cleanup

From
Jaime Casanova
Date:
On Sat, May 15, 2021 at 08:12:51AM +0200, Joel Jacobson wrote:
> On Mon, Apr 5, 2021, at 16:47, Jaime Casanova wrote:
> > On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:
> > > On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:
> > > > When AV worker items were introduced 4 years ago, it was suggested that
> > > > they could be used for other things, like cleaning the pending list of a GIN
> > > > index when it reaches gin_pending_list_limit, instead of making a
> > > > user-visible operation pay the price.
> > > > 
> > > > That never happened though. So, here is a little patch for that.
> > > > 
> > > > Should I add an entry for this in the next commitfest?
> > > +1. It slipped through the cracks over the years. It has even been suggested
> > > in the docs ever since fast update support was added.
> > > 
> > > https://www.postgresql.org/docs/current/gin-tips.html
> > > 
> > 
> > Interesting, that comment may need to be rewritten. I would go for
> > completely removing the first paragraph under the gin_pending_list_limit entry.
> 
> Thanks for working on this patch.
> 
> I found this thread searching for "gin_pending_list_limit" in pg hackers after reading an interesting article found via the front page of Hacker News: "Debugging random slow writes in PostgreSQL" (https://iamsafts.com/posts/postgres-gin-performance/).
> 
> I thought it could be interesting to read about a real user story where this patch would be helpful.
> 

A customer here has 20+ GIN indexes on a big, heavily used table, and
every time one of the indexes reaches gin_pending_list_limit (because of
an insert or update) a user feels the impact.

So, currently we have a cron job that runs periodically and checks
pending list sizes, so that the index is processed before the limit gets
triggered by a user operation. While the index is still processed and
locked, the fact that this doesn't happen in the user's face makes the
process less noticeable and, in the minds of users, faster.

This patch will provide the same facility: the process will happen "in the
background".
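
A minimal sketch of the kind of periodic check described above, not the customer's actual job. It assumes the pgstattuple extension is installed (for pgstatginindex) and that the cleanup is done with gin_clean_pending_list. The block size, the trigger_ratio parameter, and the function names are assumptions made for illustration only. The code just builds the SQL statements; running them against a database is left to the caller.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size, in bytes


def pending_list_query(index_name: str) -> str:
    """SQL reporting the pending-list pages of one GIN index (needs pgstattuple).

    index_name is assumed to come from a trusted source such as pg_class.
    """
    return f"SELECT pending_pages FROM pgstatginindex('{index_name}'::regclass)"


def indexes_to_clean(
    pending_pages: Iterable[Tuple[str, int]],
    limit_kb: int = 4096,         # gin_pending_list_limit, in kB (default is 4MB)
    trigger_ratio: float = 0.5,   # hypothetical: act at half of the limit
) -> List[str]:
    """Return cleanup statements for indexes whose pending list is getting large.

    pending_pages holds (index name, pending_pages) pairs, for example
    collected by running pending_list_query() for each GIN index.
    """
    threshold_bytes = limit_kb * 1024 * trigger_ratio
    return [
        f"SELECT gin_clean_pending_list('{name}'::regclass)"
        for name, pages in pending_pages
        if pages * BLOCK_SIZE >= threshold_bytes
    ]
```

Acting below the limit (here at half of it) is what keeps the cleanup out of the user's insert or update; the right ratio depends on how often the job runs and how fast the pending lists grow.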

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: use AV worker items infrastructure for GIN pending list's cleanup

From
"Joel Jacobson"
Date:
On Sat, May 15, 2021, at 08:42, Jaime Casanova wrote:
> A customer here has 20+ GIN indexes on a big, heavily used table, and
> every time one of the indexes reaches gin_pending_list_limit (because of
> an insert or update) a user feels the impact.
>
> So, currently we have a cron job that runs periodically and checks
> pending list sizes, so that the index is processed before the limit gets
> triggered by a user operation. While the index is still processed and
> locked, the fact that this doesn't happen in the user's face makes the
> process less noticeable and, in the minds of users, faster.
>
> This patch will provide the same facility: the process will happen "in the
> background".

Sounds like a great improvement, many thanks.

/Joel

Re: use AV worker items infrastructure for GIN pending list's cleanup

From
Masahiko Sawada
Date:
On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> Hi,
>
> When AV worker items were introduced 4 years ago, it was suggested that
> they could be used for other things, like cleaning the pending list of a GIN
> index when it reaches gin_pending_list_limit, instead of making a
> user-visible operation pay the price.
>
> That never happened though. So, here is a little patch for that.

Thank you for working on this.

I like the idea of cleaning the GIN pending list using an autovacuum
work item. But with the patch, we request the work item and skip the
pending list cleanup if the pending list size exceeds
gin_pending_list_limit during insertion. But autovacuum work items are
executed after an autovacuum runs. So if many insertions happen before
executing the autovacuum work item, we will end up greatly exceeding the
threshold (gin_pending_list_limit) and registering the same work item
again and again. Maybe we need something like a soft limit and a hard
limit? That is, if the pending list size exceeds the soft limit, we
request the work item. OTOH, if it exceeds the hard limit
(gin_pending_list_limit), we clean up the pending list before insertion.
We might also need to have autovacuum ignore a work item if the same work
item with the same arguments is already registered. In addition to that,
I think we should prevent the work item for cleaning the pending list
from being executed if an autovacuum runs on the GIN index before the
work item is executed.
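
The soft/hard limit idea above can be modelled roughly as follows. This is a toy sketch of the decision logic only, not PostgreSQL code and not what the patch does: WorkItemQueue, Action, on_insert, on_autovacuum and the soft limit itself are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Set, Tuple

GIN_CLEANUP = "gin_pending_list_cleanup"  # hypothetical work item type


class Action(Enum):
    INSERT_ONLY = "insert only"
    REQUEST_WORK_ITEM = "background cleanup requested"
    CLEANUP_NOW = "clean up in the inserting backend"


class WorkItemQueue:
    """Toy stand-in for the autovacuum work item list; ignores duplicates."""

    def __init__(self) -> None:
        self._items: Set[Tuple[str, str]] = set()

    def request(self, kind: str, index_name: str) -> bool:
        """Register a work item; return False if the same item already exists."""
        item = (kind, index_name)
        if item in self._items:
            return False
        self._items.add(item)
        return True

    def cancel(self, kind: str, index_name: str) -> None:
        """Drop a registered work item, if any."""
        self._items.discard((kind, index_name))

    def __len__(self) -> int:
        return len(self._items)


def on_insert(queue: WorkItemQueue, index_name: str, pending_kb: int,
              soft_limit_kb: int, hard_limit_kb: int) -> Action:
    """Decide what an insertion does, given the current pending list size."""
    if pending_kb > hard_limit_kb:
        # Hard limit (gin_pending_list_limit): clean up before inserting.
        return Action.CLEANUP_NOW
    if pending_kb > soft_limit_kb:
        # Soft limit: ask for background cleanup; repeated requests are no-ops.
        queue.request(GIN_CLEANUP, index_name)
        return Action.REQUEST_WORK_ITEM
    return Action.INSERT_ONLY


def on_autovacuum(queue: WorkItemQueue, index_name: str) -> None:
    """A regular autovacuum processed the index, so the work item is moot."""
    queue.cancel(GIN_CLEANUP, index_name)
```

For example, with a soft limit of 2048 kB and a hard limit of 4096 kB, repeated insertions at 3000 kB leave a single work item registered, and an insertion at 5000 kB falls back to cleaning up in the foreground.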

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: use AV worker items infrastructure for GIN pending list's cleanup

From
Jaime Casanova
Date:
On Mon, May 17, 2021 at 01:46:37PM +0900, Masahiko Sawada wrote:
> On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
> >
> > Hi,
> >
> > When AV worker items were introduced 4 years ago, it was suggested that
> > they could be used for other things, like cleaning the pending list of a GIN
> > index when it reaches gin_pending_list_limit, instead of making a
> > user-visible operation pay the price.
> >
> > That never happened though. So, here is a little patch for that.
> 
> Thank you for working on this.
> 
> I like the idea of cleaning the GIN pending list using an autovacuum
> work item. But with the patch, we request the work item and skip the
> pending list cleanup if the pending list size exceeds
> gin_pending_list_limit during insertion. But autovacuum work items are
> executed after an autovacuum runs. So if many insertions happen before
> executing the autovacuum work item, we will end up greatly exceeding the
> threshold (gin_pending_list_limit) and registering the same work item
> again and again. Maybe we need something like a soft limit and a hard
> limit? That is, if the pending list size exceeds the soft limit, we
> request the work item. OTOH, if it exceeds the hard limit
> (gin_pending_list_limit), we clean up the pending list before insertion.
> We might also need to have autovacuum ignore a work item if the same work
> item with the same arguments is already registered. In addition to that,
> I think we should prevent the work item for cleaning the pending list
> from being executed if an autovacuum runs on the GIN index before the
> work item is executed.
> 

Thanks for your comments on this. I have been working on a rebased
version, but ENOTIME right now. 

I will mark this one as "Returned with feedback" and resubmit for
November.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL