Re: maintenance_work_mem used by Vacuum - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: maintenance_work_mem used by Vacuum
Date
Msg-id CAA4eK1K7ZzKJ5TitbSoo4BvMSdM_koJH=ac2JpYBF2vepmG01A@mail.gmail.com
Whole thread Raw
In response to Re: maintenance_work_mem used by Vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: maintenance_work_mem used by Vacuum
Re: maintenance_work_mem used by Vacuum
List pgsql-hackers
On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > >
> > > > ISTM that the use of maintenance_work_mem wasn't given that much
> > > > thought originally.
> > > >
> > >
> > > One idea to something better could be to check, if there is a GIN
> > > index on a table, then use 1/4 (25% or whatever) of
> > > maintenance_work_mem for GIN indexes and 3/4 (75%) of
> > > maintenance_work_mem for collection dead tuples.
> > >
> >
> > I felt that it would not be easy for users to tune
> > maintenance_work_mem which controls more than one things.  If this is
> > an index AM(GIN) specific issue we might rather want to control the
> > memory limit of pending list cleanup by a separate GUC parameter like
> > gin_pending_list_limit, say gin_pending_list_work_mem.

Sure, by having another work_mem parameter for the Gin indexes which
controls when we need to flush the pending list will make life easier
as a programmer.  I think if we have a specific parameter for this
purpose, then we can even think of using the same for a clean up
during insert operation as well.  However, I am not sure how easy it
would be for users?  Basically, now they need to remember another
parameter and for which there is no easy way to know what should be
the value.  I think one has to check
gin_metapage_info->n_pending_pages and then based on that they can
configure the value for this parameter to get the maximum benefit
possible.

Can we think of using work_mem for this?  Basically, we use work_mem
during insert operation, so why not use it during vacuum operation for
this purpose?

Another idea could be to try to divide the maintenance_work_mem
smartly if we know the value of pending_pages for each Gin index, but
I think for that we need to either read the metapage of maybe use some
sort of stats which can be used by vacuum.  We need to somehow divide
it based on the amount of memory required for a number of dead tuples
in heap and memory required by tuples in the pending list.  I am not
sure how feasible is this approach.

About difficulty for users tuning one or two parameters for vacuum, I
think if they can compute what could be the values for Guc's
separately, then why can't they add up and set it as one value.
Having said that, I am not denying that having a separate parameter
gives better control, and for this specific case using separate
parameter can allow us to use it both during vacuum and insert
operations.

> > And we can
> > either set the  (the memory for GIN pending list cleanup / # of GIN
> > indexes) to the parallel workers.
> >
> IMHO if we do that then we will loose the meaning of having
> maintenance_work_mem right?  Then user can not control that how much
> memory the autovacuum worker will use.
>

I am not sure how different it is from the current situation?
Basically, now it can use up to 2 * maintenance_work_mem memory and if
we do what Sawada-San is proposing, then it will be
maintenance_work_mem + gin_*_work_mem.  Do you have some other
alternative idea in mind or you think the current situation is better
than anything else we can do in this area?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: maintenance_work_mem used by Vacuum
Next
From: Alvaro Herrera
Date:
Subject: Re: dropping column prevented due to inherited index