Re: autovacuum: change priority of the vacuumed tables - Mailing list pgsql-hackers

From Ildus Kurbangaliev
Subject Re: autovacuum: change priority of the vacuumed tables
Msg-id 20180219173855.05bd313c@wp.localdomain
In response to Re: autovacuum: change priority of the vacuumed tables  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: autovacuum: change priority of the vacuumed tables
List pgsql-hackers
On Fri, 16 Feb 2018 21:48:14 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
> <i.kurbangaliev@postgrespro.ru> wrote:
> > On Fri, 16 Feb 2018 17:42:34 +0900
> > Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >  
> >> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
> >> <g.smolkin@postgrespro.ru> wrote:  
> >> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
> >> >  
> >> >> Hi,
> >> >>
> >> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
> >> >> <i.kurbangaliev@postgrespro.ru> wrote:  
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> Attached patch adds 'autovacuum_table_priority' to the current
> >> >>> list of automatic vacuuming settings. It's used in sorting of
> >> >>> vacuumed tables in autovacuum worker before actual vacuum.
> >> >>>
> >> >>> The idea is to let users prioritize their tables in the
> >> >>> autovacuum process.
> >> >>>  
> >> >> Hmm, I couldn't understand the benefit of this patch. Would you
> >> >> elaborate on it a little more?
> >> >>
> >> >> Multiple autovacuum workers can work on one database. So even if
> >> >> a table that you want to vacuum first is at the back of the list,
> >> >> another worker would pick it up. If vacuuming the table gets
> >> >> delayed because some big tables are in front of it, I think you
> >> >> can deal with that by increasing the number of autovacuum
> >> >> workers.
> >> >>
> >> >> Regards,
> >> >>
> >> >> --
> >> >> Masahiko Sawada
> >> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> >> >> NTT Open Source Software Center
> >> >>  
> >> >
> >> > A database can contain thousands of tables, and updates/deletes
> >> > often concentrate in only a handful of them. Going through
> >> > thousands of less bloated tables can take ages. Currently
> >> > autovacuum knows nothing about prioritizing its work with
> >> > respect to the user's understanding of their data and
> >> > application.
> >>
> >> Understood. I have a question; please imagine the following case.
> >>
> >> Suppose that there are 1000 tables in a database, and one of them
> >> (table-A) has the highest priority while the other 999 tables have
> >> the same priority. Most tables (say 800 tables), including
> >> table-A, need to get vacuumed at some point, so with your patch an
> >> AV worker lists 800 tables and table-A will be at the head of the
> >> list. Table-A will get vacuumed first, but this AV worker has to
> >> vacuum the other 799 tables even if table-A requires vacuum again
> >> in the meantime.
> >>
> >> If another AV worker launches while table-A is being vacuumed, the
> >> new AV worker would include table-A in its list but would not
> >> process it because a concurrent AV worker is processing it. So it
> >> would vacuum other tables instead. Similarly, this AV worker cannot
> >> get a new table list until it finishes vacuuming all the other
> >> tables. (Note that it might skip some tables if they have already
> >> been vacuumed by another AV worker.) On the other hand, if a new AV
> >> worker launches after table-A got vacuumed and requires vacuuming
> >> again, the new AV worker puts table-A at the head of its list. It
> >> processes table-A first but, again, it has to vacuum the other
> >> tables before getting a new table list that might include table-A.
> >>
> >> Is this the expected behavior? I'd rather expect postgres to
> >> vacuum it before other lower priority tables whenever the table
> >> having the highest priority requires vacuuming, but it wouldn't.  
> >
> > Yes, this is the expected behavior. The patch is a way to give the
> > user at least some control over the sorting; later it could be
> > extended with something more sophisticated.
> >  
> 
> Since the user doesn't know that each AV worker processes tables based
> on its own table list, which differs from the lists other workers have,
> I think it's hard for the user to understand this parameter. I'd say the
> user would expect that a high priority table can get vacuumed at any time.

Yes, that's a very good point. The behavior could look strange to the
user in cases like that.
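
To make the 800-table example concrete, here is a rough sketch of the
per-worker behavior. It is plain Python, not PostgreSQL code; the names
`build_worklist` and `simulate_single_worker` are hypothetical, and it
assumes a single worker and that table-A needs vacuuming again right
after it has been processed.

```python
def build_worklist(tables, priority):
    """Snapshot of tables needing vacuum, highest priority first.

    sorted() is stable, so tables with equal priority keep their order.
    """
    return sorted(tables, key=lambda t: -priority.get(t, 0))


def simulate_single_worker(n_tables=800, hot="table_a"):
    """Return (first table vacuumed, tables vacuumed while `hot` waits).

    Assumption: `hot` becomes dirty again immediately after its vacuum,
    but the worker only rebuilds its list once the current one is done.
    """
    priority = {hot: 10}  # every other table defaults to priority 0
    tables = [f"t{i}" for i in range(1, n_tables)] + [hot]
    worklist = build_worklist(tables, priority)

    waited = 0
    hot_dirty_again = False
    for table in worklist:
        if hot_dirty_again:
            waited += 1  # `hot` is bloating while this table is vacuumed
        if table == hot:
            hot_dirty_again = True
    return worklist[0], waited


first, waited = simulate_single_worker()
print(first, waited)
```

The priority only decides the position inside one snapshot of the list,
so the high priority table still waits for everything behind it.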

> 
> I think what you want to solve is to vacuum some tables preferentially
> when there are many tables requiring vacuuming. Right? If so, I think
> prioritizing tables only within the list would not solve the
> fundamental issue. In the example, table-A will still need to wait for
> the other 799 tables to get vacuumed. Table-A will be bloating while
> the other tables are vacuumed. To deal with it, I think we need something
> like a queue in shmem per database in order to control the order of
> tables waiting for vacuuming, and we need to use it with a smart
> algorithm. Thoughts?

Agreed, it would require some kind of shared queue for the autovacuum
workers if we want to prioritize tables across all of them. I will look
into this and may come up with something.
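
As a very rough illustration of the shared-queue idea (Python, not a
proposed implementation; `SharedVacuumQueue` and `simulate_shared_queue`
are hypothetical names), the sketch below keeps one priority queue that
every worker pops from. It assumes table-A becomes dirty again after each
vacuum of another table, up to `revacuums` times. A real version would
have to live in shared memory with proper locking, which is not modeled
here.

```python
import heapq
import itertools


class SharedVacuumQueue:
    """One queue per database; workers pop the next table from it."""

    def __init__(self):
        self._heap = []
        self._queued = set()
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, table, priority=0):
        if table in self._queued:
            return  # already waiting, do not queue it twice
        heapq.heappush(self._heap, (-priority, next(self._seq), table))
        self._queued.add(table)

    def pop(self):
        _, _, table = heapq.heappop(self._heap)
        self._queued.discard(table)
        return table

    def __contains__(self, table):
        return table in self._queued

    def __len__(self):
        return len(self._heap)


def simulate_shared_queue(n_tables=800, hot="table_a", revacuums=3):
    """Return the order in which tables are vacuumed."""
    queue = SharedVacuumQueue()
    for i in range(1, n_tables):
        queue.push(f"t{i}")
    queue.push(hot, priority=10)

    order = []
    remaining = revacuums
    while queue:
        table = queue.pop()
        order.append(table)
        # Assumption: `hot` needs vacuum again once another table is done.
        if table != hot and hot not in queue and remaining > 0:
            remaining -= 1
            queue.push(hot, priority=10)
    return order


print(simulate_shared_queue()[:8])
```

Here the high priority table waits for at most one other table each time
it becomes dirty, instead of for the rest of a worker's private list.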

Masahiko, are you working on this too, or just interested in the idea?

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

