Thread: autovacuum: change priority of the vacuumed tables

autovacuum: change priority of the vacuumed tables

From
Ildus Kurbangaliev
Date:
Hi,

Attached patch adds 'autovacuum_table_priority' to the current list of
automatic vacuuming settings. It's used in sorting of vacuumed tables in
autovacuum worker before actual vacuum.

The idea is to give possibility to the users to prioritize their tables
in autovacuum process.

-- 
---
Regards,
Ildus Kurbangaliev

Attachment

Re: autovacuum: change priority of the vacuumed tables

From
Masahiko Sawada
Date:
Hi,

On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
<i.kurbangaliev@postgrespro.ru> wrote:
> Hi,
>
> Attached patch adds 'autovacuum_table_priority' to the current list of
> automatic vacuuming settings. It's used in sorting of vacuumed tables in
> autovacuum worker before actual vacuum.
>
> The idea is to give possibility to the users to prioritize their tables
> in autovacuum process.
>

Hmm, I couldn't understand the benefit of this patch. Would you
elaborate it a little more?

Multiple autovacuum worker can work on one database. So even if a
table that you want to vacuum first is the back of the list and there
other worker would pick up it. If the vacuuming the table gets delayed
due to some big tables are in front of that table I think you can deal
with it by increasing the number of autovacuum workers.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: autovacuum: change priority of the vacuumed tables

From
Grigory Smolkin
Date:
On 02/15/2018 09:28 AM, Masahiko Sawada wrote:

> Hi,
>
> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
> <i.kurbangaliev@postgrespro.ru> wrote:
>> Hi,
>>
>> Attached patch adds 'autovacuum_table_priority' to the current list of
>> automatic vacuuming settings. It's used in sorting of vacuumed tables in
>> autovacuum worker before actual vacuum.
>>
>> The idea is to give possibility to the users to prioritize their tables
>> in autovacuum process.
>>
> Hmm, I couldn't understand the benefit of this patch. Would you
> elaborate it a little more?
>
> Multiple autovacuum worker can work on one database. So even if a
> table that you want to vacuum first is the back of the list and there
> other worker would pick up it. If the vacuuming the table gets delayed
> due to some big tables are in front of that table I think you can deal
> with it by increasing the number of autovacuum workers.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>

Database can contain thousands of tables and often updates/deletes 
concentrate mostly in only a handful of tables.
Going through thousands of less bloated tables can take ages.
Currently autovacuum know nothing about prioritizing it`s work with 
respect to user`s understanding of his data and application.
Also It`s would be great to sort tables according to dead/live tuple 
ratio and relfrozenxid.

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: autovacuum: change priority of the vacuumed tables

From
Masahiko Sawada
Date:
On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote:
> On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>
>> Hi,
>>
>> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>> <i.kurbangaliev@postgrespro.ru> wrote:
>>>
>>> Hi,
>>>
>>> Attached patch adds 'autovacuum_table_priority' to the current list of
>>> automatic vacuuming settings. It's used in sorting of vacuumed tables in
>>> autovacuum worker before actual vacuum.
>>>
>>> The idea is to give possibility to the users to prioritize their tables
>>> in autovacuum process.
>>>
>> Hmm, I couldn't understand the benefit of this patch. Would you
>> elaborate it a little more?
>>
>> Multiple autovacuum worker can work on one database. So even if a
>> table that you want to vacuum first is the back of the list and there
>> other worker would pick up it. If the vacuuming the table gets delayed
>> due to some big tables are in front of that table I think you can deal
>> with it by increasing the number of autovacuum workers.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>>
>
> Database can contain thousands of tables and often updates/deletes
> concentrate mostly in only a handful of tables.
> Going through thousands of less bloated tables can take ages.
> Currently autovacuum know nothing about prioritizing it`s work with respect
> to user`s understanding of his data and application.

Understood. I have a question; please imagine the following case.

Suppose that there are 1000 tables in a database, and one table of
them (table-A) has the highest priority while other 999 tables have
same priority. Almost tables (say 800 tables) including table-A need
to get vacuumed at some point, so with your patch an AV worker listed
800 tables and table-A will be at the head of the list. Table-A will
get vacuumed first but this AV worker has to vacuum other 799 tables
even if table-A requires vacuum later again.

If an another AV worker launches during table-A being vacuumed, the
new AV worker would include table-A but would not process it because
concurrent AV worker is processing it. So it would vacuum other tables
instead. Similarly, this AV worker can not get the new table list
until finish to vacuum all other tables. (Note that it might skip some
tables if they are already vacuumed by other AV worker.) On the other
hand, if another new AV worker launches after table-A got vacuumed and
requires vacuuming again, the new AV worker puts the table-A at the
head of list. It processes table-A first but, again, it has to vacuum
other tables before getting new table list next time that might
include table-A.

Is this the expected behavior? I'd rather expect postgres to vacuum it
before other lower priority tables whenever the table having the
highest priority requires vacuuming, but it wouldn't.

> Also It`s would be great to sort tables according to dead/live tuple ratio
> and relfrozenxid.

Yeah, for anti-wraparound vacuum on the database, it would be good
idea to sort the list by relfrozenxid as discussed on another
thread[1],

[1]
https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: autovacuum: change priority of the vacuumed tables

From
Ildus Kurbangaliev
Date:
On Fri, 16 Feb 2018 17:42:34 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
> <g.smolkin@postgrespro.ru> wrote:
> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
> >  
> >> Hi,
> >>
> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
> >> <i.kurbangaliev@postgrespro.ru> wrote:  
> >>>
> >>> Hi,
> >>>
> >>> Attached patch adds 'autovacuum_table_priority' to the current
> >>> list of automatic vacuuming settings. It's used in sorting of
> >>> vacuumed tables in autovacuum worker before actual vacuum.
> >>>
> >>> The idea is to give possibility to the users to prioritize their
> >>> tables in autovacuum process.
> >>>  
> >> Hmm, I couldn't understand the benefit of this patch. Would you
> >> elaborate it a little more?
> >>
> >> Multiple autovacuum worker can work on one database. So even if a
> >> table that you want to vacuum first is the back of the list and
> >> there other worker would pick up it. If the vacuuming the table
> >> gets delayed due to some big tables are in front of that table I
> >> think you can deal with it by increasing the number of autovacuum
> >> workers.
> >>
> >> Regards,
> >>
> >> --
> >> Masahiko Sawada
> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> >> NTT Open Source Software Center
> >>  
> >
> > Database can contain thousands of tables and often updates/deletes
> > concentrate mostly in only a handful of tables.
> > Going through thousands of less bloated tables can take ages.
> > Currently autovacuum know nothing about prioritizing it`s work with
> > respect to user`s understanding of his data and application.  
> 
> Understood. I have a question; please imagine the following case.
> 
> Suppose that there are 1000 tables in a database, and one table of
> them (table-A) has the highest priority while other 999 tables have
> same priority. Almost tables (say 800 tables) including table-A need
> to get vacuumed at some point, so with your patch an AV worker listed
> 800 tables and table-A will be at the head of the list. Table-A will
> get vacuumed first but this AV worker has to vacuum other 799 tables
> even if table-A requires vacuum later again.
> 
> If an another AV worker launches during table-A being vacuumed, the
> new AV worker would include table-A but would not process it because
> concurrent AV worker is processing it. So it would vacuum other tables
> instead. Similarly, this AV worker can not get the new table list
> until finish to vacuum all other tables. (Note that it might skip some
> tables if they are already vacuumed by other AV worker.) On the other
> hand, if another new AV worker launches after table-A got vacuumed and
> requires vacuuming again, the new AV worker puts the table-A at the
> head of list. It processes table-A first but, again, it has to vacuum
> other tables before getting new table list next time that might
> include table-A.
> 
> Is this the expected behavior? I'd rather expect postgres to vacuum it
> before other lower priority tables whenever the table having the
> highest priority requires vacuuming, but it wouldn't.

Yes, this is the expected behavior. The patch is the way to give the
user at least some control of the sorting, later it could be extended
with something more sophisticated.


-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: autovacuum: change priority of the vacuumed tables

From
Masahiko Sawada
Date:
On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
<i.kurbangaliev@postgrespro.ru> wrote:
> On Fri, 16 Feb 2018 17:42:34 +0900
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
>> <g.smolkin@postgrespro.ru> wrote:
>> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>> >
>> >> Hi,
>> >>
>> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>> >> <i.kurbangaliev@postgrespro.ru> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Attached patch adds 'autovacuum_table_priority' to the current
>> >>> list of automatic vacuuming settings. It's used in sorting of
>> >>> vacuumed tables in autovacuum worker before actual vacuum.
>> >>>
>> >>> The idea is to give possibility to the users to prioritize their
>> >>> tables in autovacuum process.
>> >>>
>> >> Hmm, I couldn't understand the benefit of this patch. Would you
>> >> elaborate it a little more?
>> >>
>> >> Multiple autovacuum worker can work on one database. So even if a
>> >> table that you want to vacuum first is the back of the list and
>> >> there other worker would pick up it. If the vacuuming the table
>> >> gets delayed due to some big tables are in front of that table I
>> >> think you can deal with it by increasing the number of autovacuum
>> >> workers.
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> Masahiko Sawada
>> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> >> NTT Open Source Software Center
>> >>
>> >
>> > Database can contain thousands of tables and often updates/deletes
>> > concentrate mostly in only a handful of tables.
>> > Going through thousands of less bloated tables can take ages.
>> > Currently autovacuum know nothing about prioritizing it`s work with
>> > respect to user`s understanding of his data and application.
>>
>> Understood. I have a question; please imagine the following case.
>>
>> Suppose that there are 1000 tables in a database, and one table of
>> them (table-A) has the highest priority while other 999 tables have
>> same priority. Almost tables (say 800 tables) including table-A need
>> to get vacuumed at some point, so with your patch an AV worker listed
>> 800 tables and table-A will be at the head of the list. Table-A will
>> get vacuumed first but this AV worker has to vacuum other 799 tables
>> even if table-A requires vacuum later again.
>>
>> If an another AV worker launches during table-A being vacuumed, the
>> new AV worker would include table-A but would not process it because
>> concurrent AV worker is processing it. So it would vacuum other tables
>> instead. Similarly, this AV worker can not get the new table list
>> until finish to vacuum all other tables. (Note that it might skip some
>> tables if they are already vacuumed by other AV worker.) On the other
>> hand, if another new AV worker launches after table-A got vacuumed and
>> requires vacuuming again, the new AV worker puts the table-A at the
>> head of list. It processes table-A first but, again, it has to vacuum
>> other tables before getting new table list next time that might
>> include table-A.
>>
>> Is this the expected behavior? I'd rather expect postgres to vacuum it
>> before other lower priority tables whenever the table having the
>> highest priority requires vacuuming, but it wouldn't.
>
> Yes, this is the expected behavior. The patch is the way to give the
> user at least some control of the sorting, later it could be extended
> with something more sophisticated.
>

Since user doesn't know that each AV worker processes tables based on
its table list that is different from lists that other worker has, I
think it's hard for user to understand this parameter. I'd say that
user would expect that high priority table can get vacuumed any time.

I think what you want to solve is to vacuum some tables preferentially
if there are many tables requiring vacuuming. Right? If so, I think
the prioritizing table only in the list would not solve the
fundamental issue. In the example, table-A will still need to wait for
other 799 tables to get vacuumed. Table-A will be bloating during
vacuuming other tables. To deal with it, I think we need something
queue on the shmem per database in order to control the order of
tables waiting for vacuuming and need to use it with a smart
algorithm. Thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: autovacuum: change priority of the vacuumed tables

From
Ildus Kurbangaliev
Date:
On Fri, 16 Feb 2018 21:48:14 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
> <i.kurbangaliev@postgrespro.ru> wrote:
> > On Fri, 16 Feb 2018 17:42:34 +0900
> > Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >  
> >> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
> >> <g.smolkin@postgrespro.ru> wrote:  
> >> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
> >> >  
> >> >> Hi,
> >> >>
> >> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
> >> >> <i.kurbangaliev@postgrespro.ru> wrote:  
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> Attached patch adds 'autovacuum_table_priority' to the current
> >> >>> list of automatic vacuuming settings. It's used in sorting of
> >> >>> vacuumed tables in autovacuum worker before actual vacuum.
> >> >>>
> >> >>> The idea is to give possibility to the users to prioritize
> >> >>> their tables in autovacuum process.
> >> >>>  
> >> >> Hmm, I couldn't understand the benefit of this patch. Would you
> >> >> elaborate it a little more?
> >> >>
> >> >> Multiple autovacuum worker can work on one database. So even if
> >> >> a table that you want to vacuum first is the back of the list
> >> >> and there other worker would pick up it. If the vacuuming the
> >> >> table gets delayed due to some big tables are in front of that
> >> >> table I think you can deal with it by increasing the number of
> >> >> autovacuum workers.
> >> >>
> >> >> Regards,
> >> >>
> >> >> --
> >> >> Masahiko Sawada
> >> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> >> >> NTT Open Source Software Center
> >> >>  
> >> >
> >> > Database can contain thousands of tables and often
> >> > updates/deletes concentrate mostly in only a handful of tables.
> >> > Going through thousands of less bloated tables can take ages.
> >> > Currently autovacuum know nothing about prioritizing it`s work
> >> > with respect to user`s understanding of his data and
> >> > application.  
> >>
> >> Understood. I have a question; please imagine the following case.
> >>
> >> Suppose that there are 1000 tables in a database, and one table of
> >> them (table-A) has the highest priority while other 999 tables have
> >> same priority. Almost tables (say 800 tables) including table-A
> >> need to get vacuumed at some point, so with your patch an AV
> >> worker listed 800 tables and table-A will be at the head of the
> >> list. Table-A will get vacuumed first but this AV worker has to
> >> vacuum other 799 tables even if table-A requires vacuum later
> >> again.
> >>
> >> If an another AV worker launches during table-A being vacuumed, the
> >> new AV worker would include table-A but would not process it
> >> because concurrent AV worker is processing it. So it would vacuum
> >> other tables instead. Similarly, this AV worker can not get the
> >> new table list until finish to vacuum all other tables. (Note that
> >> it might skip some tables if they are already vacuumed by other AV
> >> worker.) On the other hand, if another new AV worker launches
> >> after table-A got vacuumed and requires vacuuming again, the new
> >> AV worker puts the table-A at the head of list. It processes
> >> table-A first but, again, it has to vacuum other tables before
> >> getting new table list next time that might include table-A.
> >>
> >> Is this the expected behavior? I'd rather expect postgres to
> >> vacuum it before other lower priority tables whenever the table
> >> having the highest priority requires vacuuming, but it wouldn't.  
> >
> > Yes, this is the expected behavior. The patch is the way to give the
> > user at least some control of the sorting, later it could be
> > extended with something more sophisticated.
> >  
> 
> Since user doesn't know that each AV worker processes tables based on
> its table list that is different from lists that other worker has, I
> think it's hard for user to understand this parameter. I'd say that
> user would expect that high priority table can get vacuumed any time.

Yes, very good point. It could be strange for the user in cases like
that.

> 
> I think what you want to solve is to vacuum some tables preferentially
> if there are many tables requiring vacuuming. Right? If so, I think
> the prioritizing table only in the list would not solve the
> fundamental issue. In the example, table-A will still need to wait for
> other 799 tables to get vacuumed. Table-A will be bloating during
> vacuuming other tables. To deal with it, I think we need something
> queue on the shmem per database in order to control the order of
> tables waiting for vacuuming and need to use it with a smart
> algorithm. Thoughts?

Agree, it would require some shared queue for the autovacuum workers if
we want to prioritize the table across all of them. I will look into
this, and maybe will come up with something.

Masahiko, are you working on this too or just interested with the idea?

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: autovacuum: change priority of the vacuumed tables

From
Tomas Vondra
Date:

On 02/19/2018 03:38 PM, Ildus Kurbangaliev wrote:
> On Fri, 16 Feb 2018 21:48:14 +0900
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
>> On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
>> <i.kurbangaliev@postgrespro.ru> wrote:
>>> On Fri, 16 Feb 2018 17:42:34 +0900
>>> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>  
>>>> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
>>>> <g.smolkin@postgrespro.ru> wrote:  
>>>>> On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>>>>>  
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>>>>>> <i.kurbangaliev@postgrespro.ru> wrote:  
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Attached patch adds 'autovacuum_table_priority' to the current
>>>>>>> list of automatic vacuuming settings. It's used in sorting of
>>>>>>> vacuumed tables in autovacuum worker before actual vacuum.
>>>>>>>
>>>>>>> The idea is to give possibility to the users to prioritize
>>>>>>> their tables in autovacuum process.
>>>>>>>  
>>>>>> Hmm, I couldn't understand the benefit of this patch. Would you
>>>>>> elaborate it a little more?
>>>>>>
>>>>>> Multiple autovacuum worker can work on one database. So even if
>>>>>> a table that you want to vacuum first is the back of the list
>>>>>> and there other worker would pick up it. If the vacuuming the
>>>>>> table gets delayed due to some big tables are in front of that
>>>>>> table I think you can deal with it by increasing the number of
>>>>>> autovacuum workers.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> --
>>>>>> Masahiko Sawada
>>>>>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>>>>>> NTT Open Source Software Center
>>>>>>  
>>>>>
>>>>> Database can contain thousands of tables and often
>>>>> updates/deletes concentrate mostly in only a handful of tables.
>>>>> Going through thousands of less bloated tables can take ages.
>>>>> Currently autovacuum know nothing about prioritizing it`s work
>>>>> with respect to user`s understanding of his data and
>>>>> application.  
>>>>
>>>> Understood. I have a question; please imagine the following case.
>>>>
>>>> Suppose that there are 1000 tables in a database, and one table of
>>>> them (table-A) has the highest priority while other 999 tables have
>>>> same priority. Almost tables (say 800 tables) including table-A
>>>> need to get vacuumed at some point, so with your patch an AV
>>>> worker listed 800 tables and table-A will be at the head of the
>>>> list. Table-A will get vacuumed first but this AV worker has to
>>>> vacuum other 799 tables even if table-A requires vacuum later
>>>> again.
>>>>
>>>> If an another AV worker launches during table-A being vacuumed, the
>>>> new AV worker would include table-A but would not process it
>>>> because concurrent AV worker is processing it. So it would vacuum
>>>> other tables instead. Similarly, this AV worker can not get the
>>>> new table list until finish to vacuum all other tables. (Note that
>>>> it might skip some tables if they are already vacuumed by other AV
>>>> worker.) On the other hand, if another new AV worker launches
>>>> after table-A got vacuumed and requires vacuuming again, the new
>>>> AV worker puts the table-A at the head of list. It processes
>>>> table-A first but, again, it has to vacuum other tables before
>>>> getting new table list next time that might include table-A.
>>>>
>>>> Is this the expected behavior? I'd rather expect postgres to
>>>> vacuum it before other lower priority tables whenever the table
>>>> having the highest priority requires vacuuming, but it wouldn't.  
>>>
>>> Yes, this is the expected behavior. The patch is the way to give the
>>> user at least some control of the sorting, later it could be
>>> extended with something more sophisticated.
>>>  
>>
>> Since user doesn't know that each AV worker processes tables based on
>> its table list that is different from lists that other worker has, I
>> think it's hard for user to understand this parameter. I'd say that
>> user would expect that high priority table can get vacuumed any time.
> 
> Yes, very good point. It could be strange for the user in cases like
> that.
> 
>>
>> I think what you want to solve is to vacuum some tables preferentially
>> if there are many tables requiring vacuuming. Right? If so, I think
>> the prioritizing table only in the list would not solve the
>> fundamental issue. In the example, table-A will still need to wait for
>> other 799 tables to get vacuumed. Table-A will be bloating during
>> vacuuming other tables. To deal with it, I think we need something
>> queue on the shmem per database in order to control the order of
>> tables waiting for vacuuming and need to use it with a smart
>> algorithm. Thoughts?
> 
> Agree, it would require some shared queue for the autovacuum workers if
> we want to prioritize the table across all of them. I will look into
> this, and maybe will come up with something.
> 
> Masahiko, are you working on this too or just interested with the idea?
> 

Hello,

I have a hard time understanding how adding yet another autovacuum
table-level knob makes the DBA's life any easier. Especially when the
behavior is so unreliable and does not really guarantee when the
high-priority table will be cleaned up ...

As you mentioned at the beginning of this thread, you suggested this
patch would be useful to prioritize cleanup of important tables that
receive a lot of updates/deletes. In such cases, however, people just
lower vacuum_scale_factor - either for those high-priority tables, or
globally. This usually does the trick, by doing the cleanup more
frequently and in smaller chunks.

When this does not work, it's usually because all the workers are busy
processing other tables, and everything else has to wait in the queue.

In other words, the autovacuum throughput is too low and can't keep up
with the database activity, either because of hardware bottlenecks or
autovacuum cost limit set too low.

This proposed patch does not really improve that, as the tables are
sorted by priority only once at the beginning (in each worker). As
explained by Masahiko-san in one of the previous e-mails, it means the
patch does not guarantee timely cleanup of the high-priority tables.

And you can't really fix that by making the workers refresh the list of
tables to vacuum more often, because that could easily mean only
high-priority tables are cleaned up and nothing else. That would be
pretty disastrous, obviously.

So I don't think this is a very promising approach, unfortunately.

What I think might work is having a separate pool of autovac workers,
dedicated to these high-priority tables. That still would not guarantee
the high-priority tables are vacuumed immediately, but at least that
they are not stuck in the worker queue behind low-priority ones.

I wonder if we could detect tables with high update/delete activity and
promote them to high-priority automatically. The reasoning is that by
delaying the cleanup for those tables would result in significantly more
bloat than for those with low update/delete activity.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: autovacuum: change priority of the vacuumed tables

From
Andres Freund
Date:
Hi,

On 2018-02-19 17:00:34 +0100, Tomas Vondra wrote:
> I have a hard time understanding how adding yet another autovacuum
> table-level knob makes the DBA's life any easier. Especially when the
> behavior is so unreliable and does not really guarantee when the
> high-priority table will be cleaned up ...

Based on the criticism voiced and a quick skim of the proposal, this CF
entry appears to still be in its design phase. In my opinion this thus
isn't v11 material, and should be marked as 'returned with feedback'?


Greetings,

Andres Freund


Re: autovacuum: change priority of the vacuumed tables

From
Ildus Kurbangaliev
Date:
On Thu, 1 Mar 2018 23:39:34 -0800
Andres Freund <andres@anarazel.de> wrote:

> Hi,
> 
> On 2018-02-19 17:00:34 +0100, Tomas Vondra wrote:
> > I have a hard time understanding how adding yet another autovacuum
> > table-level knob makes the DBA's life any easier. Especially when
> > the behavior is so unreliable and does not really guarantee when the
> > high-priority table will be cleaned up ...  
> 
> Based on the criticism voiced and a quick skim of the proposal, this
> CF entry appears to still be in its design phase. In my opinion this
> thus isn't v11 material, and should be marked as 'returned with
> feedback'?

Hi, I agree, this patch definitely needs more thinking.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: autovacuum: change priority of the vacuumed tables

From
David Steele
Date:
On 3/2/18 4:07 AM, Ildus Kurbangaliev wrote:
> On Thu, 1 Mar 2018 23:39:34 -0800
> Andres Freund <andres@anarazel.de> wrote:
> 
>> Hi,
>>
>> On 2018-02-19 17:00:34 +0100, Tomas Vondra wrote:
>>> I have a hard time understanding how adding yet another autovacuum
>>> table-level knob makes the DBA's life any easier. Especially when
>>> the behavior is so unreliable and does not really guarantee when the
>>> high-priority table will be cleaned up ...  
>>
>> Based on the criticism voiced and a quick skim of the proposal, this
>> CF entry appears to still be in its design phase. In my opinion this
>> thus isn't v11 material, and should be marked as 'returned with
>> feedback'?
> 
> Hi, I agree, this patch definitely needs more thinking.

Marked as Returned with Feedback.

Thanks,
-- 
-David
david@pgmasters.net


Re: autovacuum: change priority of the vacuumed tables

From
Jim Nasby
Date:
On 2/19/18 10:00 AM, Tomas Vondra wrote:
> So I don't think this is a very promising approach, unfortunately.
> 
> What I think might work is having a separate pool of autovac workers,
> dedicated to these high-priority tables. That still would not guarantee
> the high-priority tables are vacuumed immediately, but at least that
> they are not stuck in the worker queue behind low-priority ones.
> 
> I wonder if we could detect tables with high update/delete activity and
> promote them to high-priority automatically. The reasoning is that by
> delaying the cleanup for those tables would result in significantly more
> bloat than for those with low update/delete activity.

I've looked at this stuff in the past, and I think that the first step 
in trying to improve autovacuum needs to be allowing for a much more 
granular means of controlling worker table selection, and exposing that 
ability. There are simply too many different scenarios to try and 
account for to try and make a single policy that will satisfy everyone. 
Just as a simple example, OLTP databases (especially with small queue 
tables) have very different vacuum needs than data warehouses.

One fairly simple option would be to simply replace the logic that 
currently builds a worker's table list with running a query via SPI. 
That would allow for prioritizing important tables. It could also reduce 
the problem of workers getting "stuck" on a ton of large tables by 
taking into consideration the total number of pages/tuples a list contains.

A more fine-grained approach would be to have workers make a new 
selection after every vacuum they complete. That would provide the 
ultimate in control, since you'd be able to see exactly what all the 
other workers are doing.
-- 
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com


Re: autovacuum: change priority of the vacuumed tables

From
Tomas Vondra
Date:
On 03/03/2018 08:32 PM, Jim Nasby wrote:
> On 2/19/18 10:00 AM, Tomas Vondra wrote:
>> So I don't think this is a very promising approach, unfortunately.
>>
>> What I think might work is having a separate pool of autovac workers,
>> dedicated to these high-priority tables. That still would not guarantee
>> the high-priority tables are vacuumed immediately, but at least that
>> they are not stuck in the worker queue behind low-priority ones.
>>
>> I wonder if we could detect tables with high update/delete activity and
>> promote them to high-priority automatically. The reasoning is that by
>> delaying the cleanup for those tables would result in significantly more
>> bloat than for those with low update/delete activity.
> 
> I've looked at this stuff in the past, and I think that the first step
> in trying to improve autovacuum needs to be allowing for a much more
> granular means of controlling worker table selection, and exposing that
> ability. There are simply too many different scenarios to try and
> account for to try and make a single policy that will satisfy everyone.
> Just as a simple example, OLTP databases (especially with small queue
> tables) have very different vacuum needs than data warehouses.
> 

That largely depends on what knobs would be exposed. I'm against adding
some low-level knobs that perhaps 1% of the users will know how to tune,
and the rest will set it incorrectly. Some high-level options that would
specify the workload type might work, but I have no idea about details.

> One fairly simple option would be to simply replace the logic that
> currently builds a worker's table list with running a query via SPI.
> That would allow for prioritizing important tables. It could also reduce
> the problem of workers getting "stuck" on a ton of large tables by
> taking into consideration the total number of pages/tuples a list contains.
> 

I don't see why SPI would be needed to do that, i.e. why couldn't we
implement such prioritization with the current approach. Another thing
is I really doubt prioritizing "important tables" is an good solution,
as it does not really guarantee anything.

> A more fine-grained approach would be to have workers make a new
> selection after every vacuum they complete. That would provide the
> ultimate in control, since you'd be able to see exactly what all the
> other workers are doing.

That was proposed earlier in this thread, and the issue is it may starve
all the other tables when the "important" tables need cleanup all the time.

regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: autovacuum: change priority of the vacuumed tables

From
Jim Nasby
Date:
On 3/3/18 2:53 PM, Tomas Vondra wrote:
> That largely depends on what knobs would be exposed. I'm against adding
> some low-level knobs that perhaps 1% of the users will know how to tune,
> and the rest will set it incorrectly. Some high-level options that would
> specify the workload type might work, but I have no idea about details.

Not knowing about details is why we've been stuck here for years: it's 
not terribly obvious how to create a scheduler that is going to work in 
all situations. Current autovac is great for 80% of situations, but it 
simply doesn't handle the remaining 20% by itself. Once you're pushing 
your IO limits you *have* to start scheduling manual vacuums for any 
critical tables.

At least if we exposed some low level ability to control autovac workers 
then others could create tools to improve the situation. Currently 
that's not possible because manual vacuum lacks features that autovac has.

>> One fairly simple option would be to simply replace the logic that
>> currently builds a worker's table list with running a query via SPI.
>> That would allow for prioritizing important tables. It could also reduce
>> the problem of workers getting "stuck" on a ton of large tables by
>> taking into consideration the total number of pages/tuples a list contains.
>>
> I don't see why SPI would be needed to do that, i.e. why couldn't we
> implement such prioritization with the current approach. Another thing

Sure, it's just a SMOC. But most of the issue here is actually a query 
problem. I suspect that the current code would actually shrink if 
converted to SPI. In any case, I'm not wed to that idea.

> is I really doubt prioritizing "important tables" is an good solution,
> as it does not really guarantee anything.

If by "important" you mean small tables with high update rates, 
prioritizing those actually would help as long as you have free workers. 
By itself it doesn't gain all that much though.

>> A more fine-grained approach would be to have workers make a new
>> selection after every vacuum they complete. That would provide the
>> ultimate in control, since you'd be able to see exactly what all the
>> other workers are doing.
> That was proposed earlier in this thread, and the issue is it may starve
> all the other tables when the "important" tables need cleanup all the time.

There's plenty of other ways to shoot yourself in the foot in that 
regard already. We can always have safeguards in place if we get too 
close to wrap-around, just like we currently do.
-- 
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com


Re: autovacuum: change priority of the vacuumed tables

From
Tomas Vondra
Date:
On 03/03/2018 10:21 PM, Jim Nasby wrote:
> On 3/3/18 2:53 PM, Tomas Vondra wrote:
>> That largely depends on what knobs would be exposed. I'm against adding
>> some low-level knobs that perhaps 1% of the users will know how to tune,
>> and the rest will set it incorrectly. Some high-level options that would
>> specify the workload type might work, but I have no idea about details.
> 
> Not knowing about details is why we've been stuck here for years: it's
> not terribly obvious how to create a scheduler that is going to work in
> all situations. Current autovac is great for 80% of situations, but it
> simply doesn't handle the remaining 20% by itself. Once you're pushing
> your IO limits you *have* to start scheduling manual vacuums for any
> critical tables.
> 
> At least if we exposed some low level ability to control autovac
> workers then others could create tools to improve the situation.
> Currently that's not possible because manual vacuum lacks features
> that autovac has.
> 

I have my doubts about both points - usefulness of low-level controls
and viability of tools built on them.

Firstly, my hunch is that if we knew what low-level controls to expose,
it would pretty much how to implement the tool internally. Exposing
something just because you home someone will find a use for that seems
like a dead-end to me. So, which controls would you expose?

Second, all the statistics used to decide which tables need vacuuming
are already exposed, and we have things like bgworkers etc. So you could
go and write a custom autovacuum today - copy the autovacuum code, tweak
the scheduling, done. Yet no such tool emerged yet. Why is that?

>>> One fairly simple option would be to simply replace the logic
>>> that currently builds a worker's table list with running a query
>>> via SPI. That would allow for prioritizing important tables. It
>>> could also reduce the problem of workers getting "stuck" on a ton
>>> of large tables by taking into consideration the total number of
>>> pages/tuples a list contains.
>>>
>> I don't see why SPI would be needed to do that, i.e. why couldn't
>> we implement such prioritization with the current approach. Another
>> thing
> 
> Sure, it's just a SMOC. But most of the issue here is actually a
> query problem. I suspect that the current code would actually shrink
> if converted to SPI. In any case, I'm not wed to that idea.
> 

I disagree this a "query problem" - it certainly is not the case that
simply prioritizing the tables differently will make a difference. Or
more precisely, it certainly does not solve the autovacuum issues I'm
thinking about. I have no idea which issues are you trying to solve,
because you haven't really described those.

>> is I really doubt prioritizing "important tables" is an good solution,
>> as it does not really guarantee anything.
> 
> If by "important" you mean small tables with high update rates,
> prioritizing those actually would help as long as you have free workers.
> By itself it doesn't gain all that much though.
> 

Which is why I mentioned we could have separate pools of autovacuum
workers - one for regular tables, one for "important" ones.

>>> A more fine-grained approach would be to have workers make a new 
>>> selection after every vacuum they complete. That would provide
>>> the ultimate in control, since you'd be able to see exactly what
>>> all the other workers are doing.
>> That was proposed earlier in this thread, and the issue is it may
>> starve all the other tables when the "important" tables need
>> cleanup all the time.
> 
> There's plenty of other ways to shoot yourself in the foot in that 
> regard already. We can always have safeguards in place if we get too 
> close to wrap-around, just like we currently do.

I haven't mentioned wraparound at all.

My point is that if you entirely ignore some tables because "important"
ones need constant cleanup (saturating the total autovacuum capacity),
then you'll end up with extreme bloat in those other tables. And then
you will need to do more work to clean them up, which will likely cause
delays in cleaning up the important ones.


FWIW I find this discussion rather too hand-wavy, because I have no idea
what controls would you like to expose, etc. If you have an idea, please
write a patch or at least a proposal explaining the details.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: autovacuum: change priority of the vacuumed tables

From
Marco Garavello
Date:
Hello all,

I've just read this thread as I googled for "how to give to autovacuum priority on a table"
I'm looking this because I have large table with lot of inserts/updates, and to improve the system I want to force the first free autovacuum worker to work on that table, of course if condition "such table is in need-to-be-vacuumed-state (depending on autovacuum_scale_factor parameter)" is satisfed.
I found that autovacuum_scale_factor can be specified for each table, so I think this could do the trick, isn't it?
From my understanding specifying it lower on the target table should give an implicit higher priority to that table.


Distinti Saluti / Kind Regards

Marco Garavello | IoT Cloud Operations Specialist



Il giorno ven 14 gen 2022 alle ore 12:55 Tomas Vondra <tomas.vondra@2ndquadrant.com> ha scritto:

On 03/03/2018 10:21 PM, Jim Nasby wrote:
> On 3/3/18 2:53 PM, Tomas Vondra wrote:
>> That largely depends on what knobs would be exposed. I'm against adding
>> some low-level knobs that perhaps 1% of the users will know how to tune,
>> and the rest will set it incorrectly. Some high-level options that would
>> specify the workload type might work, but I have no idea about details.
>
> Not knowing about details is why we've been stuck here for years: it's
> not terribly obvious how to create a scheduler that is going to work in
> all situations. Current autovac is great for 80% of situations, but it
> simply doesn't handle the remaining 20% by itself. Once you're pushing
> your IO limits you *have* to start scheduling manual vacuums for any
> critical tables.
>
> At least if we exposed some low level ability to control autovac
> workers then others could create tools to improve the situation.
> Currently that's not possible because manual vacuum lacks features
> that autovac has.
>

I have my doubts about both points - usefulness of low-level controls
and viability of tools built on them.

Firstly, my hunch is that if we knew what low-level controls to expose,
it would pretty much how to implement the tool internally. Exposing
something just because you home someone will find a use for that seems
like a dead-end to me. So, which controls would you expose?

Second, all the statistics used to decide which tables need vacuuming
are already exposed, and we have things like bgworkers etc. So you could
go and write a custom autovacuum today - copy the autovacuum code, tweak
the scheduling, done. Yet no such tool emerged yet. Why is that?

>>> One fairly simple option would be to simply replace the logic
>>> that currently builds a worker's table list with running a query
>>> via SPI. That would allow for prioritizing important tables. It
>>> could also reduce the problem of workers getting "stuck" on a ton
>>> of large tables by taking into consideration the total number of
>>> pages/tuples a list contains.
>>>
>> I don't see why SPI would be needed to do that, i.e. why couldn't
>> we implement such prioritization with the current approach. Another
>> thing
>
> Sure, it's just a SMOC. But most of the issue here is actually a
> query problem. I suspect that the current code would actually shrink
> if converted to SPI. In any case, I'm not wed to that idea.
>

I disagree this a "query problem" - it certainly is not the case that
simply prioritizing the tables differently will make a difference. Or
more precisely, it certainly does not solve the autovacuum issues I'm
thinking about. I have no idea which issues are you trying to solve,
because you haven't really described those.

>> is I really doubt prioritizing "important tables" is an good solution,
>> as it does not really guarantee anything.
>
> If by "important" you mean small tables with high update rates,
> prioritizing those actually would help as long as you have free workers.
> By itself it doesn't gain all that much though.
>

Which is why I mentioned we could have separate pools of autovacuum
workers - one for regular tables, one for "important" ones.

>>> A more fine-grained approach would be to have workers make a new
>>> selection after every vacuum they complete. That would provide
>>> the ultimate in control, since you'd be able to see exactly what
>>> all the other workers are doing.
>> That was proposed earlier in this thread, and the issue is it may
>> starve all the other tables when the "important" tables need
>> cleanup all the time.
>
> There's plenty of other ways to shoot yourself in the foot in that
> regard already. We can always have safeguards in place if we get too
> close to wrap-around, just like we currently do.

I haven't mentioned wraparound at all.

My point is that if you entirely ignore some tables because "important"
ones need constant cleanup (saturating the total autovacuum capacity),
then you'll end up with extreme bloat in those other tables. And then
you will need to do more work to clean them up, which will likely cause
delays in cleaning up the important ones.


FWIW I find this discussion rather too hand-wavy, because I have no idea
what controls would you like to expose, etc. If you have an idea, please
write a patch or at least a proposal explaining the details.

regards

--
Tomas Vondra                  https://urldefense.com/v3/__http://www.2ndQuadrant.com__;!!Ck4v2Rc!h1TqYTvzyQn1mDOR119nZYsN26gRVhRZ3UIT7sQT60SgRR7833KpQ3CtJ6l0JpMIEt3JBbd09ZZ9-N3U5MHRn7r7RU4NEiRE$
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services