Re: autovacuum: change priority of the vacuumed tables - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: autovacuum: change priority of the vacuumed tables
Date
Msg-id CAD21AoC9fwtneY00pRTBasFbyDPA=gcv-31pZhTqkPHrg9VA6w@mail.gmail.com
In response to Re: autovacuum: change priority of the vacuumed tables  (Grigory Smolkin <g.smolkin@postgrespro.ru>)
Responses Re: autovacuum: change priority of the vacuumed tables
List pgsql-hackers
On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
<g.smolkin@postgrespro.ru> wrote:
> On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>
>> Hi,
>>
>> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>> <i.kurbangaliev@postgrespro.ru> wrote:
>>>
>>> Hi,
>>>
>>> Attached patch adds 'autovacuum_table_priority' to the current list of
>>> automatic vacuuming settings. It's used in sorting of vacuumed tables in
>>> autovacuum worker before actual vacuum.
>>>
>>> The idea is to let users prioritize their tables
>>> in the autovacuum process.
>>>
>> Hmm, I couldn't understand the benefit of this patch. Would you
>> elaborate on it a little more?
>>
>> Multiple autovacuum workers can work on one database. So even if a
>> table that you want to vacuum first is at the back of the list,
>> another worker would pick it up. If vacuuming the table gets delayed
>> because some big tables are in front of it, I think you can deal
>> with it by increasing the number of autovacuum workers.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>>
>
> A database can contain thousands of tables, and updates/deletes often
> concentrate in only a handful of them.
> Going through thousands of less bloated tables can take ages.
> Currently autovacuum knows nothing about prioritizing its work with respect
> to the user's understanding of their data and application.

Understood. I have a question; please imagine the following case.

Suppose that there are 1000 tables in a database, and one of them
(table-A) has the highest priority while the other 999 tables share the
same priority. Most of the tables (say 800), including table-A, need
to be vacuumed at some point, so with your patch an AV worker lists
800 tables with table-A at the head of the list. Table-A gets
vacuumed first, but this AV worker then has to vacuum the other 799
tables even if table-A requires vacuuming again in the meantime.
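
A minimal sketch of this scenario, written in Python rather than as
actual autovacuum code. The Table class, the numeric priority field and
the redirty_after knob are all hypothetical names made up for
illustration; the only point is that the worker sorts a snapshot once
and never looks at table-A again during the pass.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    priority: int        # hypothetical per-table priority; higher goes first
    needs_vacuum: bool


def build_worklist(tables):
    """Snapshot the tables that need vacuum, highest priority first."""
    todo = [t for t in tables if t.needs_vacuum]
    return sorted(todo, key=lambda t: t.priority, reverse=True)


def run_worker(tables, redirty_after=1):
    """Process one snapshot and return the order of vacuumed tables.

    Once `redirty_after` tables have been vacuumed, tables[0] (table-A)
    is flagged as needing vacuum again, to mimic it collecting dead
    tuples while the worker is still busy with the rest of its list.
    """
    order = []
    for done, table in enumerate(build_worklist(tables), start=1):
        table.needs_vacuum = False
        order.append(table.name)
        if done == redirty_after:
            tables[0].needs_vacuum = True   # table-A needs vacuum again
    return order


# 1000 tables: table-A has the highest priority, 799 of the others need vacuum.
tables = [Table("table-A", 10, True)]
tables += [Table(f"t{i}", 0, i <= 799) for i in range(1, 1000)]
order = run_worker(tables)
```

In this sketch table-A is vacuumed first and exactly once, and it is
still flagged as needing vacuum when the worker finishes the other 799
tables.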

If another AV worker launches while table-A is being vacuumed, the
new AV worker would include table-A in its list but would not process
it, because a concurrent AV worker is already processing it. So it would
vacuum other tables instead. Similarly, this AV worker cannot get a new
table list until it finishes vacuuming all the other tables. (Note that
it might skip some tables if they have already been vacuumed by another
AV worker.) On the other hand, if a new AV worker launches after
table-A has been vacuumed and requires vacuuming again, the new AV
worker puts table-A at the head of its list. It processes table-A first
but, again, it has to vacuum the other tables before it gets a new
table list that might include table-A.

Is this the expected behavior? Whenever the highest-priority table
requires vacuuming, I'd expect postgres to vacuum it before other
lower-priority tables, but with this patch it wouldn't.
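
For contrast, here is a rough sketch of the behavior described above,
again in Python and not a proposal for how autovacuum should actually be
implemented. It assumes a hypothetical worker that keeps its work list
in a priority queue and re-queues a table as soon as it needs vacuuming
again; the function name and the redirty mapping are made up for
illustration.

```python
import heapq


def vacuum_with_recheck(pending, redirty):
    """Return the vacuum order when the list is re-prioritized on the fly.

    pending: dict mapping table name -> priority (higher goes first).
    redirty: dict mapping step number -> table name that needs vacuum
             again once that many tables have been vacuumed (hypothetical,
             and assumed to name a table that was already vacuumed).
    """
    # heapq is a min-heap, so store negated priorities.
    heap = [(-prio, name) for name, prio in pending.items()]
    heapq.heapify(heap)
    order = []
    step = 0
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
        step += 1
        if step in redirty:
            again = redirty[step]
            # The table needs vacuum again: put it back by priority.
            heapq.heappush(heap, (-pending[again], again))
    return order


order = vacuum_with_recheck(
    {"table-A": 10, "t1": 0, "t2": 0, "t3": 0},
    redirty={2: "table-A"},
)
```

Here table-A becomes dirty again after the second vacuum and is picked
up immediately, ahead of the remaining lower-priority tables.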

> Also, it would be great to sort tables according to dead/live tuple ratio
> and relfrozenxid.

Yeah, for anti-wraparound vacuum on the database, it would be a good
idea to sort the list by relfrozenxid, as discussed on another
thread[1].

[1]
https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com
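
A small sketch of what sorting by relfrozenxid could look like, under
simplifying assumptions: XIDs are treated as plain 32-bit counters (the
reserved special XIDs are ignored), every table's age is assumed to be
below 2^31, and the helper names and sample values are hypothetical.
It is not taken from the patch or from the thread in [1].

```python
XID_SPACE = 2 ** 32   # transaction IDs are 32-bit and wrap around


def xid_age(next_xid, relfrozenxid):
    """Number of XIDs consumed since relfrozenxid, modulo wraparound."""
    return (next_xid - relfrozenxid) % XID_SPACE


def sort_for_wraparound(tables, next_xid):
    """Order (name, relfrozenxid) pairs with the oldest relfrozenxid first."""
    return sorted(tables, key=lambda t: xid_age(next_xid, t[1]), reverse=True)


tables = [
    ("t_recent", 900),
    ("t_old", 100),
    ("t_wrapped", 4_294_967_000),   # frozen shortly before the counter wrapped
]
ordered = sort_for_wraparound(tables, next_xid=1_000)
```

Comparing ages instead of raw relfrozenxid values matters because a
numerically large relfrozenxid can still be the oldest one once the
counter has wrapped, as with t_wrapped here.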

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

