Re: autovacuum big table taking hours and sometimes seconds - Mailing list pgsql-performance

From Mariel Cherkassky
Subject Re: autovacuum big table taking hours and sometimes seconds
Date
Msg-id CA+t6e1kH8KtHDEWHVnGq3As1+E7Wut0qk9M-W5G+0L-6EVYgrA@mail.gmail.com
Whole thread Raw
In response to Re: autovacuum big table taking hours and sometimes seconds  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: autovacuum big table taking hours and sometimes seconds  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Just to make sure that I understood : 
-By increasing the cost_limit or decreasing the cost of the page_cost we can decrease the time it takes the autovacuum process to vacuum a specific table. 
-The vacuum threshold/scale are used to decide how often the table will be vacuum and not how long it should take.

I have 3 questions : 
1)To what value do you recommend to increase the vacuum cost_limit ? 2000 seems reasonable ? Or maybe its better to leave it as default and assign a specific value for big tables ?
2)When the autovacuum reaches the cost_limit while trying to vacuum a specific table, it wait nap_time seconds and then it continue to work on the same table ? 
3)So in case I have a table that keeps growing (not fast because I set the vacuum_scale_factor to 0 and the autovacuum_vacuum_threshold to 10000). If the table keep growing it means I should try to increase the cost right ? Do you see any other option ? The table represent sessions of my system so basically from my point of view I should have almost the same amount of sessions every day and the table shouldn't grow dramatically but before changing the vacuum threshold/factor it happened. As I mentioned in my first comment there is a byte column and therefore the toasted table is the problematic here.

‫בתאריך יום ה׳, 7 בפבר׳ 2019 ב-0:34 מאת ‪David Rowley‬‏ <‪david.rowley@2ndquadrant.com‬‏>:‬
On Thu, 7 Feb 2019 at 02:34, Mariel Cherkassky
<mariel.cherkassky@gmail.com> wrote:
> As I said, I set the next settings for the toasted table :
>
>  alter table orig_table set (toast.autovacuum_vacuum_scale_factor = 0);
>
>   alter table orig_table set (toast.autovacuum_vacuum_threshold  =10000);

These settings don't control how fast auto-vacuum runs, just when it should run.

> Can you explain a little bit more why you decided that the autovacuum spent it time on sleeping ?

Yeah, if you look at the following settings.

 vacuum_cost_limit                   | 200
 vacuum_cost_page_dirty              | 20
 vacuum_cost_page_hit                | 1
 vacuum_cost_page_miss               | 10
 autovacuum_vacuum_cost_delay        | 20ms

I've tagged on the default setting for each of these.  Both vacuum and
auto-vacuum keep score of how many points they've accumulated while
running. 20 points for dirtying a page, 10 for a read that's not found
to be in shared_buffers, 1 for reading a buffer from shared buffers.
When vacuum_cost_limit points is reached (or
autovacuum_vacuum_cost_limit if not -1) auto-vacuum sleeps for
autovacuum_vacuum_cost_delay, normal manual vacuums sleep for
vacuum_cost_delay.

In one of the log entries you saw:

> buffer usage: 4808221 hits, 6404148 misses, 6152603 dirtied
> avg read rate: 2.617 MiB/s, avg write rate: 2.514 MiB/s
> system usage: CPU 148.65s/70.06u sec elapsed 19119.55 sec

Doing a bit of maths to see how much time that vacuum should have slept for:

postgres=# select (4808221 * 1 + 6404148 * 10 + 6152603 * 20) / 200.0
* 20 / 1000;
      ?column?
--------------------
 19190.176100000000

That's remarkably close to the actual time of 19119.55 sec. If you do
the same for the other 3 vacuums then you'll see the same close match.

> I see the autovacuum statistics from the logs, how can I check that the workers are busy very often ?

It would be nice if there was something better, but periodically doing:

SELECT count(*) FROM pg_stat_activity where query like 'autovacuum%';

will work.

> My vacuum limit is 200 right now, basically If vacuum runs on my toasted table and reached 200 but it didnt finish to clean all the dead tuples, after the nap, should it continue cleaning it or wait until the vacuum_threshold hit again ?

You're confusing nap time is something else, Maybe you're confusing
that with speed of vacuum?  Napping is just the time auto-vacuum will
wait between checking for new tables to work on.  Having the
auto-vacuum run so slowly is a probable cause of still having dead
tuples after the vacuum... likely because they became dead after
vacuum started.

I'd recommend reading the manual or Tomas Vondra's blog about vacuum
costs. It's not overly complex, once you understand what each of the
vacuum settings does.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-performance by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Transaction size and Wal2Json
Next
From: Kristi Anderson
Date:
Subject: Managing High Availability in PostgreSQL – Part I