Thread: Speeding up an in-progress wraparound-preventing vacuum

Speeding up an in-progress wraparound-preventing vacuum

From

Vincent de Phily

Date:

08 December 2014, 12:53:46

Hi List,

I have a "autovacuum: VACUUM ANALYZE public.some_big_table (to prevent
wraparound)" that has been running for over 13 days. The process is consuming
IO so I'm confident it isn't stuck, but it's still taking surprisingly long.
PG 9.1.13 on Debian.

The actual table is 584G on a SAN, plus 324G of indexes on local disk. The
system was IO-starved until about 5 days ago, after offloading half the work
to a different server and waiting for the purging of old data (went from
keeping 4 weeks to 2) to catch up (so probably 2/3rd of the table is currently
bloat, which I'd like to get back asap). Currently about 80% of the IO is
devoted to the vacuum process (on average throughout the day, as extrapolated
from atop output).

I've tried raising autovacuum_vacuum_cost_limit from 500 to 5000, but
apparently the already-running vacuum process didn't pick up the change (I did
send a sighup, and new vacuum processes do run faster). I tried running a
manual vacuum on that table (to benefit from the new settings and from the
more aggressive behavior of manual vacuums), but it's apparently waiting for
the wraparound vacuum to finish first.

My currrent settings:
 autovacuum                      | on        | default
 autovacuum_analyze_scale_factor | 0.1       | default
 autovacuum_analyze_threshold    | 10000     | configuration file
 autovacuum_freeze_max_age       | 200000000 | default
 autovacuum_max_workers          | 3         | default
 autovacuum_naptime              | 60        | default
 autovacuum_vacuum_cost_delay    | 10        | configuration file
 autovacuum_vacuum_cost_limit    | 5000      | configuration file
 autovacuum_vacuum_scale_factor  | 0.002     | configuration file
 autovacuum_vacuum_threshold     | 10000     | configuration file
 maintenance_work_mem            | 524288    | configuration file
The vacuum_scale_factor is tweaked to trigger once a day for most tables, and
should trigger every 3 days on the problem table.



My first question is: is there a way to speedup the currently-running vacuum
process ? I guess killing it to let it be replaced by a more agressively-
configured one would work, but I'd hate to lose 2 weeks of processing (is
there a way to estimate how much more vacuuming work remains to be done ?),
and I'm being a bit more cautious with the wraparound-preventing kind.

Second question is: how come we reached the wraparound threshold on this table
at all ? We've only been keeping 28 days of data in this table, doing
incremental deletes every day (there are no updates at all). I find it very
unlikely that we'd go through 2M transactions in that timespan (that'd need
890 transactions per second, we're well below 100). The pg_class.relfozenxid
on that table is at 680M, while most other tables are around 860M. Could it be
that routine vacuums haven't been able to update the relfrozenxid in a long
long time, or am I missing something else ?



Thanks.

--
Vincent de Phily

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Jeff Janes

Date:

08 December 2014, 18:17:44

On Mon, Dec 8, 2014 at 4:54 AM, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:

Hi List,

I have a "autovacuum: VACUUM ANALYZE public.some_big_table (to prevent
wraparound)" that has been running for over 13 days. The process is consuming
IO so I'm confident it isn't stuck, but it's still taking surprisingly long.
PG 9.1.13 on Debian.

The actual table is 584G on a SAN, plus 324G of indexes on local disk. The
system was IO-starved until about 5 days ago, after offloading half the work
to a different server and waiting for the purging of old data (went from
keeping 4 weeks to 2) to catch up (so probably 2/3rd of the table is currently
bloat, which I'd like to get back asap). Currently about 80% of the IO is
devoted to the vacuum process (on average throughout the day, as extrapolated
from atop output).

Is that 80% of the actually occurring IO, or 80% of the maximum possible IO?

I've tried raising autovacuum_vacuum_cost_limit from 500 to 5000, but
apparently the already-running vacuum process didn't pick up the change (I did
send a sighup, and new vacuum processes do run faster). I tried running a
manual vacuum on that table (to benefit from the new settings and from the
more aggressive behavior of manual vacuums), but it's apparently waiting for
the wraparound vacuum to finish first.

My currrent settings:
autovacuum | on | default
autovacuum_analyze_scale_factor | 0.1 | default
autovacuum_analyze_threshold | 10000 | configuration file
autovacuum_freeze_max_age | 200000000 | default
autovacuum_max_workers | 3 | default
autovacuum_naptime | 60 | default
autovacuum_vacuum_cost_delay | 10 | configuration file
autovacuum_vacuum_cost_limit | 5000 | configuration file
autovacuum_vacuum_scale_factor | 0.002 | configuration file
autovacuum_vacuum_threshold | 10000 | configuration file
maintenance_work_mem | 524288 | configuration file
The vacuum_scale_factor is tweaked to trigger once a day for most tables, and
should trigger every 3 days on the problem table.

Increasing maintenance_work_mem even further, at least temporarily and locally for this operation, might be a good idea.

My first question is: is there a way to speedup the currently-running vacuum
process ? I guess killing it to let it be replaced by a more agressively-
configured one would work, but I'd hate to lose 2 weeks of processing (is
there a way to estimate how much more vacuuming work remains to be done ?),
and I'm being a bit more cautious with the wraparound-preventing kind.

I don't know of a way to speed it up gracefully. That has frustrated me a few times, and a solution would really be nice.

If you kill it, the new process will have to re-read the entire table, but it will have much less work to do since the killed process already 'plowed a path' for it. In a sense, killing the process will throw away all of the sequential read work on the table that has already been done, but the index reads and the writing workload is not all lost, it will save the new process time on those.

You can `strace` for the lseek command to see which file handles it is currently working on, and

use lsof to turn those into names. You want to look at where it is in the table files, not the index files.

Second question is: how come we reached the wraparound threshold on this table
at all ? We've only been keeping 28 days of data in this table, doing
incremental deletes every day (there are no updates at all). I find it very
unlikely that we'd go through 2M transactions in that timespan (that'd need
890 transactions per second, we're well below 100). The pg_class.relfozenxid
on that table is at 680M, while most other tables are around 860M. Could it be
that routine vacuums haven't been able to update the relfrozenxid in a long
long time, or am I missing something else ?

PostgreSQL doesn't know what your delete routine is like. It has to verify with its own eyes that there are

no rows over a certain age.

I don't think that routine vacuums even attempts to update relfrozenxid, or at least doesn't try very hard.

Are you sure that routine vacuums have been running to completion on this table, as opposed to getting interrupted by something before finishing each time?

Cheers,

Jeff

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Vincent de Phily

Date:

09 December 2014, 00:57:13

On Monday 08 December 2014 10:17:37 Jeff Janes wrote:
> On Mon, Dec 8, 2014 at 4:54 AM, Vincent de Phily <
>
> > bloat, which I'd like to get back asap). Currently about 80% of the IO is
> > devoted to the vacuum process (on average throughout the day, as
> > extrapolated
> > from atop output).
>
> Is that 80% of the actually occurring IO, or 80% of the maximum possible IO?

80% of max possible IO, although I'm not sure how atop determines what the max
is. It's a fairly reliable metric of "the drive is a busy as can be" in my
experience.

> Increasing maintenance_work_mem even further, at least temporarily and
> locally for this operation, might be a good idea.

Ok, I thought 512M was already overgenerous, but I'll try increasing that too.

> > My first question is: is there a way to speedup the currently-running
> > vacuum
> > process ? I guess killing it to let it be replaced by a more agressively-
> > configured one would work, but I'd hate to lose 2 weeks of processing (is
> > there a way to estimate how much more vacuuming work remains to be done
> > ?),
> > and I'm being a bit more cautious with the wraparound-preventing kind.
>
> I don't know of a way to speed it up gracefully.  That has frustrated me a
> few times, and a solution would really be nice.
>
> If you kill it, the new process will have to re-read the entire table, but
> it will have much less work to do since the killed process already 'plowed
> a path' for it.  In a sense, killing the process will throw away all of the
> sequential read work on the table that has already been done, but the index
> reads and the writing workload is not all lost, it will save the new
> process time on those.

Ah good point, even if it has to re-read all the pages, it'll only have to
write fozenxid for the pages that weren't processed before, that's good to
hear.

Isn'there also something about vacuum marking a page as "all empty", so that
it can be skiped by the next run ?

I don't get what index read is saved between vacuum runs ? I have 64G of RAM
on this box, so there's no hope of the index staying in memory.

> You can `strace` for the lseek command to see which file handles it is
> currently working on, and
> use lsof to turn those into names.  You want to look at where it is in the
> table files, not the index files.

Neat trick, thanks. I'll try it in the morning when I'm fresh. Are the files
read in C-sorting order ?

> > Second question is: how come we reached the wraparound threshold on this
> > table
> > at all ? We've only been keeping 28 days of data in this table, doing
> > incremental deletes every day (there are no updates at all). I find it
> > very
> > unlikely that we'd go through 2M transactions in that timespan (that'd
> > need
> > 890 transactions per second, we're well below 100). The
> > pg_class.relfozenxid
> > on that table is at 680M, while most other tables are around 860M. Could
> > it be
> > that routine vacuums haven't been able to update the relfrozenxid in a
> > long
> > long time, or am I missing something else ?
>
> PostgreSQL doesn't know what your delete routine is like.  It has to verify
> with its own eyes that there are no rows over a certain age.

My assumption was that routine vacuuming was able to update the table's
relfroxenxid, but I realize now that PG needs to do a full scan before
updating that.

> I don't think that routine vacuums even attempts to update relfrozenxid, or
> at least doesn't try very hard.

AFAIU it does (the 6th paragraph in 23.1.4 seems to imply it), and
vacuum_freeze_min_age controls when. Perhaps lowering that value would help
prepare a lot of the anti-wraparound work.

Pity there's no "frozen pages map" (or is there ?) to keep track of pages will
all-frozen tuples, it could speed up that anti-wraparound vacuum.

> Are you sure that routine vacuums have been running to completion on this
> table, as opposed to getting interrupted by something before finishing each
> time?

I checked pg_stat_user_tables.last_{auto_,}vacuum for this table and they are
both null. This is seriously worrying. I've seen autovacuum take a few days on
this table but always assumed that it finished properly. And I'm pretty sure
vacuuming does some work, otherwise my disk usage woul grow much faster. I
have no idea what could cause vacuuming to systematically stop before the end.
Maybe I'll find something in the logs.

Thanks for your answers.

--
Vincent de Phily

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Vincent de Phily

Date:

09 December 2014, 21:50:48

On Tuesday 09 December 2014 01:58:11 Vincent de Phily wrote:
> On Monday 08 December 2014 10:17:37 Jeff Janes wrote:
> > You can `strace` for the lseek command to see which file handles it is
> > currently working on, and
> > use lsof to turn those into names.  You want to look at where it is in the
> > table files, not the index files.
>
> Neat trick, thanks. I'll try it in the morning when I'm fresh. Are the files
> read in C-sorting order ?

I did this (stracing the first 1000 read() and write() every 30s) and kept an
eye on progress throughout the day. It follows a (to me) surprising pattern,
which looks unnecessarily time-consuming :

It reads about 8G of the table (often doing a similar number of writes, but
not always), then starts reading the pkey index and the second index (only 2
indexes on this table), reading both of them fully (some writes as well, but
not as many as for the table), which takes around 8h.

And the cycle apparently repeats: process a few more GB of the table, then go
reprocess both indexes fully. A rough estimate is that it spends ~6x more time
(re)processing the indexes as it does processing the table (looking at data
size alone the ratio would be 41x, but the indexes go faster). I'm probably
lucky to only have two indexes on this table.

Is that the expected behaviour ? Why ? I can imagine that it skips some pages
and needs to go back, but then it should only do this once at the end of the
process, or it should only revisit a handfull of pages. Is that something that
can be improved by throwing more maintenance_work_mem at it ?

--
Vincent de Phily

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Tom Lane

Date:

09 December 2014, 21:56:46

Vincent de Phily <vincent.dephily@mobile-devices.fr> writes:
> It reads about 8G of the table (often doing a similar number of writes, but
> not always), then starts reading the pkey index and the second index (only 2
> indexes on this table), reading both of them fully (some writes as well, but
> not as many as for the table), which takes around 8h.

> And the cycle apparently repeats: process a few more GB of the table, then go
> reprocess both indexes fully. A rough estimate is that it spends ~6x more time
> (re)processing the indexes as it does processing the table (looking at data
> size alone the ratio would be 41x, but the indexes go faster). I'm probably
> lucky to only have two indexes on this table.

> Is that the expected behaviour ?

Yes.  It can only remember so many dead tuples at a time, and it has
to go clean the indexes when the dead-TIDs buffer fills up.  You could
increase maintenance_work_mem to increase the size of that buffer.

            regards, tom lane

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Vincent de Phily

Date:

09 December 2014, 22:28:15

On Tuesday 09 December 2014 16:56:39 Tom Lane wrote:
> Vincent de Phily <vincent.dephily@mobile-devices.fr> writes:
> > It reads about 8G of the table (often doing a similar number of writes,
> > but
> > not always), then starts reading the pkey index and the second index (only
> > 2 indexes on this table), reading both of them fully (some writes as
> > well, but not as many as for the table), which takes around 8h.
> >
> > And the cycle apparently repeats: process a few more GB of the table, then
> > go reprocess both indexes fully. A rough estimate is that it spends ~6x
> > more time (re)processing the indexes as it does processing the table
> > (looking at data size alone the ratio would be 41x, but the indexes go
> > faster). I'm probably lucky to only have two indexes on this table.
> >
> > Is that the expected behaviour ?
>
> Yes.  It can only remember so many dead tuples at a time, and it has
> to go clean the indexes when the dead-TIDs buffer fills up.

Fair enough. And I guess it scans the whole index each time because the dead
tuples are spread all over ?

What happens when vacuum is killed before it had time to go though the index
with its dead-TID buffer ? Surely the index isn't irreversibly bloated; and
whatever is done then could be done in the normal case ?

It still feels like a lot of wasted IO.

> You could
> increase maintenance_work_mem to increase the size of that buffer.

Will do, thanks.

--
Vincent de Phily

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Tom Lane

Date:

09 December 2014, 22:32:37

Vincent de Phily <vincent.dephily@mobile-devices.fr> writes:
> What happens when vacuum is killed before it had time to go though the index
> with its dead-TID buffer ?

The next run will go through the index again, looking for those same TIDs
(and possibly more).

            regards, tom lane

Re: Speeding up an in-progress wraparound-preventing vacuum

From

Jeff Janes

Date:

10 December 2014, 20:28:46

On Mon, Dec 8, 2014 at 4:58 PM, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:

On Monday 08 December 2014 10:17:37 Jeff Janes wrote:
> On Mon, Dec 8, 2014 at 4:54 AM, Vincent de Phily <

> I don't think that routine vacuums even attempts to update relfrozenxid, or
> at least doesn't try very hard.

AFAIU it does (the 6th paragraph in 23.1.4 seems to imply it), and
vacuum_freeze_min_age controls when. Perhaps lowering that value would help
prepare a lot of the anti-wraparound work.

There are two things that make autovac do an anti-wraparound vacuum. One of them is if the table is already going to be vacuumed anyway and vacuum_freeze_table_age is exceeded. In that case, the vacuum that was going to happen anyway gets promoted to be an anti-wraparound vacuum. The other case if when autovacuum_freeze_max_age is exceeded. Then it forces a vacuum to happen even when there is no other reason for it to occur. I think the language in the 6th paragraph is contrasting these two ways to get an anti-wraparound vacuum, not contrasting ordinary vacuum with anti-wraparound vacuum.

Pity there's no "frozen pages map" (or is there ?) to keep track of pages will
all-frozen tuples, it could speed up that anti-wraparound vacuum.

But not speed it up by much. As you discovered, most of the time is spent dealing with the indexes.

> Are you sure that routine vacuums have been running to completion on this
> table, as opposed to getting interrupted by something before finishing each
> time?

I checked pg_stat_user_tables.last_{auto_,}vacuum for this table and they are
both null. This is seriously worrying. I've seen autovacuum take a few days on
this table but always assumed that it finished properly. And I'm pretty sure
vacuuming does some work, otherwise my disk usage woul grow much faster. I
have no idea what could cause vacuuming to systematically stop before the end.
Maybe I'll find something in the logs.

If it gets interrupted by another process needing the lock, it will be in the logs. But more recent versions than 9.1 will log more useful information.

If it is interrupted by a database shutdown or restart, it only logs that it was shutdown, it doesn't log what it was in the middle of doing at the time. (Perhaps that should be changed?).

Cheers,

Jeff