Thread: VACUUM (INTERRUPTIBLE)?

VACUUM (INTERRUPTIBLE)?

From

Andres Freund

Date:

08 September 2020, 18:49:36

Hi,

Jeff Janes in [1] found that I ([2]) broke autovacuum cancellations. It
obviously would be good to add a test for this, but it seems hard to
have that be reliable given how long it can take for autovacuum actually
get around to vacuum a specific table.

That made me wonder if it'd be worthwhile to add an option that'd make
user invoked VACUUM be interruptible by conflicting lock requests, just
like autovacuum is.

That's actually something I've wanted, and see other people want, a
couple times. Sometimes one wants to vacuum a table, but not block out
more important things like DDL. Which isn't really possible right now.

So how about adding an INTERRUPTIBLE option to VACUUM and ANALYZE?

Implementation wise it seems like the best way to implement this would
be to replace PROC_VACUUM_FOR_WRAPAROUND with
PROC_INTERRUPTIBLE_VACANALYZE or such (i.e. inverting the meaning).

One question I'm a bit split on is whether we'd want to continue
restricting the signalling in ProcSleep() to commands running VACUUM or
ANALYZE.

We could have a generic PROC_INTERRUPTIBLE_COMMAND or such, and have
deadlock.c / proc.c trigger whether to kill based on just that, without
looking at PROC_IS_AUTOVACUUM / PROC_IN_VACUUM.

Alternatively, if we do want to restrict it to VACUUM and ANALYZE, we'd
have to re-introduce PROC_IN_ANALYZE ;). After 12 years of not being
used and removed just weeks ago...

Greetings,

Andres Freund

[1] https://postgr.es/m/20200827213506.4baaanygq62q7pcj%40alap3.anarazel.de
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d72f6c750

Re: VACUUM (INTERRUPTIBLE)?

From

Magnus Hagander

Date:

08 September 2020, 20:30:40

On Tue, Sep 8, 2020 at 8:49 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

Jeff Janes in [1] found that I ([2]) broke autovacuum cancellations. It
obviously would be good to add a test for this, but it seems hard to
have that be reliable given how long it can take for autovacuum actually
get around to vacuum a specific table.

That made me wonder if it'd be worthwhile to add an option that'd make
user invoked VACUUM be interruptible by conflicting lock requests, just
like autovacuum is.

That's actually something I've wanted, and see other people want, a
couple times. Sometimes one wants to vacuum a table, but not block out
more important things like DDL. Which isn't really possible right now.

So how about adding an INTERRUPTIBLE option to VACUUM and ANALYZE?

Implementation wise it seems like the best way to implement this would
be to replace PROC_VACUUM_FOR_WRAPAROUND with
PROC_INTERRUPTIBLE_VACANALYZE or such (i.e. inverting the meaning).

One question I'm a bit split on is whether we'd want to continue
restricting the signalling in ProcSleep() to commands running VACUUM or
ANALYZE.

We could have a generic PROC_INTERRUPTIBLE_COMMAND or such, and have
deadlock.c / proc.c trigger whether to kill based on just that, without
looking at PROC_IS_AUTOVACUUM / PROC_IN_VACUUM.

Alternatively, if we do want to restrict it to VACUUM and ANALYZE, we'd
have to re-introduce PROC_IN_ANALYZE ;). After 12 years of not being
used and removed just weeks ago...

One thing I've been wanting many times but never properly got around to investigating how much work it would be to make happen, was to be able to trigger an autovacuum manually (yeah, strange choice of terms). That is, instead of the above, you'd have something like "VACUUM BACKGROUND" which would trigger an autovacuum worker to do the work, and then release your session. The points then being both (1) the ability to interrupt it, and (2) that it'd run in the backgorund and thus the foreground session could disconnect.

I think both would probably solve your problem, and being able to trigger a background one would add some extra value? But we'd have to figure out and be clear about what to do if all workers are busy for example - queue or error?

Worth considering, or am I missing something?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: VACUUM (INTERRUPTIBLE)?

From

Alvaro Herrera

Date:

08 September 2020, 20:36:00

On 2020-Sep-08, Andres Freund wrote:

> That made me wonder if it'd be worthwhile to add an option that'd make
> user invoked VACUUM be interruptible by conflicting lock requests, just
> like autovacuum is.

Yeah, I recall a request for this in the past, too.

> So how about adding an INTERRUPTIBLE option to VACUUM and ANALYZE?

+1 on adding it to VACUUM.  I'm not sure about ANALYZE ... most of the
time it is not as bad as VACUUM in terms of blocking other things, and
things get ugly if that ANALYZE is part of a transaction.  I think I'd
leave that out, since we don't have to cover all DDL that could take
ShareUpdateExclusive lock (CIC etc).  VACUUM is especially problematic,
ISTM, which is we would do it for that.

> Alternatively, if we do want to restrict it to VACUUM and ANALYZE, we'd
> have to re-introduce PROC_IN_ANALYZE ;). After 12 years of not being
> used and removed just weeks ago...

Hah :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: VACUUM (INTERRUPTIBLE)?

From

Andres Freund

Date:

08 September 2020, 21:36:02

Hi,

On 2020-09-08 17:36:00 -0300, Alvaro Herrera wrote:
> On 2020-Sep-08, Andres Freund wrote:
> 
> > That made me wonder if it'd be worthwhile to add an option that'd make
> > user invoked VACUUM be interruptible by conflicting lock requests, just
> > like autovacuum is.
> 
> Yeah, I recall a request for this in the past, too.
> 
> > So how about adding an INTERRUPTIBLE option to VACUUM and ANALYZE?
> 
> +1 on adding it to VACUUM.  I'm not sure about ANALYZE ... most of the
> time it is not as bad as VACUUM in terms of blocking other things, and
> things get ugly if that ANALYZE is part of a transaction.  I think I'd
> leave that out, since we don't have to cover all DDL that could take
> ShareUpdateExclusive lock (CIC etc).  VACUUM is especially problematic,
> ISTM, which is we would do it for that.

ANALYZE sometimes is worse than VACUUM in some ways, it'll often take
longer on large tables (more random IO, no freeze map equivalent). And
there's still TRUNCATE / DROP TABLE etc that you want to be able to
interrupt (auto-)analyze.

I'm not sure it's really necessary to distinguish transaction /
non-transactional for ANALYZE. We continue to wait for the lock, right?
So signalling the transaction is fine, it'll error out. Even in the
case of ANALYZE in a subtransaction, we'll not do something incorrect,
we might just wait, if the top level transaction also has a lock, and
doesn't abort immediately.

So it doesn't seem that useful to not make manual analyze interruptible?

Greetings,

Andres Freund

Re: VACUUM (INTERRUPTIBLE)?

From

Tom Lane

Date:

08 September 2020, 22:37:05

Andres Freund <andres@anarazel.de> writes:
> So it doesn't seem that useful to not make manual analyze interruptible?

I find the main attraction of this idea is that instead of saying
"autovacuum/autoanalyze have these magic behaviors", we can say
"autovacuum/autoanalyze use option FOO".  So whatever behavior
autoanalyze has today, we should make available to manual analyze,
without quibbling too hard over how much of a use-case there is.

            regards, tom lane

Re: VACUUM (INTERRUPTIBLE)?

From

Andres Freund

Date:

08 September 2020, 22:58:05

Hi,

On 2020-09-08 22:30:40 +0200, Magnus Hagander wrote:
> One thing I've been wanting many times but never properly got around to
> investigating how much work it would be to make happen, was to be able to
> trigger an autovacuum manually (yeah, strange choice of terms). That is,
> instead of the above, you'd have something like "VACUUM BACKGROUND" which
> would trigger an autovacuum worker to do the work, and then release your
> session. The points then being both (1) the ability to interrupt it, and
> (2) that it'd run in the backgorund and thus the foreground session could
> disconnect.
> 
> I think both would probably solve your problem, and being able to trigger a
> background one would add some extra value? But we'd have to figure out and
> be clear about what to do if all workers are busy for example - queue or
> error?
> 
> Worth considering, or am I missing something?

It seems like it could be useful in general. Not that much for my case
however. It'd be much easier to test whether vacuum was successfully
cancelled if we can see the cancellation, which we can't in the
autovacuum case. And there'd likely be some fairly ugly logic around
needing to wait until the "autovacuum request" is processed etc,
including the case that all workers are currently busy.

So my INTERRUPTIBLE idea seems to be a better fit for the tests, and
independently quite useful. E.g. wanting to know whether VACUUM errored
out is useful for scripts that want their VACUUMs to be interruptible,
and that doesn't work well with the "backgrounding" idea of yours.

Having said that, your idea does seem like it could be helpful. The
difficulty seems to depend a bit on the exact desired
semantics. E.g. would the "queue" command block until vacuum started or
not? The latter would obviously be much easier...

Greetings,

Andres Freund

Re: VACUUM (INTERRUPTIBLE)?

From

Andres Freund

Date:

09 September 2020, 02:30:40

Hi,

On 2020-09-08 18:37:05 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > So it doesn't seem that useful to not make manual analyze interruptible?
> 
> I find the main attraction of this idea is that instead of saying
> "autovacuum/autoanalyze have these magic behaviors", we can say
> "autovacuum/autoanalyze use option FOO".  So whatever behavior
> autoanalyze has today, we should make available to manual analyze,
> without quibbling too hard over how much of a use-case there is.

Yea, I think that's a fair point. It's basically more work to
distinguish the two anyway.

It was fairly straight forward to implement the basics of
INTERRUPTIBLE. However it seems like it might not actually address my
main concern, i.e. making this reliably testable with isolation
tester. At least not the way I envisioned it...

My idea for the test was to have one isolationtester session start a
seqscan, which would then reliably block a concurrent VACUUM (FREEZE,
INTERRUPTIBLE). That'd then give a stable point to switch back to the
first session, which would interrupt the VACUUM by doing a LOCK.

But because there's no known waiting-for pid for a buffer pin wait, we
currently don't detect that we're blocked :(.

Now, it'd not be too hard to make pg_isolation_test_session_is_blocked
also report being blocked when we're waiting for a buffer pin (ignoring
interesting_pids), similar to the safe snapshot test.  However I'm
worried that that could lead to "false" reports of blocking? But maybe
that's a small enough risk because there's few unconditional cleanup
lock acquisitions?

Hacking such a wait condition test into
pg_isolation_test_session_is_blocked() successfully allows my test to
work for VACUUM.

Not yet sure about what a suitable way to test this for analyze would
be, unless we'd also accept VacuumDelay as a wait condition :(. I guess
one fairly easy way would be to include an expression index, and have
the expression index acquire an unavailable lock. Which'd allow to
switch to another session.

Greetings,

Andres Freund

Re: VACUUM (INTERRUPTIBLE)?

From

Andres Freund

Date:

09 September 2020, 04:11:47

Hi,

On 2020-09-08 19:30:40 -0700, Andres Freund wrote:
> It was fairly straight forward to implement the basics of
> INTERRUPTIBLE. However it seems like it might not actually address my
> main concern, i.e. making this reliably testable with isolation
> tester. At least not the way I envisioned it...
>
> My idea for the test was to have one isolationtester session start a
> seqscan, which would then reliably block a concurrent VACUUM (FREEZE,
> INTERRUPTIBLE). That'd then give a stable point to switch back to the
> first session, which would interrupt the VACUUM by doing a LOCK.
>
> But because there's no known waiting-for pid for a buffer pin wait, we
> currently don't detect that we're blocked :(.
>
>
> Now, it'd not be too hard to make pg_isolation_test_session_is_blocked
> also report being blocked when we're waiting for a buffer pin (ignoring
> interesting_pids), similar to the safe snapshot test.  However I'm
> worried that that could lead to "false" reports of blocking? But maybe
> that's a small enough risk because there's few unconditional cleanup
> lock acquisitions?
>
> Hacking such a wait condition test into
> pg_isolation_test_session_is_blocked() successfully allows my test to
> work for VACUUM.

Here's a patch series implementing that:

0001: Adds INTERRUPTIBLE to VACUUM ANALYZE
    There's quite a few XXX's inside. And it needs some none
    isolationtester test.
0002: Treat BufferPin as a waiting condition in isolationtest.
    That's the aforementioned workaround.
0003: A test, finally.
    But it only tests VACUUM, not yet ANALYZE. Perhaps also a test for
    not allowing interruptions, somehow?

Clearly WIP, but good enough for some comments, I hope?

A few comments:
- Right now there can only be one such blocking task, because
  PROC_IS_INTERRUPTIBLE is only set by vacuum / analyze, and the lock
  levels are self exclusive. But it's generically named now, so it seems
  just a matter of time until somebody adds that to other commands. I
  think it's ok to not add support for ProcSleep() killing multiple
  other processes?
- It's a bit annoying that normal user backends just see a generic
  "ERROR: canceling statement due to user request". Do we need something
  better?
- The order in which VACUUM parameters are documented & implemented
  seems fairly random. Perhaps it'd make sense to reorder
  alphabetically?


> Not yet sure about what a suitable way to test this for analyze would
> be, unless we'd also accept VacuumDelay as a wait condition :(. I guess
> one fairly easy way would be to include an expression index, and have
> the expression index acquire an unavailable lock. Which'd allow to
> switch to another session.

But here I've not yet done anything. That just seems too ugly & fragile.

Greetings,

Andres Freund

Attachment

Re: VACUUM (INTERRUPTIBLE)?

From

Magnus Hagander

Date:

09 September 2020, 16:07:31

On Wed, Sep 9, 2020 at 12:58 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-09-08 22:30:40 +0200, Magnus Hagander wrote:
> One thing I've been wanting many times but never properly got around to
> investigating how much work it would be to make happen, was to be able to
> trigger an autovacuum manually (yeah, strange choice of terms). That is,
> instead of the above, you'd have something like "VACUUM BACKGROUND" which
> would trigger an autovacuum worker to do the work, and then release your
> session. The points then being both (1) the ability to interrupt it, and
> (2) that it'd run in the backgorund and thus the foreground session could
> disconnect.
>
> I think both would probably solve your problem, and being able to trigger a
> background one would add some extra value? But we'd have to figure out and
> be clear about what to do if all workers are busy for example - queue or
> error?
>
> Worth considering, or am I missing something?

It seems like it could be useful in general. Not that much for my case
however. It'd be much easier to test whether vacuum was successfully
cancelled if we can see the cancellation, which we can't in the
autovacuum case. And there'd likely be some fairly ugly logic around

That does bring up the other thing that I even put together some hacky patch for at one point (long since lost or badly-rebased-into-nothingness) which is to have the stats collector track on a per-relation basis the number of autovacuum interruptions that have happened on a specific table :)

But yes, with that it would still not be *as easy* to use, definitely.

needing to wait until the "autovacuum request" is processed etc,
including the case that all workers are currently busy.

So my INTERRUPTIBLE idea seems to be a better fit for the tests, and
independently quite useful. E.g. wanting to know whether VACUUM errored
out is useful for scripts that want their VACUUMs to be interruptible,
and that doesn't work well with the "backgrounding" idea of yours.

Having said that, your idea does seem like it could be helpful. The
difficulty seems to depend a bit on the exact desired
semantics. E.g. would the "queue" command block until vacuum started or
not? The latter would obviously be much easier...

Yeah, that's where I stalled in my own thoughts about it I think. OTOH, why wait for it to start, if you're not waiting for it to finish... But also, if there is a max size of the queue, what do you do if you hit that one?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: VACUUM (INTERRUPTIBLE)?

From

Andres Freund

Date:

01 October 2020, 03:27:02

Hi,

On 2020-09-08 21:11:47 -0700, Andres Freund wrote:
> > Not yet sure about what a suitable way to test this for analyze would
> > be, unless we'd also accept VacuumDelay as a wait condition :(. I guess
> > one fairly easy way would be to include an expression index, and have
> > the expression index acquire an unavailable lock. Which'd allow to
> > switch to another session.
> 
> But here I've not yet done anything. That just seems too ugly & fragile.

I found a better way after a bunch of tinkering. Having an
isolationtester session lock pg_class in SHARE mode works, because both
VACUUM and ANALYZE update the relation's row. I *think* this is
reliable, due to the report_multiple_error_messages() logic.

This version also adds setting of PROC_INTERRUPTIBLE for ANALYZE
(INTERUPTIBLE), which the previous version didn't yet contain. As
currently implemented this means that autovacuum started ANALYZEs are
interruptible, which they were not before. That's dead trivial to
change, but to me it makes sense to leave it this way. But I also see an
argument that it'd be better to do so separately?

> - It's a bit annoying that normal user backends just see a generic
>   "ERROR: canceling statement due to user request". Do we need something
>   better?

I think this is ok for this iteration, and instead at some point solve
this in a bit more generic manner. This isn't the only place where
carrying a bit more information about why and by whom a query is
cancelled would be very useful.

There's one XXX left in here, which is:

@@ -1340,8 +1338,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
                 LWLockRelease(ProcArrayLock);

                 /* send the autovacuum worker Back to Old Kent Road */
+                /*
+                 * FIXME: do we want to continue to identify autovacuum here?
+                 * Could do so based on PROC_IS_AUTOVACUUM.
+                 */
                 ereport(DEBUG1,
-                        (errmsg("sending cancel to blocking autovacuum PID %d",
+                        (errmsg("sending cancel to blocking autovacuum|XXX PID %d",
                                 pid),
                          errdetail_log("%s", logbuf.data)));

given that this is a DEBUG1 I'm inclined to just replace this with
"... blocking interruptible process with PID %d"
or such?

Comments?

Greetings,

Andres Freund

Attachment

v2-0001-Add-INTERRUPTIBLE-option-to-VACUUM-ANALYZE-test-i.patch