Thread: Allow single table VACUUM in transaction block

Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

27 October 2022, 09:31:31

It is a common user annoyance to have a script fail because someone
added a VACUUM, especially when using --single-transaction option.
Fix, so that this works without issue:

BEGIN;
....
VACUUM (ANALYZE) vactst;
....
COMMIT;

Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.

When in a xact block, we do not set PROC_IN_VACUUM,
nor update datfrozenxid.

Tests, docs.

--
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

single_table_vacuum.v1.patch

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

27 October 2022, 16:18:45

On Thu, 27 Oct 2022 at 10:31, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> Tests, docs.

The patch tester says that a pg_upgrade test is failing on Windows,
but works for me.

t/002_pg_upgrade.pl .. ok

Anybody shed any light on that, much appreciated.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Bharath Rupireddy

Date:

27 October 2022, 16:21:30

On Thu, Oct 27, 2022 at 9:49 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 27 Oct 2022 at 10:31, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> > Tests, docs.
>
> The patch tester says that a pg_upgrade test is failing on Windows,
> but works for me.
>
> t/002_pg_upgrade.pl .. ok
>
> Anybody shed any light on that, much appreciated.

Please see a recent thread on pg_upgrade failure -
https://www.postgresql.org/message-id/Y04mN0ZLNzJywrad%40paquier.xyz.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Allow single table VACUUM in transaction block

From

Justin Pryzby

Date:

27 October 2022, 20:07:42

On Thu, Oct 27, 2022 at 10:31:31AM +0100, Simon Riggs wrote:
> Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.

Maybe I misunderstood what you meant: you said "not VACUUM FULL", but
with your patch, that works:

postgres=# begin; VACUUM FULL pg_class; commit;
BEGIN
VACUUM
COMMIT

Actually, I've thought before that it was bit weird that CLUSTER can be
run within a transaction, but VACUUM FULL cannot (even though it does a
CLUSTER behind the scenes).  VACUUM FULL can process multiple relations,
whereas CLUSTER can't, but it seems nice to allow vacuum full for the
case of a single relation.

I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
within a user txn.

Maybe the error message needs to be qualified "...when multiple
relations are specified".

ERROR:  VACUUM cannot run inside a transaction block

-- 
Justin

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

01 November 2022, 23:56:17

On Thu, 27 Oct 2022 at 21:07, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Oct 27, 2022 at 10:31:31AM +0100, Simon Riggs wrote:
> > Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
>
> Maybe I misunderstood what you meant: you said "not VACUUM FULL", but
> with your patch, that works:
>
> postgres=# begin; VACUUM FULL pg_class; commit;
> BEGIN
> VACUUM
> COMMIT
>
> Actually, I've thought before that it was bit weird that CLUSTER can be
> run within a transaction, but VACUUM FULL cannot (even though it does a
> CLUSTER behind the scenes).  VACUUM FULL can process multiple relations,
> whereas CLUSTER can't, but it seems nice to allow vacuum full for the
> case of a single relation.
>
> I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
> within a user txn.

My intention was to prevent that. I am certainly quite uneasy about
changing anything related to CLUSTER/VF, since they are old, complex
and bug prone.

So for now, I will block VF, as was my original intent.

I will be guided by what others think... so you may yet get your wish.


> Maybe the error message needs to be qualified "...when multiple
> relations are specified".
>
> ERROR:  VACUUM cannot run inside a transaction block

Hmm, that is standard wording based on the statement type, but I can
set a CONTEXT message also. Will update accordingly.

Thanks for your input.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

03 November 2022, 10:23:27

On Tue, 1 Nov 2022 at 23:56, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> > I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
> > within a user txn.
>
> My intention was to prevent that. I am certainly quite uneasy about
> changing anything related to CLUSTER/VF, since they are old, complex
> and bug prone.
>
> So for now, I will block VF, as was my original intent.
>
> I will be guided by what others think... so you may yet get your wish.
>
>
> > Maybe the error message needs to be qualified "...when multiple
> > relations are specified".
> >
> > ERROR:  VACUUM cannot run inside a transaction block
>
> Hmm, that is standard wording based on the statement type, but I can
> set a CONTEXT message also. Will update accordingly.
>
> Thanks for your input.

New version attached, as described.

Other review comments and alternate opinions welcome.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

single_table_vacuum.v2.patch

Re: Allow single table VACUUM in transaction block

From

Rahila Syed

Date:

04 November 2022, 04:45:44

Hi Simon,

On Thu, Nov 3, 2022 at 3:53 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Tue, 1 Nov 2022 at 23:56, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> > I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
> > within a user txn.
>
> My intention was to prevent that. I am certainly quite uneasy about
> changing anything related to CLUSTER/VF, since they are old, complex
> and bug prone.
>
> So for now, I will block VF, as was my original intent.
>
> I will be guided by what others think... so you may yet get your wish.
>
>
> > Maybe the error message needs to be qualified "...when multiple
> > relations are specified".
> >
> > ERROR: VACUUM cannot run inside a transaction block
>
> Hmm, that is standard wording based on the statement type, but I can
> set a CONTEXT message also. Will update accordingly.
>
> Thanks for your input.

New version attached, as described.

Other review comments and alternate opinions welcome.

I applied and did some basic testing on the patch, it works as described.

I would like to bring up a few points that I came across while looking into the vacuum code.

1. As a result of this change to allow VACUUM inside a user transaction, I think there is some chance of causing

a block/delay of concurrent VACUUMs if a VACUUM is being run under a long running transaction.

2. Also, if a user runs VACUUM in a transaction, performance optimizations like PROC_IN_VACUUM won't work.

3. Also, if VACUUM happens towards the end of a long running transaction, the snapshot will be old

and xmin horizon for vacuum would be somewhat old as compared to current lazy vacuum which

acquires a new snapshot just before scanning the table.

So, while I understand the need of the feature, I am wondering if there should be some mention

of above caveats in documentation with the recommendation that VACUUM should be run outside

a transaction, in general.

Thank you,

Rahila Syed

Re: Allow single table VACUUM in transaction block

From

Rahila Syed

Date:

04 November 2022, 07:36:54

Hi Simon,

On Fri, Nov 4, 2022 at 10:15 AM Rahila Syed <rahilasyed90@gmail.com> wrote:

Hi Simon,

On Thu, Nov 3, 2022 at 3:53 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Tue, 1 Nov 2022 at 23:56, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> > I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
> > within a user txn.
>
> My intention was to prevent that. I am certainly quite uneasy about
> changing anything related to CLUSTER/VF, since they are old, complex
> and bug prone.
>
> So for now, I will block VF, as was my original intent.
>
> I will be guided by what others think... so you may yet get your wish.
>
>
> > Maybe the error message needs to be qualified "...when multiple
> > relations are specified".
> >
> > ERROR: VACUUM cannot run inside a transaction block
>
> Hmm, that is standard wording based on the statement type, but I can
> set a CONTEXT message also. Will update accordingly.
>
> Thanks for your input.

New version attached, as described.

Other review comments and alternate opinions welcome.

I applied and did some basic testing on the patch, it works as described.

I would like to bring up a few points that I came across while looking into the vacuum code.

1. As a result of this change to allow VACUUM inside a user transaction, I think there is some chance of causing
a block/delay of concurrent VACUUMs if a VACUUM is being run under a long running transaction.
2. Also, if a user runs VACUUM in a transaction, performance optimizations like PROC_IN_VACUUM won't work.
3. Also, if VACUUM happens towards the end of a long running transaction, the snapshot will be old
and xmin horizon for vacuum would be somewhat old as compared to current lazy vacuum which
acquires a new snapshot just before scanning the table.

So, while I understand the need of the feature, I am wondering if there should be some mention
of above caveats in documentation with the recommendation that VACUUM should be run outside
a transaction, in general.

Sorry, I just noticed that you have already mentioned some of these in the documentation as follows, so it seems

it is already taken care of.

+ <command>VACUUM</command> cannot be executed inside a transaction block,
+ unless a single table is specified and <literal>FULL</literal> is not
+ specified. When executing inside a transaction block the vacuum scan can
+ hold back the xmin horizon and does not update the database datfrozenxid,
+ as a result this usage is not useful for database maintenance, but is provided
+ to allow vacuuming in special circumstances, such as temporary or private
+ work tables.

Thank you,

Rahila Syed

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

04 November 2022, 09:09:34

Hi Rahila,

Thanks for your review.

On Fri, 4 Nov 2022 at 07:37, Rahila Syed <rahilasyed90@gmail.com> wrote:

>> I would like to bring up a few points that I came across while looking into the vacuum code.
>>
>> 1.  As a result of this change to allow VACUUM inside a user transaction, I think there is some chance of causing
>> a block/delay of concurrent VACUUMs if a VACUUM is being run under a long running transaction.
>> 2. Also, if a user runs VACUUM in a transaction, performance optimizations like PROC_IN_VACUUM won't work.
>> 3. Also, if VACUUM happens towards the end of a long running transaction, the snapshot will be old
>> and xmin horizon for vacuum would be somewhat old as compared to current lazy vacuum which
>> acquires a new snapshot just before scanning the table.
>>
>> So, while I understand the need of the feature, I am wondering if there should be some mention
>> of above caveats in documentation with the recommendation that VACUUM should be run outside
>> a transaction, in general.
>>
>
> Sorry, I just noticed that you have already mentioned some of these in the documentation as follows, so it seems
> it is already taken care of.
>
> +    <command>VACUUM</command> cannot be executed inside a transaction block,
> +    unless a single table is specified and <literal>FULL</literal> is not
> +    specified.  When executing inside a transaction block the vacuum scan can
> +    hold back the xmin horizon and does not update the database datfrozenxid,
> +    as a result this usage is not useful for database maintenance, but is provided
> +    to allow vacuuming in special circumstances, such as temporary or private
> +    work tables.

Yes, I wondered whether we should have a NOTICE or WARNING to remind
people of those points?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Rahila Syed

Date:

04 November 2022, 11:48:08

Hi,

On Fri, Nov 4, 2022 at 2:39 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

Hi Rahila,

Thanks for your review.

On Fri, 4 Nov 2022 at 07:37, Rahila Syed <rahilasyed90@gmail.com> wrote:

>> I would like to bring up a few points that I came across while looking into the vacuum code.
>>
>> 1. As a result of this change to allow VACUUM inside a user transaction, I think there is some chance of causing
>> a block/delay of concurrent VACUUMs if a VACUUM is being run under a long running transaction.
>> 2. Also, if a user runs VACUUM in a transaction, performance optimizations like PROC_IN_VACUUM won't work.
>> 3. Also, if VACUUM happens towards the end of a long running transaction, the snapshot will be old
>> and xmin horizon for vacuum would be somewhat old as compared to current lazy vacuum which
>> acquires a new snapshot just before scanning the table.
>>
>> So, while I understand the need of the feature, I am wondering if there should be some mention
>> of above caveats in documentation with the recommendation that VACUUM should be run outside
>> a transaction, in general.
>>
>
> Sorry, I just noticed that you have already mentioned some of these in the documentation as follows, so it seems
> it is already taken care of.
>
> + <command>VACUUM</command> cannot be executed inside a transaction block,
> + unless a single table is specified and <literal>FULL</literal> is not
> + specified. When executing inside a transaction block the vacuum scan can
> + hold back the xmin horizon and does not update the database datfrozenxid,
> + as a result this usage is not useful for database maintenance, but is provided
> + to allow vacuuming in special circumstances, such as temporary or private
> + work tables.

Yes, I wondered whether we should have a NOTICE or WARNING to remind
people of those points?

+1 . My vote for NOTICE over WARNING because I think

it is useful information for the user rather than any potential problem.

Thank you,

Rahila Syed

Re: Allow single table VACUUM in transaction block

From

Peter Geoghegan

Date:

06 November 2022, 18:50:24

On Thu, Oct 27, 2022 at 2:31 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> Fix, so that this works without issue:
>
> BEGIN;
> ....
> VACUUM (ANALYZE) vactst;
> ....
> COMMIT;
>
> Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
>
> When in a xact block, we do not set PROC_IN_VACUUM,
> nor update datfrozenxid.

It doesn't seem like a good idea to add various new special cases to
VACUUM just to make scripts like this work. I'm pretty sure that there
are several deep, subtle reasons why VACUUM cannot be assumed safe to
run in a user transaction.

For example, the whole way that index page deletion is decoupled from
recycling in access methods like nbtree (see "Placing deleted pages in
the FSM" from the nbtree README) rests upon delicate assumptions about
whether or not there could be an "in-flight" B-tree descent that is
at risk of landing on a deleted page as it is concurrently recycled.
In general the deleted page has to remain in place as a tombstone,
until that is definitely not possible anymore. This relies on the backend
that runs VACUUM having no references to the page pending deletion.
(Commit 9dd963ae25 added an optimization that heavily leaned on the
idea that the state within the backend running VACUUM couldn't
possibly have a live page reference that is at risk of being broken by
the optimization, though I'm pretty sure that you'd have problems even
without that commit/optimization in place.)

My guess is that there are more things like that. Possibly even things
that were never directly considered. VACUUM evolved in a world where
we absolutely took not running in a transaction for granted. Changing
that now is a pretty big deal. Maybe it would all be worth it if the end
result was a super compelling feature. But I for one don't think that
it will be.

If we absolutely have to do this, then the least worst approach might
be to make VACUUM into a no-op rather than throwing an ERROR -- demote
the ERROR into a WARNING. You could argue that we're just arbitrarily
deciding to not do a VACUUM just to be able to avoid throwing an error
if we do that. But isn't that already true with the patch that we
have? Is it really a "true VACUUM" if the operation can never advance
datfrozenxid? At least selectively demoting the ERROR to a WARNING is
"transparent" about it.

--
Peter Geoghegan

Re: Allow single table VACUUM in transaction block

From

Tom Lane

Date:

06 November 2022, 19:14:28

Peter Geoghegan <pg@bowt.ie> writes:
> My guess is that there are more things like that. Possibly even things
> that were never directly considered. VACUUM evolved in a world where
> we absolutely took not running in a transaction for granted. Changing
> that now is a pretty big deal. Maybe it would all be worth it if the end
> result was a super compelling feature. But I for one don't think that
> it will be.

Yeah.  To be blunt, this proposal scares the sh*t out of me.  I agree
that there are tons of subtle dependencies on our current assumptions
about how VACUUM works, and I strongly suspect that we won't find all of
them until after users have lost data.  I cannot believe that running
VACUUM inside transactions is valuable enough to take that risk ...
especially if it's a modified limited kind of VACUUM that doesn't
eliminate the need for periodic real VACUUMs.

In general, I do not believe in encouraging users to run VACUUM
manually in the first place.  We would be far better served by
spending our effort to improve autovacuum's shortcomings.  I'd
like to see some sort of direct attack on its inability to deal
with temp tables, for instance.  (Force the owning backend to
do it?  Temporarily change the access rules so that the data
moves to shared buffers?  Dunno, but we sure haven't tried hard.)
However many bugs such a thing might have initially, at least
they'd only put temporary data at hazard.

            regards, tom lane

Re: Allow single table VACUUM in transaction block

From

Peter Geoghegan

Date:

06 November 2022, 20:40:30

On Sun, Nov 6, 2022 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In general, I do not believe in encouraging users to run VACUUM
> manually in the first place.  We would be far better served by
> spending our effort to improve autovacuum's shortcomings.

I couldn't agree more. A lot of problems seem related to the idea that
VACUUM is just a command that the DBA periodically runs to get a
predictable fixed result, a little like CREATE INDEX. That conceptual
model isn't exactly wrong; it just makes it much harder to apply any
kind of context about the needs of the table over time. There is a
natural cycle to how VACUUM (really autovacuum) is run, and the
details matter.

There is a significant amount of relevant context that we can't really
use right now. That wouldn't be true if VACUUM only ran within an
autovacuum worker (by definition). The VACUUM command itself would
still be available, and support the same user interface, more or less.
Under the hood the VACUUM command would work by enqueueing a VACUUM
job, to be performed asynchronously by an autovacuum worker. Perhaps
the initial enqueue operation could be transactional, fixing Simon's complaint.

"No more VACUUMs outside of autovacuum" would enable more advanced
autovacuum.c scheduling, allowing us to apply a lot more context about
the costs and benefits, without having to treat manual VACUUM as an
independent thing. We could coalesce together redundant VACUUM jobs,
suspend and resume VACUUM operations, and have more strategies to deal
with problems as they emerge.

> I'd like to see some sort of direct attack on its inability to deal
> with temp tables, for instance.  (Force the owning backend to
> do it?  Temporarily change the access rules so that the data
> moves to shared buffers?  Dunno, but we sure haven't tried hard.)

This is a good example of the kind of thing I have in mind. Perhaps it
could work by killing the backend that owns the temp relation when
things truly get out of hand? I think that that would be a perfectly
reasonable trade-off.

Another related idea: better behavior in the event of a manually
issued VACUUM (now just an enqueued autovacuum) that cannot do useful
work due to the presence of a long running snapshot. The VACUUM
doesn't have to dutifully report "success" when there is no practical
sense in which it was successful. There could be a back and forth
conversation between autovacuum.c and vacuumlazy.c that makes sure
that something useful happens sooner or later. The passage of time
really matters here.

As a bonus, we might be able to get rid of the autovacuum GUC
variants. Plus the current autovacuum logging would just be how we'd
log every VACUUM.

--
Peter Geoghegan

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

07 November 2022, 07:58:13

On Sun, 6 Nov 2022 at 18:50, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Oct 27, 2022 at 2:31 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > Fix, so that this works without issue:
> >
> > BEGIN;
> > ....
> > VACUUM (ANALYZE) vactst;
> > ....
> > COMMIT;
> >
> > Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
> >
> > When in a xact block, we do not set PROC_IN_VACUUM,
> > nor update datfrozenxid.
>
> It doesn't seem like a good idea to add various new special cases to
> VACUUM just to make scripts like this work.

Usability is a major concern that doesn't get a high enough priority.

> I'm pretty sure that there
> are several deep, subtle reasons why VACUUM cannot be assumed safe to
> run in a user transaction.

I expected there were, so it's good to discuss them. Thanks for the input.

> If we absolutely have to do this, then the least worst approach might
> be to make VACUUM into a no-op rather than throwing an ERROR -- demote
> the ERROR into a WARNING. You could argue that we're just arbitrarily
> deciding to not do a VACUUM just to be able to avoid throwing an error
> if we do that. But isn't that already true with the patch that we
> have? Is it really a "true VACUUM" if the operation can never advance
> datfrozenxid? At least selectively demoting the ERROR to a WARNING is
> "transparent" about it.

I'll answer that part in my reply to Tom, since there are good ideas in both.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

07 November 2022, 08:20:32

On Sun, 6 Nov 2022 at 20:40, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sun, Nov 6, 2022 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In general, I do not believe in encouraging users to run VACUUM
> > manually in the first place.  We would be far better served by
> > spending our effort to improve autovacuum's shortcomings.
>
> I couldn't agree more. A lot of problems seem related to the idea that
> VACUUM is just a command that the DBA periodically runs to get a
> predictable fixed result, a little like CREATE INDEX. That conceptual
> model isn't exactly wrong; it just makes it much harder to apply any
> kind of context about the needs of the table over time. There is a
> natural cycle to how VACUUM (really autovacuum) is run, and the
> details matter.
>
> There is a significant amount of relevant context that we can't really
> use right now. That wouldn't be true if VACUUM only ran within an
> autovacuum worker (by definition). The VACUUM command itself would
> still be available, and support the same user interface, more or less.
> Under the hood the VACUUM command would work by enqueueing a VACUUM
> job, to be performed asynchronously by an autovacuum worker. Perhaps
> the initial enqueue operation could be transactional, fixing Simon's complaint.

Ah, I see you got to this idea first!

Yes, what we need is for the "VACUUM command" to not fail in a script.
Not sure anyone cares where the work takes place.

Enqueuing a request for autovacuum to do that work, then blocking
until it is complete would do the job.

> "No more VACUUMs outside of autovacuum" would enable more advanced
> autovacuum.c scheduling, allowing us to apply a lot more context about
> the costs and benefits, without having to treat manual VACUUM as an
> independent thing. We could coalesce together redundant VACUUM jobs,
> suspend and resume VACUUM operations, and have more strategies to deal
> with problems as they emerge.

+1, but clearly this would not make temp table VACUUMs work.

> > I'd like to see some sort of direct attack on its inability to deal
> > with temp tables, for instance.  (Force the owning backend to
> > do it?  Temporarily change the access rules so that the data
> > moves to shared buffers?  Dunno, but we sure haven't tried hard.)

This was a $DIRECT attack on making temp tables work! ;-)

Temp tables are actually easier, since we don't need any of the
concurrency features we get with lazy vacuum. So the answer is to
always run a VACUUM FULL on temp tables since this skips any issues
with indexes etc..

We would need to check a few things first.... maybe something like
this (mostly borrowed heavily from COPY)

        InvalidateCatalogSnapshot();
        if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
            ereport(WARNING,
                    (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                     errmsg("vacuum of temporary table ignored because
of prior transaction activity")));
       CheckTableNotInUse(rel, "VACUUM");

> This is a good example of the kind of thing I have in mind. Perhaps it
> could work by killing the backend that owns the temp relation when
> things truly get out of hand? I think that that would be a perfectly
> reasonable trade-off.

+1

> Another related idea: better behavior in the event of a manually
> issued VACUUM (now just an enqueued autovacuum) that cannot do useful
> work due to the presence of a long running snapshot. The VACUUM
> doesn't have to dutifully report "success" when there is no practical
> sense in which it was successful. There could be a back and forth
> conversation between autovacuum.c and vacuumlazy.c that makes sure
> that something useful happens sooner or later. The passage of time
> really matters here.

Regrettably, neither vacuum nor autovacuum waits for xmin to change;
perhaps it should.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Peter Geoghegan

Date:

08 November 2022, 00:34:27

On Mon, Nov 7, 2022 at 12:20 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> > Another related idea: better behavior in the event of a manually
> > issued VACUUM (now just an enqueued autovacuum) that cannot do useful
> > work due to the presence of a long running snapshot. The VACUUM
> > doesn't have to dutifully report "success" when there is no practical
> > sense in which it was successful. There could be a back and forth
> > conversation between autovacuum.c and vacuumlazy.c that makes sure
> > that something useful happens sooner or later. The passage of time
> > really matters here.
>
> Regrettably, neither vacuum nor autovacuum waits for xmin to change;
> perhaps it should.

Yes, it's very primitive right now. In fact I recently discovered that
just using the reloption version (not the GUC version) of
autovacuum_freeze_max_age in a totally straightforward way is all it
takes to utterly confuse autovacuum.c:

https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com

It's easy to convince autovacuum.c to launch antiwraparound
autovacuums that reliably have no chance of advancing relfrozenxid to
a degree that satisfies autovacuum.c. It will launch antiwraparound
autovacuums again and again, never realizing that VACUUM doesn't
really care about what it expects (at least not with the reloption in
use). Clearly that's just broken. It also suggests a more general
design problem, at least in my mind.

--
Peter Geoghegan

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

08 November 2022, 03:10:03

On Mon, 7 Nov 2022 at 08:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> Temp tables are actually easier, since we don't need any of the
> concurrency features we get with lazy vacuum. So the answer is to
> always run a VACUUM FULL on temp tables since this skips any issues
> with indexes etc..

So I see 3 options for what to do next

1. Force the FULL option for all tables, when executed in a
transaction block. This gets round the reasonable objections to
running a concurrent vacuum in a shared xact block. As Justin points
out, CLUSTER is already supported, which uses the same code.

2. Force the FULL option for temp tables, when executed in a
transaction block. In a later patch, queue up an autovacuum run for
regular tables.

3. Return with feedback this patch. (But then what happens with temp tables?)

Thoughts?

--
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

14 November 2022, 19:52:04

On Tue, 8 Nov 2022 at 03:10, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Mon, 7 Nov 2022 at 08:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> > Temp tables are actually easier, since we don't need any of the
> > concurrency features we get with lazy vacuum.

> Thoughts?

New patch, which does this, when in a xact block

1. For temp tables, only VACUUM FULL is allowed
2. For persistent tables, an AV task is created to perform the vacuum,
which eventually performs a vacuum

The patch works, but there are various aspects of the design that need
input. Thoughts?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

single_table_vacuum_in_xact.v3.patch

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

15 November 2022, 09:13:36

On Mon, 14 Nov 2022 at 19:52, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Tue, 8 Nov 2022 at 03:10, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > On Mon, 7 Nov 2022 at 08:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > > Temp tables are actually easier, since we don't need any of the
> > > concurrency features we get with lazy vacuum.
>
> > Thoughts?
>
> New patch, which does this, when in a xact block
>
> 1. For temp tables, only VACUUM FULL is allowed
> 2. For persistent tables, an AV task is created to perform the vacuum,
> which eventually performs a vacuum
>
> The patch works, but there are various aspects of the design that need
> input. Thoughts?

New version.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

single_table_vacuum_in_xact.v4.patch

Re: Allow single table VACUUM in transaction block

From

Greg Stark

Date:

16 November 2022, 22:14:07

I think the idea of being able to request an autovacuum worker for a
specific table is actually very good. I think it's what most users
actually want when they are running vacuum. In fact in previous jobs
people have built infrastructure that basically duplicates autovacuum
just so they could do this.

However I'm not a fan of commands that sometimes do one thing and
sometimes magically do something very different. I don't like the idea
that the same vacuum command would sometimes run in-process and
sometimes do this out of process request. And the rules for when it
does which are fairly complex to explain -- it runs in process unless
you're in a transaction when it runs out of process unless it's a
temporary table ...

I think this requesting autovacuum worker should be a distinct
command. Or at least an explicit option to vacuum.

Also, I was confused reading the thread above about mention of VACUUM
FULL. I don't understand why it's relevant at all. We certainly can't
force VACUUM FULL when it wasn't requested on potentially large
tables.

Re: Allow single table VACUUM in transaction block

From

Tom Lane

Date:

16 November 2022, 22:26:03

Greg Stark <stark@mit.edu> writes:
> I think this requesting autovacuum worker should be a distinct
> command. Or at least an explicit option to vacuum.

+1.  That'd reduce confusion, and perhaps we could remove some
of the restrictions.

            regards, tom lane

Re: Allow single table VACUUM in transaction block

From

Justin Pryzby

Date:

17 November 2022, 20:00:45

On Wed, Nov 16, 2022 at 05:14:07PM -0500, Greg Stark wrote:
> I think this requesting autovacuum worker should be a distinct
> command. Or at least an explicit option to vacuum.

+1.  I was going to suggest VACUUM (NOWAIT) ..

-- 
Justin

Re: Allow single table VACUUM in transaction block

From

Robert Haas

Date:

17 November 2022, 20:06:43

On Wed, Nov 16, 2022 at 5:14 PM Greg Stark <stark@mit.edu> wrote:
> However I'm not a fan of commands that sometimes do one thing and
> sometimes magically do something very different. I don't like the idea
> that the same vacuum command would sometimes run in-process and
> sometimes do this out of process request. And the rules for when it
> does which are fairly complex to explain -- it runs in process unless
> you're in a transaction when it runs out of process unless it's a
> temporary table ...

100% agree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

18 November 2022, 11:54:25

On Thu, 17 Nov 2022 at 20:00, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Nov 16, 2022 at 05:14:07PM -0500, Greg Stark wrote:
> > I think this requesting autovacuum worker should be a distinct
> > command. Or at least an explicit option to vacuum.
>
> +1.  I was going to suggest VACUUM (NOWAIT) ..

Yes, I have no problem with an explicit command.

At the moment the patch runs VACUUM in the background in an autovacuum
process, but the call is asynchronous, since we do not wait for the
command to finish (or even start).

So the command names I was thinking of would be one of these:

VACUUM (BACKGROUND) or VACUUM (AUTOVACUUM) - which might be clearer
or
VACUUM (ASYNC) - which is more descriptive of the behavior

or we could go for both
VACUUM (BACKGROUND, ASYNC) - since this allows us to have a
BACKGROUND, SYNC version in the future

Thoughts?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

18 November 2022, 12:04:02

On Thu, 17 Nov 2022 at 20:06, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Nov 16, 2022 at 5:14 PM Greg Stark <stark@mit.edu> wrote:
> > However I'm not a fan of commands that sometimes do one thing and
> > sometimes magically do something very different. I don't like the idea
> > that the same vacuum command would sometimes run in-process and
> > sometimes do this out of process request. And the rules for when it
> > does which are fairly complex to explain -- it runs in process unless
> > you're in a transaction when it runs out of process unless it's a
> > temporary table ...
>
> 100% agree.

I agree as well.

At the moment, the problem (OT) is that VACUUM behaves inconsistently

Outside a transaction - works perfectly
In a transaction - throws ERROR, which prevents a whole script from
executing correctly

What we are trying to do is avoid the ERROR. I don't want them to
behave like this, but that's the only option possible to avoid ERROR.

So if consistency is also a strong requirement, then maybe we should
make that new command the default, i.e. make VACUUM always just a
request to vacuum in background. That way it will be consistent.

Can we at least have a vacuum_runs_in_background = on | off, to allow
users to take advantage of this WITHOUT needing to rewrite all of
their scripts?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Robert Haas

Date:

18 November 2022, 16:59:59

On Fri, Nov 18, 2022 at 7:04 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> Outside a transaction - works perfectly
> In a transaction - throws ERROR, which prevents a whole script from
> executing correctly

Right, but your proposal would move that inconsistency to a different
place. It wouldn't eliminate it. I don't think we can pretend that
nobody will notice their operation being moved to the background. For
instance, there might not be an available background worker for a long
time, which could mean that some vacuums work right away and others
just sit there for reasons that aren't obvious to the user.

> So if consistency is also a strong requirement, then maybe we should
> make that new command the default, i.e. make VACUUM always just a
> request to vacuum in background. That way it will be consistent.

Since one fairly common reason for running vacuum in the foreground is
needing to vacuum a table when all autovacuum workers are busy, or
when they are vacuuming it with a cost limit and it needs to get done
sooner, I think this would surprise a lot of users in a negative way.

> Can we at least have a vacuum_runs_in_background = on | off, to allow
> users to take advantage of this WITHOUT needing to rewrite all of
> their scripts?

I'm not entirely convinced that's a good idea, but happy to hear what
others think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Allow single table VACUUM in transaction block

From

Tom Lane

Date:

18 November 2022, 18:26:56

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Nov 18, 2022 at 7:04 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
>> So if consistency is also a strong requirement, then maybe we should
>> make that new command the default, i.e. make VACUUM always just a
>> request to vacuum in background. That way it will be consistent.

> Since one fairly common reason for running vacuum in the foreground is
> needing to vacuum a table when all autovacuum workers are busy, or
> when they are vacuuming it with a cost limit and it needs to get done
> sooner, I think this would surprise a lot of users in a negative way.

It would also break a bunch of our regression tests, which expect a
VACUUM to complete immediately.

>> Can we at least have a vacuum_runs_in_background = on | off, to allow
>> users to take advantage of this WITHOUT needing to rewrite all of
>> their scripts?

> I'm not entirely convinced that's a good idea, but happy to hear what
> others think.

I think the answer to that one is flat no.  We learned long ago that GUCs
with significant semantic impact on queries are a bad idea.  For example,
if a user issues VACUUM expecting behavior A and she gets behavior B
because somebody changed the postgresql.conf entry, she won't be happy.

Basically, I am not buying Simon's requirement that this be transparent.
I think the downsides would completely outweigh whatever upside there
may be (and given the shortage of prior complaints, I don't think the
upside is very large).

            regards, tom lane

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

21 November 2022, 13:36:41

On Fri, 18 Nov 2022 at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Nov 18, 2022 at 7:04 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> >> So if consistency is also a strong requirement, then maybe we should
> >> make that new command the default, i.e. make VACUUM always just a
> >> request to vacuum in background. That way it will be consistent.
>
> > Since one fairly common reason for running vacuum in the foreground is
> > needing to vacuum a table when all autovacuum workers are busy, or
> > when they are vacuuming it with a cost limit and it needs to get done
> > sooner, I think this would surprise a lot of users in a negative way.
>
> It would also break a bunch of our regression tests, which expect a
> VACUUM to complete immediately.
>
> >> Can we at least have a vacuum_runs_in_background = on | off, to allow
> >> users to take advantage of this WITHOUT needing to rewrite all of
> >> their scripts?
>
> > I'm not entirely convinced that's a good idea, but happy to hear what
> > others think.
>
> I think the answer to that one is flat no.  We learned long ago that GUCs
> with significant semantic impact on queries are a bad idea.  For example,
> if a user issues VACUUM expecting behavior A and she gets behavior B
> because somebody changed the postgresql.conf entry, she won't be happy.
>
> Basically, I am not buying Simon's requirement that this be transparent.
> I think the downsides would completely outweigh whatever upside there
> may be (and given the shortage of prior complaints, I don't think the
> upside is very large).

Just to say I'm happy with that decision and will switch to the
request for a background vacuum.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

21 November 2022, 15:07:25

On Fri, 18 Nov 2022 at 11:54, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 17 Nov 2022 at 20:00, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Wed, Nov 16, 2022 at 05:14:07PM -0500, Greg Stark wrote:
> > > I think this requesting autovacuum worker should be a distinct
> > > command. Or at least an explicit option to vacuum.
> >
> > +1.  I was going to suggest VACUUM (NOWAIT) ..
>
> Yes, I have no problem with an explicit command.
>
> At the moment the patch runs VACUUM in the background in an autovacuum
> process, but the call is asynchronous, since we do not wait for the
> command to finish (or even start).
>
> So the command names I was thinking of would be one of these:
>
> VACUUM (BACKGROUND) or VACUUM (AUTOVACUUM) - which might be clearer
> or
> VACUUM (ASYNC) - which is more descriptive of the behavior
>
> or we could go for both
> VACUUM (BACKGROUND, ASYNC) - since this allows us to have a
> BACKGROUND, SYNC version in the future


Attached patch implements VACUUM (BACKGROUND).

There are quite a few small details to consider; please read the docs
and comments.

There is a noticeable delay before the background vacuum starts.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

background_vacuum.v3.patch

Re: Allow single table VACUUM in transaction block

From

Justin Pryzby

Date:

22 November 2022, 16:43:25

On Mon, Nov 21, 2022 at 03:07:25PM +0000, Simon Riggs wrote:
> Attached patch implements VACUUM (BACKGROUND).
> 
> There are quite a few small details to consider; please read the docs
> and comments.
> 
> There is a noticeable delay before the background vacuum starts.

You disallowed some combinations of unsupported options, but not others,
like FULL, PARALLEL, etc.  They should either be supported or
prohibited.

+                                       /* use default values */
+                                       tab.at_params.log_min_duration = 0;

0 isn't the default ?

Maybe VERBOSE should mean to set min_duration=0, otherwise it should use
the default ?

You only handle one rel, but ExecVacuum() has a loop around rels.

+NOTICE:  autovacuum of "vactst" has been requested, using the options specified

=> I don't think it's useful to say "using the options specified".

Should autovacuum de-duplicate requests ?
BRIN doesn't do that, but it's intended for append-only tables, so the
issue doesn't really come up.

Could add psql tab-completion.

Is it going to be confusing that the session's GUC variables won't be
transmitted to autovacuum ?  For example, the freeze and costing
parameters.

-- 
Justin

Re: Allow single table VACUUM in transaction block

From

Simon Riggs

Date:

22 November 2022, 17:16:59

On Tue, 22 Nov 2022 at 16:43, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Nov 21, 2022 at 03:07:25PM +0000, Simon Riggs wrote:
> > Attached patch implements VACUUM (BACKGROUND).
> >
> > There are quite a few small details to consider; please read the docs
> > and comments.
> >
> > There is a noticeable delay before the background vacuum starts.
>
> You disallowed some combinations of unsupported options, but not others,
> like FULL, PARALLEL, etc.  They should either be supported or
> prohibited.
>
> +                                       /* use default values */
> +                                       tab.at_params.log_min_duration = 0;
>
> 0 isn't the default ?
>
> Maybe VERBOSE should mean to set min_duration=0, otherwise it should use
> the default ?

+1

> You only handle one rel, but ExecVacuum() has a loop around rels.
>
> +NOTICE:  autovacuum of "vactst" has been requested, using the options specified
>
> => I don't think it's useful to say "using the options specified".
>
> Should autovacuum de-duplicate requests ?
> BRIN doesn't do that, but it's intended for append-only tables, so the
> issue doesn't really come up.

Easy to do

> Could add psql tab-completion.
>
> Is it going to be confusing that the session's GUC variables won't be
> transmitted to autovacuum ?  For example, the freeze and costing
> parameters.

I think we should start with the "how do I want it to behave" parts of
the above and leave spelling and tab completion as final items.

Other questions are whether there should be a limit on number of
background vacuums submitted at any time.
Whether there should be a GUC that specifies the max number of queued tasks.
Do we need a query that shows what items are queued?
etc

Justin, if you wanted to take up the patch from here, I would be more
than happy. You have the knowledge and insight to make this work
right.

We should probably start a new CF patch entry so we can return the
original patch as rejected, then continue with this new idea
separately.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Allow single table VACUUM in transaction block

From

Justin Pryzby

Date:

23 November 2022, 21:46:36

On Tue, Nov 22, 2022 at 05:16:59PM +0000, Simon Riggs wrote:
> Justin, if you wanted to take up the patch from here, I would be more
> than happy. You have the knowledge and insight to make this work
> right.

I have no particular use for this, so I wouldn't be a good person to
finish or shepherd the patch.

-- 
Justin