Re: Allow single table VACUUM in transaction block - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Allow single table VACUUM in transaction block
Date
Msg-id CANbhV-EV=Xwhz+nZ+1qwbFOQWgEi-z9drRgzJeETpc02wOVjsg@mail.gmail.com
Whole thread Raw
In response to Re: Allow single table VACUUM in transaction block  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Allow single table VACUUM in transaction block
Re: Allow single table VACUUM in transaction block
List pgsql-hackers
On Sun, 6 Nov 2022 at 20:40, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sun, Nov 6, 2022 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In general, I do not believe in encouraging users to run VACUUM
> > manually in the first place.  We would be far better served by
> > spending our effort to improve autovacuum's shortcomings.
>
> I couldn't agree more. A lot of problems seem related to the idea that
> VACUUM is just a command that the DBA periodically runs to get a
> predictable fixed result, a little like CREATE INDEX. That conceptual
> model isn't exactly wrong; it just makes it much harder to apply any
> kind of context about the needs of the table over time. There is a
> natural cycle to how VACUUM (really autovacuum) is run, and the
> details matter.
>
> There is a significant amount of relevant context that we can't really
> use right now. That wouldn't be true if VACUUM only ran within an
> autovacuum worker (by definition). The VACUUM command itself would
> still be available, and support the same user interface, more or less.
> Under the hood the VACUUM command would work by enqueueing a VACUUM
> job, to be performed asynchronously by an autovacuum worker. Perhaps
> the initial enqueue operation could be transactional, fixing Simon's complaint.

Ah, I see you got to this idea first!

Yes, what we need is for the "VACUUM command" to not fail in a script.
Not sure anyone cares where the work takes place.

Enqueuing a request for autovacuum to do that work, then blocking
until it is complete would do the job.

> "No more VACUUMs outside of autovacuum" would enable more advanced
> autovacuum.c scheduling, allowing us to apply a lot more context about
> the costs and benefits, without having to treat manual VACUUM as an
> independent thing. We could coalesce together redundant VACUUM jobs,
> suspend and resume VACUUM operations, and have more strategies to deal
> with problems as they emerge.

+1, but clearly this would not make temp table VACUUMs work.

> > I'd like to see some sort of direct attack on its inability to deal
> > with temp tables, for instance.  (Force the owning backend to
> > do it?  Temporarily change the access rules so that the data
> > moves to shared buffers?  Dunno, but we sure haven't tried hard.)

This was a $DIRECT attack on making temp tables work! ;-)

Temp tables are actually easier, since we don't need any of the
concurrency features we get with lazy vacuum. So the answer is to
always run a VACUUM FULL on temp tables since this skips any issues
with indexes etc..

We would need to check a few things first.... maybe something like
this (mostly borrowed heavily from COPY)

        InvalidateCatalogSnapshot();
        if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
            ereport(WARNING,
                    (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                     errmsg("vacuum of temporary table ignored because
of prior transaction activity")));
       CheckTableNotInUse(rel, "VACUUM");

> This is a good example of the kind of thing I have in mind. Perhaps it
> could work by killing the backend that owns the temp relation when
> things truly get out of hand? I think that that would be a perfectly
> reasonable trade-off.

+1

> Another related idea: better behavior in the event of a manually
> issued VACUUM (now just an enqueued autovacuum) that cannot do useful
> work due to the presence of a long running snapshot. The VACUUM
> doesn't have to dutifully report "success" when there is no practical
> sense in which it was successful. There could be a back and forth
> conversation between autovacuum.c and vacuumlazy.c that makes sure
> that something useful happens sooner or later. The passage of time
> really matters here.

Regrettably, neither vacuum nor autovacuum waits for xmin to change;
perhaps it should.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: Karina Litskevich
Date:
Subject: Re: Error for WITH options on partitioned tables