Re: More aggressive vacuuming of temporary tables - Mailing list pgsql-hackers

From Andres Freund
Subject Re: More aggressive vacuuming of temporary tables
Date
Msg-id 20200909180205.k3pff6kmaim6hmvb@alap3.anarazel.de
Whole thread Raw
In response to Re: More aggressive vacuuming of temporary tables  (Stephen Frost <sfrost@snowman.net>)
Responses Re: More aggressive vacuuming of temporary tables
List pgsql-hackers
Hi,

On 2020-09-09 10:14:04 -0400, Stephen Frost wrote:
> > I've been toying with a patch that introduces more smarts about when a
> > row is removable, by looking more closely whether a specific row
> > versions are visible (e.g. in the common case of one old snapshot and
> > lots of newer rows). But that's orders of magnitude more complicated. So
> > going for something as simple as this seems like a good idea.
> 
> I've wondered about this for a long time- very cool that you've found
> time to actually work on a patch.  A couple of different ideas were
> discussed previously about how to do that kind of a check- mind talking
> about what method you're using, or perhaps just sharing that patch? :)

It's very very early, and it doesn't really work. I basically tried to
just plug a bit more intelligence into the dead tuple detection (which
now can easily store more and incrementally built state with the recent
changes for snapshot scalability).  Detection that tuples newer than the
horizon are dead isn't that complicated - what's hard is not breaking
things due to ctid chains lacking intermediate versions.  To avoid that
I had to restrict it to inserted (not updated) tuples that were
subsequently deleted. And my heuristic only supported only one old
snapshot.

Building a bsearchable list of ranges of valid (xmin-xmax] ranges isn't
that hard. Some care needs to be taken to make the list non-overlapping,
but that's easy enough by just merging entries.

Obviously lookup in such a more complicated structure isn't free. Nor is
building it. So we'd need some heuristics about when to do so. It'd
probably be OK to occasionally look at the age of the oldest in-progress
statement, to infer the time for old snapshots based on that. And then
we could have a GUC that says when it's worth doing more complicated
lookups.

I don't have a handle on how to deal with the ctid chaining for
intermediate row versions.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Global snapshots
Next
From: Tomas Vondra
Date:
Subject: Re: WIP: BRIN multi-range indexes