Re: Allow "snapshot too old" error, to prevent bloat - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Allow "snapshot too old" error, to prevent bloat
Msg-id 20150218223226.GD6717@tamriel.snowman.net
In response to Re: Allow "snapshot too old" error, to prevent bloat  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Allow "snapshot too old" error, to prevent bloat  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
Kevin,

* Kevin Grittner (kgrittn@ymail.com) wrote:
> Magnus Hagander <magnus@hagander.net> wrote:
> > On Feb 17, 2015 12:26 AM, "Andres Freund" <andres@2ndquadrant.com> wrote:
> >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
>
> >>> It seems we already have a mechanism in place that allows
> >>> tuning of query cancel on standbys vs. preventing standby
> >>> queries from seeing old data, specifically
> >>> max_standby_streaming_delay/max_standby_archive_delay.  We
> >>> obsessed about how users were going to react to these odd
> >>> variables, but there has been little negative feedback.
> >>
> >> FWIW, I think that's a somewhat skewed perception. I think it
> >> was right to introduce those, because we didn't really have any
> >> alternatives.
>
> As far as I can see, the "alternatives" suggested so far are all
> about causing heap bloat to progress more slowly, but still without
> limit.  I suggest, based on a lot of user feedback (from the
> customer I've talked about at some length on this thread, as well
> as numerous others), that unlimited bloat based on the activity of
> one connection is a serious deficiency in the product; and that
> there is no real alternative to something like a "snapshot too old"
> error if we want to fix that deficiency.  Enhancements to associate
> a snapshot with a database and using a separate vacuum xmin per
> database, not limiting vacuum of a particular object by snapshots
> that cannot see that snapshot, etc., are all Very Good Things and I
> hope those changes are made, but none of that fixes a very
> fundamental flaw.

I also agree with the general idea that it makes sense to provide a way
to control bloat, but I think you've missed what Andres was getting at
with his suggestion (if I understand correctly, apologies if I don't).

The problem is that we're only looking at the overall xmin / xmax
horizon when it comes to deciding if a given tuple is dead.  That's
not quite right: a tuple might actually be dead to all *current*
transactions because it was created after the oldest transaction
started and deleted before any of the later transactions started.
Basically, there exist gaps between our cluster-wide xmin / xmax where
we might find actually dead rows.  Marking those rows dead and reusable
*would* stop the bloat, not just slow it down.

In the end, with a single long-running transaction, the worst bloat you
would have is double the size of the system at the time the long-running
transaction started.  Another way of thinking about it is with this
timeline:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/\                  /\             /\                 /\
Long running        row created    row deleted        vacuum
transaction starts

Now, if all transactions currently running started either before the row
was created or after the row was deleted, then the row is dead and
vacuum can reclaim it.  Finding those gaps in the transaction space for
all of the currently running processes might not be cheap, but for this
kind of use-case, it might be worth the effort.  On a high-churn
system where the actual set of rows being copied over constantly isn't
very large, you could end up with some amount of duplication of rows
(those needed to satisfy the ancient transaction plus the current
working set), but there wouldn't be any further bloat and it'd almost
certainly be a much better situation than what is being seen now.
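
To make the rule above concrete, here is a rough sketch of the check in
Python.  It is a toy model, not PostgreSQL's visibility code: it
assumes every event sits at a distinct integer point on the timeline,
that a snapshot is fully described by the point at which its
transaction started, and that the creating and deleting transactions
both committed.  The names RowVersion, visible_to,
reclaimable_by_horizon and reclaimable_by_gaps are made up for
illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RowVersion:
    """Toy row version; both fields are points on a shared timeline."""
    created_at: int  # when the creating transaction committed (assumed)
    deleted_at: int  # when the deleting transaction committed (assumed)


def visible_to(row: RowVersion, snapshot_start: int) -> bool:
    """A snapshot sees the row only if it started after the row was
    created and before the row was deleted."""
    return row.created_at < snapshot_start < row.deleted_at


def reclaimable_by_horizon(row: RowVersion, snapshots: Iterable[int]) -> bool:
    """Horizon-only rule: the row is dead only if it was deleted before
    the oldest running snapshot started."""
    oldest = min(snapshots, default=None)
    return oldest is None or row.deleted_at < oldest


def reclaimable_by_gaps(row: RowVersion, snapshots: Iterable[int]) -> bool:
    """Gap-aware rule: the row is dead if no running snapshot can see
    it, even when an older snapshot is still open."""
    return not any(visible_to(row, s) for s in snapshots)


if __name__ == "__main__":
    running = [10, 500]           # one ancient snapshot, one recent one
    row = RowVersion(100, 200)    # created and deleted inside the gap
    print(reclaimable_by_horizon(row, running))
    print(reclaimable_by_gaps(row, running))
```

With an ancient snapshot at 10 and a recent one at 500, a row that
lived from 100 to 200 is held back by the horizon-only rule but is
reclaimable under the gap-aware rule, since neither snapshot can see
it.  The loop over every running snapshot is where the "might not be
cheap" cost shows up.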

> Particularly my initial suggestion, which was to base snapshot
> "age" it on the number of transaction IDs assigned.  Does this look
> any better to you if it is something that can be set to '20min' or
> '1h'?  Just to restate, that would not automatically cancel the
> snapshots past that age; it would allow vacuum of any tuples which
> became "dead" that long ago, and would cause a "snapshot too old"
> message for any read of a page modified more than that long ago
> using a snapshot which was older than that.
>
> Unless this idea gets some +1 votes Real Soon Now, I will mark the
> patch as Rejected and move on.

I'm not against having a knob like this, as long as it defaults to off,
but I do think we'd be a lot better off with a system that could
realize when rows are not visible to any currently running transaction
and clean them up.  If this knob's default is off then I don't think
we'd be likely to get the complaints which are being discussed (or, if
we did, we could point the individual at the admin who set the knob...).
Thanks!
    Stephen
