Re: Allow "snapshot too old" error, to prevent bloat - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Allow "snapshot too old" error, to prevent bloat
Date
Msg-id 20150219011401.GG6717@tamriel.snowman.net
In response to Re: Allow "snapshot too old" error, to prevent bloat  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Allow "snapshot too old" error, to prevent bloat  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > I also agree with the general idea that it makes sense to provide a way
> > to control bloat, but I think you've missed what Andres was getting at
> > with his suggestion (if I understand correctly, apologies if I don't).
> >
> > The problem is that we're only looking at the overall xmin / xmax
> > horizon when it comes to deciding if a given tuple is dead.  That's
> > not quite right- the tuple might actually be dead to all *current*
> > transactions by being newer than the oldest transaction but dead for all
> > later transactions.  Basically, there exist gaps between our cluster
> > wide xmin / xmax where we might find actually dead rows.  Marking those
> > rows dead and reusable *would* stop the bloat, not just slow it down.
> >
> > In the end, with a single long-running transaction, the worst bloat you
> > would have is double the size of the system at the time the long-running
> > transaction started.
>
> I agree that limiting bloat to one dead tuple for every live one
> for each old snapshot is a limit that has value, and it was unfair
> of me to characterize that as not being a limit.  Sorry for that.
>
> This possible solution was discussed with the user whose feedback
> caused me to write this patch, but there were several reasons they
> dismissed that as a viable total solution for them, two of which I
> can share:
>
> (1)  They have a pool of connections each of which can have several
> long-running cursors, so the limit from that isn't just doubling
> the size of their database, it is limiting it to some two-or-three
> digit multiple of the necessary size.

This strikes me as a bit off-the-cuff; was an analysis done which
determined that would be the result?  If there is overlap between the
long-running cursors then there would be less bloat, and most systems
which I'm familiar with don't turn the entire database over in 20
minutes, 20 hours, or even 20 days except in pretty specific cases.
Perhaps this is one of those, and if so then I'm all wet, but the
feeling I get is that this is a way to dismiss this solution because
it's not what's wanted, which is "what Oracle did."
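
To make the "gaps" argument concrete, here is a minimal sketch in Python
under a deliberately simplified model.  Everything in it is an assumption
for illustration: xids commit in order, a snapshot is a single integer
that sees every xid below it, and the names (TupleVersion, survivors_gap,
and so on) are hypothetical, not PostgreSQL code.  It compares cleanup
based only on the oldest snapshot with cleanup that also checks the gaps
between snapshots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TupleVersion:
    xmin: int                   # xid that created this version
    xmax: Optional[int] = None  # xid that replaced it; None if still live


def visible_to(version: TupleVersion, snapshot: int) -> bool:
    """Toy rule: a snapshot sees every xid strictly below it as committed."""
    created = version.xmin < snapshot
    deleted = version.xmax is not None and version.xmax < snapshot
    return created and not deleted


def update_chain(xids: List[int]) -> List[TupleVersion]:
    """Versions of one row that was inserted, then updated, by these xids."""
    return [
        TupleVersion(xmin=x, xmax=xids[i + 1] if i + 1 < len(xids) else None)
        for i, x in enumerate(xids)
    ]


def survivors_horizon(versions, snapshots: List[int]) -> List[TupleVersion]:
    """Keep everything not deleted before the oldest snapshot."""
    horizon = min(snapshots, default=float("inf"))
    return [v for v in versions if v.xmax is None or v.xmax >= horizon]


def survivors_gap(versions, snapshots: List[int]) -> List[TupleVersion]:
    """Keep only the live version and versions some snapshot can still see."""
    return [
        v for v in versions
        if v.xmax is None or any(visible_to(v, s) for s in snapshots)
    ]


# One row updated repeatedly while an old snapshot (25) stays open
# alongside a recent one (60).
row = update_chain([10, 20, 30, 40, 50])
print(len(survivors_horizon(row, [25, 60])))  # 4
print(len(survivors_gap(row, [25, 60])))      # 2
```

In this toy model the gap-aware rule keeps one version for the old
snapshot plus the live one, however many updates happen in between.
With several non-overlapping old snapshots the count would be one extra
version per snapshot at most, and fewer where they overlap.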

> (2)  They are already prepared to deal with "snapshot too old"
> errors on queries that run more than about 20 minutes and which
> access tables which are modified.  They would rather do that than
> suffer the bloat beyond that point.

That, really, is the crux here- they've already got support for dealing
with it the way Oracle did and they'd like PG to do that too.
Unfortunately, that, by itself, isn't a good reason for a particular
capability (we certainly aren't going to be trying to duplicate PL/SQL
in PG any time soon).  That said, there are capabilities in other
RDBMSs which are valuable and which we *do* want, so the fact that
Oracle does this also isn't a reason to not include it.

> IMO all of these changes people are working are very valuable, and
> complement each other.  This particular approach is likely to be
> especially appealing to those moving from Oracle because it is a
> familiar mechanism, and one which some of them have written their
> software to be aware of and deal with gracefully.

For my 2c, I'd much rather provide them with a system where they don't
have to deal with broken snapshots than give them a way to have them the
way Oracle provided them. :)  That said, even the approach Andres
outlined will cause bloat and it may be beyond what's acceptable in some
environments, and it's certainly more complicated and unlikely to get
done in the short term.

> > I'm not against having a knob like this, which is defaulted to off,
>
> Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
> sound like a -1.  :-)

So, at the time I wrote that, I wasn't sure if it was a +1 or not
myself.  I've been thinking about it since then, however, and I'm
leaning more towards having the capability than not, so perhaps that's a
+1.  That doesn't excuse the need to come up with an implementation
that everyone can be happy with, though, and what you've come up with so
far doesn't have a lot of appeal, based on the feedback.  (I've only
glanced through it myself, but I agree with Andres and Tom that it's a
larger footprint than we'd want for this, and the number of places having
to be touched is concerning as it could lead to future bugs.)

A lot of that would go away if there was a way to avoid having to mess
with the index AMs, I'd think, but I wonder if we'd actually need more
done there- it's not immediately obvious to me how an index-only scan is
safe with this.  Whenever an actual page is visited, we can check the
LSN, and the relation can't be truncated by vacuum since the transaction
will still have a lock on the table which prevents it, but does the
visibility-map update check make sure to never mark pages all-visible
when one of these old transactions is running around?  On what basis?
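
As a rough illustration of that concern, here is a toy model in Python.
It is not the patch's code.  The Page and Snapshot classes, the idea that
a snapshot remembers an LSN, and the 20-minute threshold are all
assumptions made up for the sketch.  It only shows why a check that lives
in the "visit the heap page" path can be skipped when the visibility map
says the page is all-visible.

```python
from dataclasses import dataclass


class SnapshotTooOld(Exception):
    """Raised instead of returning rows the snapshot can no longer trust."""


@dataclass
class Page:
    lsn: int                   # LSN of the last change to this page
    all_visible: bool = False  # stand-in for the visibility-map bit


@dataclass
class Snapshot:
    lsn: int          # assumed: WAL position when the snapshot was taken
    age_minutes: int  # assumed: how long ago the snapshot was taken


def check_page(page: Page, snap: Snapshot, threshold_minutes: int) -> None:
    """Error out if an over-age snapshot reads a page changed after it."""
    if snap.age_minutes > threshold_minutes and page.lsn > snap.lsn:
        raise SnapshotTooOld("page modified after an over-age snapshot")


def heap_fetch(page: Page, snap: Snapshot, threshold_minutes: int) -> str:
    check_page(page, snap, threshold_minutes)
    return "heap"


def index_only_fetch(page: Page, snap: Snapshot, threshold_minutes: int) -> str:
    if page.all_visible:
        # The heap page is never visited, so no LSN check happens here.
        return "index"
    return heap_fetch(page, snap, threshold_minutes)


old_snapshot = Snapshot(lsn=100, age_minutes=30)
page = Page(lsn=200)  # modified (e.g. pruned) after the snapshot was taken

try:
    heap_fetch(page, old_snapshot, threshold_minutes=20)
except SnapshotTooOld:
    print("heap fetch: error raised")

page.all_visible = True  # suppose the bit was set after early cleanup
print(index_only_fetch(page, old_snapshot, threshold_minutes=20))
```

If the bit can be set while such a snapshot is still around, the second
path answers from the index with no error at all, which is exactly the
case I'd want someone to rule out.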

> > but I do think we'd be a lot better off with a system that could
> > realize when rows are not visible to any currently running transaction
> > and clean them up.
>
> +1
>
> But they are not mutually exclusive; I see them as complementary.

I can see how they would be, provided we can be confident that we're
going to actually throw an error when the snapshot is out of date and
not end up returning incorrect results.  We need to be darn sure of
that, both now and in a few years from now when many of us may have
forgotten about this knob.. ;)

> > If this knob's default is off then I don't think
> > we'd be likely to get the complaints which are being discussed (or, if
> > we did, we could point the individual at the admin who set the knob...).
>
> That's how I see it, too.  I would not suggest making the default
> anything other than "off", but there are situations where it would
> be a nice tool to have in the toolbox.

Agreed.

Thanks!

    Stephen
