Re: Allow "snapshot too old" error, to prevent bloat - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Allow "snapshot too old" error, to prevent bloat
Date 2015-02-16 08:41:59
Msg-id 20150216084159.GK15326@awork2.anarazel.de
In response to Re: Allow "snapshot too old" error, to prevent bloat  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Allow "snapshot too old" error, to prevent bloat  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On 2015-02-16 05:19:11 +0000, Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> >> On 2/15/15 10:36 AM, Tom Lane wrote:
> >>> I wonder if we couldn't achieve largely the same positive effects through
> >>> adding a simple transaction-level timeout option.
> 
> We suggested this to our customer and got out of the meeting with
> it looking like it *might* fly.  In the next meeting, however, they
> said they had run it by others and reviewed the code and it was
> completely unacceptable -- they would not consider pg with this as
> the solution.
> 
> >> A common use-case is long-running reports hitting relatively stable data
> >> in a database that also has tables with a high churn rate (ie: queue
> >> tables). In those scenarios your only options right now are to suffer
> >> huge amounts of bloat in the high-churn tables or not do your reporting. A
> >> simple transaction timeout only "solves" this by denying you reporting
> >> queries.
> >
> > Agreed, but Kevin's proposal has exactly the same problem only worse,
> > because (a) the reporting transactions might or might not fail (per
> > Murphy, they'll fail consistently only when you're on deadline), and
> > (b) if they do fail, there's nothing you can do short of increasing the
> > slack db-wide.
> 
> These they were comfortable with, and did *not* consider to be
> unpredictable or something they could not do something about.
> I really don't feel I can say more than that, though, without
> disclosing more than I should.

I agree that we need to do something about the dangers of long-lived
snapshots, but I'm not sure this is it. I'm not sure of the contrary either.

What I'm definitely not a fan of, though, is this implementation. Having
to add checks in a large number of places is a recipe for silently wrong
query results.


One thing I was wondering about recently was introducing an optimization
for repeatedly updated rows into the vacuum code: a row whose xmin and
xmax are the same committed transaction can be removed even if that xid
is above the xmin horizon - no other transaction is ever going to see
it. While there's some hairiness in implementing that, it doesn't seem
too hard. And there's a fair number of cases where that'd allow less
bloat to accumulate.  Obviously it'd be better if we had logic to do
that for other patterns as well (where the updates aren't in the same
xact), but it seems like a good start.
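
To sketch what I mean (hypothetical code only, not something that
exists in the tree; the function name is made up):

#include "access/htup_details.h"
#include "access/transam.h"

/*
 * A tuple inserted and deleted by the same committed transaction can
 * never be visible to any snapshot: a snapshot either considers that
 * xact in progress (and sees neither the insert nor the delete) or
 * committed (and sees the delete).  So it's removable regardless of
 * the xmin horizon.
 */
static bool
tuple_dead_same_xact(HeapTupleHeader tuple)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
    TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);

    if (!TransactionIdIsNormal(xmin) ||
        !TransactionIdEquals(xmin, xmax))
        return false;

    /* a multixact or lock-only xmax didn't actually delete the tuple */
    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
        return false;
    if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
        return false;

    /* inserted and deleted by one committed xact: dead to everyone */
    return TransactionIdDidCommit(xmin);
}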



> > There might be something in that, but again it's not much like this patch.
> > The key point I think we're both making is that nondeterministic failures
> > are bad, especially when you're talking about long-running, expensive-to-
> > retry transactions.
> 
> What the customer most doesn't want to be "nondeterministic" is
> whether the error is generated only when the snapshot has been used
> to read a page which has been modified since the snapshot was
> taken.  If tables or materialized views are set up before a query
> and then not subsequently modified during the execution of the
> query, that query must not be canceled even if it runs for days,
> but it must not cause unconstrained bloat during the run.  So far I
> don't see any way to avoid including the LSN with the snapshot or
> modifying the index AMs.  Let's be clear on the footprint for that;
> for the btree implementation it is:

IIUC you never would want cancellations when accessing the tables these
long-running backends actually access. The errors about too-old
snapshots are just a stopgap because we can't compute the xmin per table
in a meaningful way atm. Right?

Is the bloat caused by rows these transactions can actually see, or are
the not-removed rows newer than the transaction's xmax?

Since you actually don't want cancellations for the long-running
reporting queries, it might very well be sensible to switch to a more
complicated version of HeapTupleSatisfiesVacuum() when there are
long-running queries: one that can recognize whether rows are actually
visible to any backend, or whether there are just snapshots that see
older rows. I've previously wondered how hard this would be, but I don't
think it's *that* hard. And it'd be a much more general solution.
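
Roughly the shape I have in mind (a hypothetical sketch only, assuming
xmin and xmax are already known to have committed; XidVisibleToSnapshot()
is a made-up stand-in for the per-snapshot visibility test, and the real
thing would have to get at the registered snapshots somehow):

#include "access/htup_details.h"
#include "nodes/pg_list.h"
#include "utils/snapshot.h"

/*
 * Test a deleted tuple against each registered snapshot instead of
 * against the single global xmin horizon.  The tuple is visible to a
 * snapshot iff that snapshot sees the inserting xact but not yet the
 * deleting one.  If no snapshot qualifies, the tuple is removable even
 * though its xmax is above the global horizon.
 * (XidVisibleToSnapshot() is hypothetical.)
 */
static bool
tuple_visible_to_any_snapshot(HeapTupleHeader tuple, List *snapshots)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
    TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
    ListCell   *lc;

    foreach(lc, snapshots)
    {
        Snapshot    snap = (Snapshot) lfirst(lc);

        /* insert visible, delete not yet visible => tuple visible */
        if (XidVisibleToSnapshot(xmin, snap) &&
            !XidVisibleToSnapshot(xmax, snap))
            return true;
    }
    return false;               /* dead to every snapshot: removable */
}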

Greetings,

Andres Freund


