Re: Allow "snapshot too old" error, to prevent bloat - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Allow "snapshot too old" error, to prevent bloat
Date
Msg-id 54E0EC79.3060009@BlueTreble.com
Whole thread Raw
In response to Re: Allow "snapshot too old" error, to prevent bloat  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Allow "snapshot too old" error, to prevent bloat  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2/15/15 10:36 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Kevin Grittner <kgrittn@ymail.com> writes:
>>>> What this patch does is add a GUC call old_snapshot_threshold.  It
>>>> defaults to -1, which leaves behavior matching unpatched code.
>>>> Above that it allows tuples to be vacuumed away after the number of
>>>> transaction IDs specified by the GUC have been consumed.
>
>>> TBH, I'm not sure why we'd wish to emulate Oracle's single worst
>>> operational feature.
>
>> I've run into cases where people have suffered horribly bloated
>> databases because of one ill-behaved connection.  Some companies
>> don't want to be vulnerable to that and the disruption that
>> recovery from that bloat causes.
>
> No doubt, preventing bloat is a good thing, but that doesn't mean this
> is the best API we could create for the issue.  The proposition this
> patch offers to DBAs is: "You can turn this knob to reduce bloat by some
> hard-to-quantify factor.  The cost is that some long-running transactions
> might fail.  You won't know which ones are at risk, the failures won't be
> the same from time to time, and you won't be able to do anything to spare
> high-value transactions from that fate except by turning that knob back
> again globally (which requires a database restart)."  Maybe refugees from
> Oracle will think that sounds good, but nobody else will.
>
> I wonder if we couldn't achieve largely the same positive effects through
> adding a simple transaction-level timeout option.  That would be far
> easier for users to understand and manage, it would be trivial to allow
> specific high-value transactions to run with a laxer setting, it does not
> create any implementation-detail-dependent behavior that we'd be having to
> explain to users forevermore, and (not incidentally) it would be a lot
> simpler and more trustworthy to implement.  There's no well-defined
> correlation between your setting and the net effect on database bloat,
> but that's true with the "snapshot too old" approach as well.

A common use-case is long-running reports hitting relatively stable data 
in a database that also has tables with a high churn rate (ie: queue 
tables). In those scenarios your only options right now are to suffer 
huge amounts of bloat in the high-churn or not do your reporting. A 
simple transaction timeout only "solves" this by denying you reporting 
queries.

An idea that I've had on this would be some way to "lock down" the 
tables that a long-running transaction could access. That would allow 
vacuum to ignore any snapshots that transaction had for tables it wasn't 
accessing. That's something that would be deterministic.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: EXPERIMENTAL: mmap-based memory context / allocator
Next
From: Peter Geoghegan
Date:
Subject: Re: New CF app deployment