Re: Allow "snapshot too old" error, to prevent bloat

From: Kevin Grittner
Subject: Re: Allow "snapshot too old" error, to prevent bloat
Date:
Msg-id: 1183720670.3967956.1424063951728.JavaMail.yahoo@mail.yahoo.com
In response to: Re: Allow "snapshot too old" error, to prevent bloat (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>> On 2/15/15 10:36 AM, Tom Lane wrote:
>>> I wonder if we couldn't achieve largely the same positive effects
>>> through adding a simple transaction-level timeout option.

We suggested this to our customer and got out of the meeting with it
looking like it *might* fly.  In the next meeting, however, they said
they had run it by others and reviewed the code, and it was completely
unacceptable -- they would not consider pg with this as the solution.

>> A common use-case is long-running reports hitting relatively stable
>> data in a database that also has tables with a high churn rate (ie:
>> queue tables).  In those scenarios your only options right now are
>> to suffer huge amounts of bloat in the high-churn tables or not do
>> your reporting.  A simple transaction timeout only "solves" this by
>> denying you reporting queries.
>
> Agreed, but Kevin's proposal has exactly the same problem only worse,
> because (a) the reporting transactions might or might not fail (per
> Murphy, they'll fail consistently only when you're on deadline), and
> (b) if they do fail, there's nothing you can do short of increasing
> the slack db-wide.

These they were comfortable with, and did *not* consider unpredictable
or something they could do nothing about.  I really don't feel I can
say more than that, though, without disclosing more than I should.

>> An idea that I've had on this would be some way to "lock down" the
>> tables that a long-running transaction could access.  That would
>> allow vacuum to ignore any snapshots that transaction had for tables
>> it wasn't accessing.  That's something that would be deterministic.

While this option was not specifically suggested to them, based on the
reasons they gave for rejecting the numerous other options we did
suggest, I feel sure it would not be acceptable either.  I think Tom
hit the nail on the head when he said "Maybe refugees from Oracle will
think that sounds good..."  It is precisely those with very large code
bases, modified over long periods of time to work with Oracle, who
would find this solution attractive, or perhaps *necessary*, to
consider a move to Postgres.  That's a potential market we don't care
to write off.

> There might be something in that, but again it's not much like this
> patch.  The key point I think we're both making is that
> nondeterministic failures are bad, especially when you're talking
> about long-running, expensive-to-retry transactions.

What the customer most wants to be deterministic is this: the error is
generated only when the snapshot has been used to read a page which was
modified after the snapshot was taken.  If tables or materialized views
are set up before a query and not subsequently modified during its
execution, that query must not be canceled even if it runs for days --
but it also must not cause unconstrained bloat during the run.

So far I don't see any way to avoid including the LSN with the snapshot
or modifying the index AMs.  Let's be clear on the footprint for that;
for the btree implementation it is:

 src/backend/access/nbtree/nbtinsert.c |  7 ++++---
 src/backend/access/nbtree/nbtpage.c   |  2 +-
 src/backend/access/nbtree/nbtsearch.c | 43 ++++++++++++++++++++++++++++++++++---------
 src/include/access/nbtree.h           |  7 ++++---
 4 files changed, 43 insertions(+), 16 deletions(-)
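To make the shape of that check concrete, here is a minimal sketch (not
the patch itself -- the function name, the snapshot's "lsn" field, and
the threshold helpers are illustrative assumptions): the snapshot
carries the LSN that was current when it was taken, and the index AM
raises the error only when it actually reads a page whose LSN shows it
was modified after that point.

#include "postgres.h"
#include "storage/bufpage.h"
#include "utils/snapshot.h"

/*
 * Sketch only: raise "snapshot too old" iff this snapshot is past the
 * age threshold *and* the page being read was modified after the
 * snapshot was taken (page LSN > snapshot LSN).  A long-running query
 * over unmodified tables never trips this, however long it runs.
 */
static inline void
TestForOldSnapshot(Snapshot snapshot, Page page)
{
    if (SnapshotAgeThresholdEnabled() &&       /* hypothetical GUC check */
        PageGetLSN(page) > snapshot->lsn &&    /* assumes an lsn field */
        SnapshotPastAgeThreshold(snapshot))    /* hypothetical */
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),   /* assumed errcode */
                 errmsg("snapshot too old")));
}

The btree changes counted above are then largely a matter of calling a
check of this shape at the points where a scan reads a page.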
What this discussion has made me reconsider is the metric for
considering a transaction "too old".  The number of transaction IDs
consumed seems inferior to either LSN or time as the unit of measure
for that.  It looks to me to be pretty trivial (on the order of maybe
30 lines of code) to specify this GUC in minutes rather than
transaction IDs.

At first glance that seems vulnerable to the usual complaints about
mismanaged clocks, but those are easily answered by using a cached time
in shared memory, populated in such a way that it can never move
backward; a sketch of what I mean follows at the end of this message.
Done right, this could even allow the GUC to be changeable on reload
rather than only at restart.  A badly mismanaged system clock could not
cause a query to generate incorrect results; the worst that could
happen is that this feature would fail to control bloat as much as
expected, or that reads of modified data would generate the "snapshot
too old" error around the time of the clock adjustment.

As before, this would default to a magic value meaning that you want
the historical PostgreSQL behavior.

If that makes the idea more palatable to anyone, I can submit a patch
to that effect within the next 24 hours.
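Here is that never-moves-backward cached time, as a minimal sketch (all
names are illustrative; it assumes integer timestamps and a control
struct allocated in shared memory at startup):

#include "postgres.h"
#include "storage/spin.h"
#include "utils/timestamp.h"

/* GUC, in minutes; the sketch assumes -1 as the "old behavior" magic value */
int  old_snapshot_threshold = -1;

typedef struct OldSnapshotControlData
{
    slock_t     mutex;      /* protects latest */
    TimestampTz latest;     /* highest time ever handed out */
} OldSnapshotControlData;

static OldSnapshotControlData *oldSnapshotControl;  /* lives in shmem */

/*
 * Wall-clock reads advance the cached value; a clock stepped backward
 * just keeps returning the cached value until real time catches up,
 * so the result can never move backward.
 */
static TimestampTz
GetSnapshotCurrentTimestamp(void)
{
    TimestampTz now = GetCurrentTimestamp();
    TimestampTz result;

    SpinLockAcquire(&oldSnapshotControl->mutex);
    if (now > oldSnapshotControl->latest)
        oldSnapshotControl->latest = now;
    result = oldSnapshotControl->latest;
    SpinLockRelease(&oldSnapshotControl->mutex);

    return result;
}

A spinlock is enough here; the critical section is only a compare and
two assignments.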
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company