Re: snapshot too old issues, first around wraparound and then more. - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: snapshot too old issues, first around wraparound and then more. |
Date | |
Msg-id | CA+hUKG+h1waWw0MHMkY_cvA2zxo9jn81TOJz9js6on0F4vcPHA@mail.gmail.com |
In response to | Re: snapshot too old issues, first around wraparound and then more. (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: snapshot too old issues, first around wraparound and then more. |
List | pgsql-hackers |
On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I think that it's worth considering whether or not there are a
> > significant number of "snapshot too old" users that rarely or never
> > rely on old snapshots used by new queries. Kevin said that this
> > happens "in some cases", but how many cases? Might it be that many
> > "snapshot too old" users could get by with a version of the feature
> > that makes the most conservative possible assumptions, totally giving
> > up on the idea of differentiating which blocks are truly safe to
> > access with an "old" snapshot? (In other words, one that assumes that
> > they're *all* unsafe for an "old" snapshot.)
> >
> > I'm thinking of a version of "snapshot too old" that amounts to a
> > statement timeout that gets applied for xmin horizon type purposes in
> > the conventional way, while only showing an error to the client if and
> > when they access literally any buffer (though not when the relation is
> > a system catalog). Is it possible that something along those lines is
> > appreciably better than nothing to users? If it is, and if we can find
> > a way to manage the transition, then maybe we could tolerate
> > supporting this greatly simplified implementation of "snapshot too
> > old".
>
> Interesting idea. I'm keen to try prototyping it to see how well it
> works out in practice. Let me know soon if you already have designs
> on that and I'll get out of your way, otherwise I'll give it a try and
> share what I come up with.

Here's a quick and dirty test patch of that idea (or my understanding of it), just for experiments. It introduces snapshot->expire_time and a new timer, SNAPSHOT_TIMEOUT. When the timer fires, the next CHECK_FOR_INTERRUPTS() sets snapshot->too_old on any active or registered snapshots whose time has come, and then tries to advance MyPgXact->xmin without considering the ones marked too old.
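To make the expiry step concrete, here is a minimal standalone sketch of the idea as described, not the attached patch. The names ExpirySnapshot and sketch_expire_snapshots are hypothetical, the array stands in for the backend's active and registered snapshots, and xids are compared with a plain "<" for brevity, whereas real code would need a wraparound-aware comparison such as TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL types of the same name. */
typedef uint32_t TransactionId;
typedef int64_t TimestampTz;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, stripped-down snapshot: only the fields the idea needs. */
typedef struct ExpirySnapshot
{
    TransactionId xmin;         /* oldest xid this snapshot must still see */
    TimestampTz expire_time;    /* when the snapshot becomes "too old" */
    bool        too_old;        /* set once expire_time has passed */
} ExpirySnapshot;

/*
 * Mark every snapshot whose time has come as too old, then return the
 * oldest xmin among the snapshots that are still live. The caller would
 * advertise that value as its new xmin. Returns InvalidTransactionId if
 * no live snapshot remains.
 *
 * Assumption: plain integer comparison, so xid wraparound is ignored.
 */
static TransactionId
sketch_expire_snapshots(ExpirySnapshot *snaps, size_t nsnaps, TimestampTz now)
{
    TransactionId oldest = InvalidTransactionId;

    assert(snaps != NULL || nsnaps == 0);

    for (size_t i = 0; i < nsnaps; i++)
    {
        if (!snaps[i].too_old && snaps[i].expire_time <= now)
            snaps[i].too_old = true;

        /* Expired snapshots no longer hold the xmin horizon back. */
        if (snaps[i].too_old)
            continue;

        if (oldest == InvalidTransactionId || snaps[i].xmin < oldest)
            oldest = snaps[i].xmin;
    }

    return oldest;
}
```

In the scheme described above, this work would run from the interrupt-processing path after the timer fires, so an expired snapshot stops holding back the horizon as soon as the backend next checks for interrupts.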
That gets rid of the concept of "early pruning". You can use just regular pruning, because the snapshot is no longer holding the regular xmin back. Then TestForOldSnapshot() becomes simply if (snapshot->too_old) ereport(...).

There are certainly some rough edges, missed details and bugs in here, not least the fact (pointed out to me by Andres in an off-list chat) that we sometimes use short-lived snapshots without registering them; we'd have to fix that. It also does nothing to ensure that TestForOldSnapshot() is actually called at all the right places, which is still required for correct results.

If those problems can be fixed, you'd have a situation where snapshot-too-old is a coarse-grained, blunt instrument that effectively aborts your transaction even if the whole cluster is read-only. I am not sure if that's really truly useful to anyone (i.e. if these ODBC cursor users would be satisfied; I'm not sure I understand that use case).

Hmm. I suppose it must be possible to put the LSN check back: if (snapshot->too_old && PageGetLSN(page) > snapshot->lsn) ereport(...). Then the granularity would be the same as today -- block level -- but the complexity is transferred from the pruning side (which has to deal with the xid time map) to the snapshot-owning side (which has to deal with timers and CFI(), and make sure all snapshots are registered). Maybe not a great deal, and maybe not easier than fixing the existing bugs.

One problem is all the new setitimer() syscalls. I feel like that could be improved, as could statement_timeout, by letting existing timers run rather than repeatedly rescheduling eagerly, so that e.g. a 1 minute timeout never gets rescheduled more than once per minute. I haven't looked into that, but I guess it's no worse than the existing implementation's overheads anyway.

PS: in the patch the GUC is interpreted as milliseconds, which is more fun for testing, but it should really be minutes like before.