Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Feb 1, 2013 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In any case, I no longer have much faith in the idea that letting
>> GetOldestXmin go backwards is really safe.
> That is admittedly kind of weird behavior, but I think one could
> equally blame this on CLUSTER. This is hardly the first time we've
> had to patch CLUSTER's handling of TOAST tables (cf commits
> 21b446dd0927f8f2a187d9461a0d3f11db836f77,
> 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05,
> 83b7584944b3a9df064cccac06822093f1a83793) and it doesn't seem unlikely
> that we might go the full ten rounds.

Yeah, but I'm not sure whether CLUSTER is the appropriate blamee or
whether it's more like the canary in the coal mine, first to expose
problematic behaviors elsewhere. The general problem here is really
that we're cleaning out toast tuples while the referencing main-heap
tuple still physically exists. How safe do you think that is? That
did not ever happen before we decoupled autovacuuming of main and toast
tables, either --- so a good case could be made that that idea is
fundamentally broken.
> Having said that, I agree that a fix in GetOldestXmin() would be nice
> if we could find one, but since the comment describes at least three
> different ways the value can move backwards, I'm not sure that there's
> really a practical solution there, especially if you want something we
> can back-patch.

Well, if we were tracking the latest value in shared memory, we could
certainly clamp to that to ensure it didn't go backwards. The problem
is where to find storage for a per-DB value.
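
To make the clamping idea concrete, here is a minimal sketch, not code from the tree. The names xid_t, DbXminSlot, xid_precedes and clamp_oldest_xmin are all hypothetical, the slot is an ordinary struct rather than real shared memory, locking is omitted, and the comparison ignores the special (non-normal) XIDs. It only shows the "never hand out a value older than the last one" rule, and it does nothing about the storage question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;             /* stand-in for TransactionId */

/* Hypothetical per-database slot.  A real version would live in shared
 * memory and be protected by a lock; both are left out of this sketch. */
typedef struct DbXminSlot
{
    bool    valid;                  /* has any value been recorded yet? */
    xid_t   last_oldest_xmin;       /* latest value handed out for this DB */
} DbXminSlot;

/* Wraparound-aware "a is older than b", using modulo-2^32 arithmetic.
 * Special XIDs are not treated separately here (an assumption). */
static bool
xid_precedes(xid_t a, xid_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Return the freshly computed value, unless it is older than the one
 * already recorded, in which case return the recorded one instead. */
static xid_t
clamp_oldest_xmin(DbXminSlot *slot, xid_t computed)
{
    assert(slot != NULL);

    if (slot->valid && xid_precedes(computed, slot->last_oldest_xmin))
        return slot->last_oldest_xmin;      /* refuse to go backwards */

    slot->valid = true;
    slot->last_oldest_xmin = computed;
    return computed;
}
```

With one such slot per database, a caller that computes 990 after an earlier caller got 1000 would be handed 1000 again.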

I thought about storing each session's latest value in its PGPROC and
taking the max over same-DB sessions during GetOldestXmin's ProcArray
scan, but that doesn't work because an autovacuum process might
disappear and thus destroy the needed info just before CLUSTER looks
for it.
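
The failure mode can be shown with a toy model, again purely illustrative. ProcEntry is a hypothetical, much simplified stand-in for a PGPROC entry, and max_remembered_xmin is an invented helper, not an existing function. Once the entry holding the newest value is marked as no longer in use, the maximum over the remaining same-DB sessions drops back to an older value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;             /* stand-in for TransactionId */

/* Hypothetical, simplified stand-in for a per-session PGPROC entry. */
typedef struct ProcEntry
{
    bool     in_use;                /* false once the process has exited */
    uint32_t database_id;           /* database the session is connected to */
    bool     has_value;             /* has this session computed a value? */
    xid_t    last_oldest_xmin;      /* latest value this session computed */
} ProcEntry;

/* Wraparound-aware "a is newer than b" (special XIDs ignored). */
static bool
xid_follows(xid_t a, xid_t b)
{
    return (int32_t) (a - b) > 0;
}

/* Scan the array and take the newest remembered value among live sessions
 * of the given database.  Returns false if no such session has one. */
static bool
max_remembered_xmin(const ProcEntry *procs, size_t nprocs,
                    uint32_t database_id, xid_t *result)
{
    bool    found = false;

    for (size_t i = 0; i < nprocs; i++)
    {
        const ProcEntry *p = &procs[i];

        if (!p->in_use || p->database_id != database_id || !p->has_value)
            continue;
        if (!found || xid_follows(p->last_oldest_xmin, *result))
        {
            *result = p->last_oldest_xmin;
            found = true;
        }
    }
    return found;
}
```

If an autovacuum worker's entry remembers 1000 and another session in the same database remembers 900, the scan yields 1000 while the worker is alive and 900 after its entry is released, which is exactly the backwards step the clamp was supposed to prevent.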

regards, tom lane