"Simon Riggs" <simon@2ndquadrant.com> writes:
> Completed all of the agreed changes for TG:

I've just realized that there's a fatal problem with this design.
We've now got tqual.c setting page LSN when it holds only share lock
on the buffer. That will absolutely not work, e.g. two backends might
concurrently set different values and end up with garbage (since it's
unlikely that an LSN store is atomic).

Can we fix it to be a read test instead of a write test, that is, if
we know WAL has been flushed through the target LSN, it's safe to set
the hint bit, else not?
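A minimal sketch of that read test, under stated assumptions: the LSN
type, the function names, and the hint-bit value below are hypothetical
stand-ins, not the actual tqual.c or xlog.c code. The only point is that
the check reads the flushed-through position and never stores to the
page LSN, so holding just a share lock on the buffer is not a problem.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-part LSN, ordered by (xlogid, xrecoff). */
typedef struct SketchLSN
{
    uint32_t xlogid;    /* log file number */
    uint32_t xrecoff;   /* byte offset within that log file */
} SketchLSN;

#define SKETCH_HINT_COMMITTED 0x0100    /* illustrative bit value only */

/* True if WAL is known to be flushed at least through "target". */
static bool
sketch_flushed_through(SketchLSN flushed, SketchLSN target)
{
    if (flushed.xlogid != target.xlogid)
        return flushed.xlogid > target.xlogid;
    return flushed.xrecoff >= target.xrecoff;
}

/*
 * Read test: set the committed hint bit only if the commit record is
 * already on disk.  Nothing here writes the page LSN.
 * Returns true if the bit was set.
 */
static bool
sketch_set_committed_hint(uint16_t *infomask,
                          SketchLSN commit_lsn,
                          SketchLSN flushed)
{
    if (!sketch_flushed_through(flushed, commit_lsn))
        return false;   /* not safe yet; a later visitor can retry */
    *infomask |= SKETCH_HINT_COMMITTED;
    return true;
}
```

If the test fails, the hint bit is simply left unset, and whoever
examines the tuple next can try again once WAL has been flushed further.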
In general, I think a transaction abort should not need to flush
anything, since the default assumption is that it crashed anyway.
Hence for instance recording a transaction abort needn't advance
the LSN of the clog page. (You seem to have it flushing through
the last xlog record written by the backend, which is exactly what
it doesn't need to do.) By extension, it should be OK to set INVALID
(aborted) hint bits in a tuple header without any concerns about
flushing.
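As a rough sketch of how the two cases differ (all names here are
hypothetical, and a flat 64-bit LSN is assumed purely for brevity):
the aborted case needs no WAL check at all, while the committed case
falls back to the flushed-through test.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLSN;   /* hypothetical flat LSN, an assumption */

typedef enum DemoXactOutcome
{
    DEMO_XACT_COMMITTED,
    DEMO_XACT_ABORTED
} DemoXactOutcome;

/*
 * May a hint bit recording "outcome" be set right now?
 * commit_lsn is ignored for aborted transactions.
 */
static bool
demo_hint_bit_is_safe(DemoXactOutcome outcome,
                      DemoLSN commit_lsn,
                      DemoLSN flushed_upto)
{
    if (outcome == DEMO_XACT_ABORTED)
        return true;    /* after a crash it would be treated as aborted anyway */
    return flushed_upto >= commit_lsn;
}
```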
Also, I'm sort of wondering if we really need a separate walwriter
process; that code seems awfully duplicative. Is there a reason
not to have the bgwriter include this functionality?

In lesser news:

The caching logic in TransactionGetCommitLSN is obviously broken.

Is there really a use-case for adding a pgstat counter for "guaranteed"
transactions? That adds pgstat overhead, and bloats the patch
noticeably, and I don't entirely see the value of it.

There's some padding junk inserted in XLogCtlData, which as far as I
recall was never discussed, and is certainly not an integral part of the
delayed-commit feature. If you want that you should submit and defend
it separately.

regards, tom lane