On Sun, 2012-04-22 at 00:08 +0100, Greg Stark wrote:
> On Sat, Apr 21, 2012 at 10:40 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> > * In addition to detecting random garbage, we also need to be able to
> > detect zeroing of pages. Right now, a zero page is not considered
> > corrupt, so that's a problem. We'll need to WAL table extension
> > operations, and we'll need to mitigate the performance impact of doing
> > so. I think we can do that by extending larger tables by many pages
> > (say, 16 at a time) so we can amortize the cost of WAL and avoid
> > contention.
>
> I haven't seen this come up in discussion.
I don't have any links, and it might just be based on in-person
discussions. I think it's just being left as a loose end for later, but
it will eventually need to be solved.
> WAL logging table
> extensions wouldn't by itself work because currently we treat the file
> size on disk as the size of the table. So you would have to do the
> extension in the critical section or else different backends might see
> the wrong file size and write out conflicting wal entries.
By "critical section", I assume you mean "while holding the relation
extension lock" not "while inside a CRITICAL_SECTION()", right?
There would be some synchronization overhead, to be sure, but I think it
can be done. Ideally, we'd be able to do large enough extensions that,
if there is a parallel bulk load on a single table or something, the
overhead could be made insignificant.
I didn't intend to get too much into the detail in this thread, but if
it's a totally ridiculous or impossible idea, I'll remove it.
> The earlier consensus was to move all the hint bits to a dedicated
> area and exclude them from the checksum. I think double-write buffers
> seem to have become more fashionable but a summary that doesn't
> describe the former is definitely incomplete.
Thank you, that's the kind of omission I was looking to catch.
> That link points to the MVCC-safe truncate patch. I don't follow how
> optimizations in bulk loads are relevant to wal logging hint bit
> updates.
I should have linked to these messages:
http://archives.postgresql.org/message-id/CA
+TgmoYLOzDezzJKyJ8_x2bPeEerAo5dJ-OMvS1fLQOQSQP5jg@mail.gmail.com
http://archives.postgresql.org/message-id/CA
+Tgmoa4Xs1jbZhm=pb9Xi4AGMJXRB2a4GSE9EJtLo=70Zne=g@mail.gmail.com
Though perhaps I'm reading too much into Robert's comments.
Regards,Jeff Davis