Josh Berkus <josh@agliodbs.com> writes:
> As I understand it, we don't currently have any mechanism in Postgres
> which would cause allocated-but-empty pages.
That's not correct: the situation can easily arise after a database
crash. (The scenario is that we've done smgrextend to add the first
page to the file, but not yet completed or WAL-logged insertion of any
data into it. This leaves us with an empty, all-zero page that will be
ignored until we next want to add some data to the table.)
The core problem here is that file extension is not a transactional
operation, because it doesn't roll back on crash.
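To make the crash scenario concrete, here is a toy model in Python. It is not PostgreSQL code. `ToyRelation`, `extend`, `insert`, `is_new_page`, and `scannable_by_file_size` are hypothetical names, and the "scannable iff the file has any pages" rule is an assumed rule used only to show why file length is an unreliable signal. A list of pages stands in for the on-disk file. Because extension is not undone on crash, the list keeps its new all-zero page even though no data was ever inserted.

```python
PAGE_SIZE = 8192  # toy page size (assumption; PostgreSQL's default block size)


class ToyRelation:
    """Toy heap file: a list of fixed-size pages.

    The list length plays the role of the on-disk file length, so it
    survives a simulated crash just as a real file extension would.
    """

    def __init__(self):
        self.pages = []

    def extend(self):
        """Hypothetical stand-in for smgrextend: append an all-zero page.
        Not WAL-logged, so simulated crash recovery never removes it."""
        self.pages.append(bytearray(PAGE_SIZE))
        return len(self.pages) - 1

    def insert(self, blkno, data):
        """Write bytes into a page (stands in for a WAL-logged insert)."""
        self.pages[blkno][:len(data)] = data


def is_new_page(page):
    """An all-zero page is treated as empty and ignored by readers."""
    return not any(page)


def data_pages(rel):
    """Count pages that actually hold data."""
    return sum(1 for p in rel.pages if not is_new_page(p))


def scannable_by_file_size(rel):
    """Hypothetical rule: infer 'scannable' from the file having any pages."""
    return len(rel.pages) > 0


# Extend the file, then "crash" before the insert completes.
rel = ToyRelation()
blk = rel.extend()
# rel.insert(blk, b"row") never runs: the simulated crash happens here.

print(len(rel.pages), data_pages(rel), scannable_by_file_size(rel))
```

After the simulated crash the file has one page but no data pages, yet the file-size rule reports the relation as scannable. That state was never reached by any committed transaction.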
The current matview design gets around this problem by requiring that
transition between scannable and unscannable states involve a complete
table rewrite, and thus the transactionality issue can be hidden behind
a transactional update of the matview's pg_class.relfilenode field.
IMO, that is obviously a dead-end design, because we are going to want
scannability status updates associated with partial updates of the
matview's contents. So Kevin's summary is leaving out one key desirable
property:
(4) ability to change scannability state without a full table rewrite.
Putting the state into pg_class would preserve that property.
regards, tom lane