Thread: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
From
Matthias van de Meent
Date:
Hi, My collegue Konstantin Knizhnik pointed out that we fail to mark pages with a non-standard page layout with page_std=false in RelationCopyStorageUsingBuffer when we WAL-log them. This causes us to interpret the registered buffer as a standard buffer, and omit the hole in the page, which for FSM/VM pages covers the whole page. The immediate effect of this bug is that replicas and primaries in a physical replication system won't have the same data in their VM- and FSM-forks until the first VACUUM on the new database has WAL-logged these pages again. Whilst not actively harmful for the VM/FSM subsystems, it's definitely suboptimal. Secondary unwanted effects are that AMs that use the buffercache- but which don't use or update the pageheader- also won't see the main data logged in WAL, thus potentially losing user data in the physical replication stream or with a system crash. I've not looked for any such AMs and am unaware of any that would have this issue, but it's better to fix this. PFA a patch that fixes this issue, by assuming that all pages in the source database utilize a non-standard page layout. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Attachment
Matthias van de Meent <boekewurm+postgres@gmail.com> writes: > PFA a patch that fixes this issue, by assuming that all pages in the > source database utilize a non-standard page layout. Surely that cure is worse than the disease? regards, tom lane
Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts
From
Matthias van de Meent
Date:
On Mon, 13 May 2024 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Matthias van de Meent <boekewurm+postgres@gmail.com> writes: > > PFA a patch that fixes this issue, by assuming that all pages in the > > source database utilize a non-standard page layout. > > Surely that cure is worse than the disease? I don't know where we would get the information whether the selected relation fork's pages are standard-compliant. We could base it off of the fork number (that info is available locally) but that doesn't guarantee much. For VM and FSM-pages we know they're essentially never standard-compliant (hence this thread), but for the main fork it is anyone's guess once the user has installed an additional AM - which we don't detect nor pass through to the offending RelationCopyStorageUsingBuffer. As for "worse", the default template database is still much smaller than the working set of most databases. This will indeed regress the workload a bit, but only by the fraction of holes in the page + all FSM/VM data. I think the additional WAL volume during CREATE DATABASE is worth it when the alternative is losing that data with physical replication/secondary instances. Note that this does not disable page compression, it just stops the logging of holes in pages; holes which generally are only a fraction of the whole database. It's not inconceivable that this will significantly increase WAL volume, but I think we should go for correctness rather than fastest copy. If we went with fastest copy, we'd better just skip logging the FSM and VM forks because we're already ignoring the data of the pages, so why not ignore the pages themselves, too? I don't think that holds water when we want to be crash-proof in CREATE DATABASE, with a full data copy of the template database. Kind regards, Matthias van de Meent Neon (https://neon.tech)
On Mon, May 13, 2024 at 10:53 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > It's not inconceivable that this will significantly increase WAL > volume, but I think we should go for correctness rather than fastest > copy. I don't think we can afford to just do this blindly for the sake of a hypothetical non-core AM that uses nonstandard pages. There must be lots of cases where the holes are large, and where the WAL volume would be a multiple of what it is currently. That's a *big* regression. > If we went with fastest copy, we'd better just skip logging the > FSM and VM forks because we're already ignoring the data of the pages, > so why not ignore the pages themselves, too? I don't think that holds > water when we want to be crash-proof in CREATE DATABASE, with a full > data copy of the template database. This seems like a red herring. Either assuming standard pages is a good idea or it isn't, and either logging the FSM and VM forks is a good idea or it isn't, but those are two separate questions. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, Quick question, are there any more revisions left to be done on this patch from the previous feedback? Or should I continue with reviewing the current patch? Regards, Akshat Jaimini