Thread: [HACKERS] Setting pd_lower in GIN metapage
What are some arguments against setting pd_lower in the GIN metapage as follows? GinMetaPageData *metad = GinPageGetMeta(page); ((PageHeader) page)->pd_lower = ((char *) metad + sizeof(GinMetaPageData)) - (char *) page; I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN. How about porting such a change to the back-branches if we do this at all?I couldn't find any discussion in the archivesabout this. I read in comments that server versions older than 9.4 didn't set pd_lower correctly in any of GIN index pages, so relying on pd_lower value of GIN pages is unreliable in general. The reason I'm asking is that a certain backup tool relies on pd_lower values of data pages (disk blocks in relation files that are known to have a valid PageHeaderData) to be correct to discard the portion of every page that supposedly does not contain any useful information. The assumption doesn't hold in the case of GIN metapage, so any GIN indexes contain corrupted metapage after recovery (metadata overwritten with zeros). Thanks, Amit
On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > What are some arguments against setting pd_lower in the GIN metapage as > follows? > > GinMetaPageData *metad = GinPageGetMeta(page); > > ((PageHeader) page)->pd_lower = > ((char *) metad + sizeof(GinMetaPageData)) - (char *) page; > > I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN. > Actually, hash index also has similar code (See _hash_init_metabuffer) and I see no harm in doing this at similar other places. This helps in compressing the hole in metapage during WAL writing. I think that in itself might not be an argument in favor of doing this because there is only one meta page for index and saving ~7K WAL is not huge but OTOH I don't see a reason for not doing it. > How about porting such a change to the back-branches if we do this at all? > I couldn't find any discussion in the archives about this. I read in > comments that server versions older than 9.4 didn't set pd_lower correctly > in any of GIN index pages, so relying on pd_lower value of GIN pages is > unreliable in general. > > The reason I'm asking is that a certain backup tool relies on pd_lower > values of data pages (disk blocks in relation files that are known to have > a valid PageHeaderData) to be correct to discard the portion of every page > that supposedly does not contain any useful information. The assumption > doesn't hold in the case of GIN metapage, so any GIN indexes contain > corrupted metapage after recovery (metadata overwritten with zeros). > Why can't you do a special check for metapage identification? It should be easy to add such a check. I have seen one of such tools (pg_filedump) has similar check to skip metapage in certain code paths. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> What are some arguments against setting pd_lower in the GIN metapage as >> follows? > Actually, hash index also has similar code (See _hash_init_metabuffer) > and I see no harm in doing this at similar other places. Seems reasonable. >> How about porting such a change to the back-branches if we do this at all? >> The reason I'm asking is that a certain backup tool relies on pd_lower >> values of data pages (disk blocks in relation files that are known to have >> a valid PageHeaderData) to be correct to discard the portion of every page >> that supposedly does not contain any useful information. The assumption >> doesn't hold in the case of GIN metapage, so any GIN indexes contain >> corrupted metapage after recovery (metadata overwritten with zeros). I'm not in favor of back-porting such a change. Even if we did, it would only affect subsequently-created indexes not existing ones. That means your tool has to cope with an unset pd_lower in any case --- and will for the foreseeable future, because of pg_upgrade. I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData then don't trust it, but assume all of the page is valid data". regards, tom lane
On 2017/06/19 22:59, Amit Kapila wrote: > On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> What are some arguments against setting pd_lower in the GIN metapage as >> follows? >> >> GinMetaPageData *metad = GinPageGetMeta(page); >> >> ((PageHeader) page)->pd_lower = >> ((char *) metad + sizeof(GinMetaPageData)) - (char *) page; >> >> I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN. >> > > Actually, hash index also has similar code (See _hash_init_metabuffer) > and I see no harm in doing this at similar other places. This helps > in compressing the hole in metapage during WAL writing. I think that > in itself might not be an argument in favor of doing this because > there is only one meta page for index and saving ~7K WAL is not huge > but OTOH I don't see a reason for not doing it. I agree. Will write a patch. >> How about porting such a change to the back-branches if we do this at all? >> I couldn't find any discussion in the archives about this. I read in >> comments that server versions older than 9.4 didn't set pd_lower correctly >> in any of GIN index pages, so relying on pd_lower value of GIN pages is >> unreliable in general. >> >> The reason I'm asking is that a certain backup tool relies on pd_lower >> values of data pages (disk blocks in relation files that are known to have >> a valid PageHeaderData) to be correct to discard the portion of every page >> that supposedly does not contain any useful information. The assumption >> doesn't hold in the case of GIN metapage, so any GIN indexes contain >> corrupted metapage after recovery (metadata overwritten with zeros). >> > > Why can't you do a special check for metapage identification? It > should be easy to add such a check. I have seen one of such tools > (pg_filedump) has similar check to skip metapage in certain code > paths. I tried to add a metapage check, but getting such code to compile requires it to include headers like gin_private.h (in PG versions < 10), which in turn includes other headers that are forbidden to be included in what's supposed to be FRONTEND code. (thread at [1] seems relevant in this regard.) Another way to fix this might be to copy the GinMetaPageData struct definition and a few other symbols into the tool's code and make necessary checks using the same, instead of including gin_private.h. Thanks, Amit [1] https://www.postgresql.org/message-id/CA%2BTgmoZ%3DF%3DGkxV0YEv-A8tb%2BAEGy_Qa7GSiJ8deBKFATnzfEug%40mail.gmail.com
On 2017/06/19 23:31, Tom Lane wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> What are some arguments against setting pd_lower in the GIN metapage as >>> follows? > >> Actually, hash index also has similar code (See _hash_init_metabuffer) >> and I see no harm in doing this at similar other places. > > Seems reasonable. Here is a patch that does it for the GIN metapage. (I am not sure if the changes to gin_mask() that are included in the patch are really necessary.) >>> How about porting such a change to the back-branches if we do this at all? >>> The reason I'm asking is that a certain backup tool relies on pd_lower >>> values of data pages (disk blocks in relation files that are known to have >>> a valid PageHeaderData) to be correct to discard the portion of every page >>> that supposedly does not contain any useful information. The assumption >>> doesn't hold in the case of GIN metapage, so any GIN indexes contain >>> corrupted metapage after recovery (metadata overwritten with zeros). > > I'm not in favor of back-porting such a change. Even if we did, it would > only affect subsequently-created indexes not existing ones. That means > your tool has to cope with an unset pd_lower in any case --- and will for > the foreseeable future, because of pg_upgrade. > > I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData > then don't trust it, but assume all of the page is valid data". Actually, such a check is already in place in the tool, whose condition looks like: if (PageGetPageSize(header) == BLCKSZ && PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION && (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && header->pd_lower >= SizeOfPageHeaderData && header->pd_lower <= header->pd_upper && header->pd_upper <= header->pd_special && header->pd_special <= BLCKSZ && header->pd_special == MAXALIGN(header->pd_special) && ... which even GIN metapage passes, making it an eligible data page and hence for omitting the hole between pd_lower and pd_upper. That's because a GIN metapage will always have undergone PageInit() that sets pd_lower to SizeOfPageHeaderData. Which means the tool has to look beyond the standard PageHeaderData to determine whether the area between pd_lower and pd_upper is really a hole. Amit K also suggested the same, but that seems to require either duplicating GIN's private struct definition (of GinMetaPageData) in the tool or including backend's gin_private.h, either of which doesn't seem to be a good thing to do in what is FRONTEND code, but maybe there is no other way. Am I missing something? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/19 23:31, Tom Lane wrote: >> Amit Kapila <amit.kapila16@gmail.com> writes: >>> On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote >>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>> What are some arguments against setting pd_lower in the GIN metapage as >>>> follows? >> >>> Actually, hash index also has similar code (See _hash_init_metabuffer) >>> and I see no harm in doing this at similar other places. >> >> Seems reasonable. > > Here is a patch that does it for the GIN metapage. (I am not sure if the > changes to gin_mask() that are included in the patch are really necessary.) > >>>> How about porting such a change to the back-branches if we do this at all? >>>> The reason I'm asking is that a certain backup tool relies on pd_lower >>>> values of data pages (disk blocks in relation files that are known to have >>>> a valid PageHeaderData) to be correct to discard the portion of every page >>>> that supposedly does not contain any useful information. The assumption >>>> doesn't hold in the case of GIN metapage, so any GIN indexes contain >>>> corrupted metapage after recovery (metadata overwritten with zeros). >> >> I'm not in favor of back-porting such a change. Even if we did, it would >> only affect subsequently-created indexes not existing ones. That means >> your tool has to cope with an unset pd_lower in any case --- and will for >> the foreseeable future, because of pg_upgrade. >> >> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData >> then don't trust it, but assume all of the page is valid data". > > Actually, such a check is already in place in the tool, whose condition > looks like: > > if (PageGetPageSize(header) == BLCKSZ && > PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION && > (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && > header->pd_lower >= SizeOfPageHeaderData && > header->pd_lower <= header->pd_upper && > header->pd_upper <= header->pd_special && > header->pd_special <= BLCKSZ && > header->pd_special == MAXALIGN(header->pd_special) && ... > > which even GIN metapage passes, making it an eligible data page and hence > for omitting the hole between pd_lower and pd_upper. > Won't checking for GIN_META in header->pd_flags gives you what you want? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2017/06/20 20:37, Amit Kapila wrote: > On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/06/19 23:31, Tom Lane wrote: >>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData >>> then don't trust it, but assume all of the page is valid data". >> >> Actually, such a check is already in place in the tool, whose condition >> looks like: >> >> if (PageGetPageSize(header) == BLCKSZ && >> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION && >> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && >> header->pd_lower >= SizeOfPageHeaderData && >> header->pd_lower <= header->pd_upper && >> header->pd_upper <= header->pd_special && >> header->pd_special <= BLCKSZ && >> header->pd_special == MAXALIGN(header->pd_special) && ... >> >> which even GIN metapage passes, making it an eligible data page and hence >> for omitting the hole between pd_lower and pd_upper. >> > > Won't checking for GIN_META in header->pd_flags gives you what you want? GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags, which still requires including GIN's private header. Thanks, Amit
On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/20 20:37, Amit Kapila wrote: >> On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> On 2017/06/19 23:31, Tom Lane wrote: >>>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData >>>> then don't trust it, but assume all of the page is valid data". >>> >>> Actually, such a check is already in place in the tool, whose condition >>> looks like: >>> >>> if (PageGetPageSize(header) == BLCKSZ && >>> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION && >>> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && >>> header->pd_lower >= SizeOfPageHeaderData && >>> header->pd_lower <= header->pd_upper && >>> header->pd_upper <= header->pd_special && >>> header->pd_special <= BLCKSZ && >>> header->pd_special == MAXALIGN(header->pd_special) && ... >>> >>> which even GIN metapage passes, making it an eligible data page and hence >>> for omitting the hole between pd_lower and pd_upper. >>> >> >> Won't checking for GIN_META in header->pd_flags gives you what you want? > > GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags, > which still requires including GIN's private header. Did you check this patch with wal_consistency_checking? I am getting failures so your patch does not have the masking of GIN pages completely right: FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0 CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE: That's easily reproducible with installcheck and a standby replaying the changes. I did not look at the code in details to see what you may be missing here. -- Michael
On 2017/06/22 16:56, Michael Paquier wrote: > On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/06/20 20:37, Amit Kapila wrote: >>> On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote >>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>> On 2017/06/19 23:31, Tom Lane wrote: >>>>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData >>>>> then don't trust it, but assume all of the page is valid data". >>>> >>>> Actually, such a check is already in place in the tool, whose condition >>>> looks like: >>>> >>>> if (PageGetPageSize(header) == BLCKSZ && >>>> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION && >>>> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && >>>> header->pd_lower >= SizeOfPageHeaderData && >>>> header->pd_lower <= header->pd_upper && >>>> header->pd_upper <= header->pd_special && >>>> header->pd_special <= BLCKSZ && >>>> header->pd_special == MAXALIGN(header->pd_special) && ... >>>> >>>> which even GIN metapage passes, making it an eligible data page and hence >>>> for omitting the hole between pd_lower and pd_upper. >>>> >>> >>> Won't checking for GIN_META in header->pd_flags gives you what you want? >> >> GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags, >> which still requires including GIN's private header. > > Did you check this patch with wal_consistency_checking? I am getting > failures so your patch does not have the masking of GIN pages > completely right: > FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0 > CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE: > That's easily reproducible with installcheck and a standby replaying > the changes. I did not look at the code in details to see what you may > be missing here. Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking. Actually, the WAL consistency check fails even without patching gin_mask(), so the problem may be with the main patch itself. That is, the patch needs to do something else other than just teaching GinInitMetabuffer() to initialize pd_lower. Will look into that. Thanks, Amit Thanks, Amit
On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/22 16:56, Michael Paquier wrote: >> On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> On 2017/06/20 20:37, Amit Kapila wrote: >>>> On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote >>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>>> On 2017/06/19 23:31, Tom Lane wrote: >>>>>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData >>>>>> then don't trust it, but assume all of the page is valid data". >>>>> >>>>> Actually, such a check is already in place in the tool, whose condition >>>>> looks like: >>>>> >>>>> if (PageGetPageSize(header) == BLCKSZ && >>>>> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION && >>>>> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && >>>>> header->pd_lower >= SizeOfPageHeaderData && >>>>> header->pd_lower <= header->pd_upper && >>>>> header->pd_upper <= header->pd_special && >>>>> header->pd_special <= BLCKSZ && >>>>> header->pd_special == MAXALIGN(header->pd_special) && ... >>>>> >>>>> which even GIN metapage passes, making it an eligible data page and hence >>>>> for omitting the hole between pd_lower and pd_upper. >>>>> >>>> >>>> Won't checking for GIN_META in header->pd_flags gives you what you want? >>> >>> GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags, >>> which still requires including GIN's private header. >> >> Did you check this patch with wal_consistency_checking? I am getting >> failures so your patch does not have the masking of GIN pages >> completely right: >> FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0 >> CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE: >> That's easily reproducible with installcheck and a standby replaying >> the changes. I did not look at the code in details to see what you may >> be missing here. > > Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking. > > Actually, the WAL consistency check fails even without patching > gin_mask(), so the problem may be with the main patch itself. That is, > the patch needs to do something else other than just teaching > GinInitMetabuffer() to initialize pd_lower. Will look into that. > I've not read the code deeply but I guess we should use GinInitMetabuffer() in ginRedoUpdateMetapage() instead of GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is the same. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 2017/06/23 10:22, Masahiko Sawada wrote: > On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/06/22 16:56, Michael Paquier wrote: >>> Did you check this patch with wal_consistency_checking? I am getting >>> failures so your patch does not have the masking of GIN pages >>> completely right: >>> FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0 >>> CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE: >>> That's easily reproducible with installcheck and a standby replaying >>> the changes. I did not look at the code in details to see what you may >>> be missing here. >> >> Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking. >> >> Actually, the WAL consistency check fails even without patching >> gin_mask(), so the problem may be with the main patch itself. That is, >> the patch needs to do something else other than just teaching >> GinInitMetabuffer() to initialize pd_lower. Will look into that. >> > > I've not read the code deeply but I guess we should use > GinInitMetabuffer() in ginRedoUpdateMetapage() instead of > GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is > the same. That was it, thanks for the pointer. Attached updated patch, which I confirmed, passes wal_consistency_check = gin. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/23 10:22, Masahiko Sawada wrote: >> On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> On 2017/06/22 16:56, Michael Paquier wrote: >>>> Did you check this patch with wal_consistency_checking? I am getting >>>> failures so your patch does not have the masking of GIN pages >>>> completely right: >>>> FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0 >>>> CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE: >>>> That's easily reproducible with installcheck and a standby replaying >>>> the changes. I did not look at the code in details to see what you may >>>> be missing here. >>> >>> Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking. >>> >>> Actually, the WAL consistency check fails even without patching >>> gin_mask(), so the problem may be with the main patch itself. That is, >>> the patch needs to do something else other than just teaching >>> GinInitMetabuffer() to initialize pd_lower. Will look into that. >>> >> >> I've not read the code deeply but I guess we should use >> GinInitMetabuffer() in ginRedoUpdateMetapage() instead of >> GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is >> the same. > > That was it, thanks for the pointer. > > Attached updated patch, which I confirmed, passes wal_consistency_check = gin. Thank you for updating the patch. It looks good to me. BTW I'm inclined to have a regression test case where doing 'make check' to the streaming replication environment with wal_consistency_check on standby server so that we can detect a bug around the wal. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Jun 23, 2017 at 2:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for updating the patch. It looks good to me. > BTW I'm inclined to have a regression test case where doing 'make > check' to the streaming replication environment with > wal_consistency_check on standby server so that we can detect a bug > around the wal. This would be very costly. A single run of the main regression tests generates 15 to 20GB of FPWs if I recall correctly. Tiny buildfarm members would suffer on that. *cough* -- Michael
On 2017/06/23 15:07, Michael Paquier wrote: > On Fri, Jun 23, 2017 at 2:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Thank you for updating the patch. It looks good to me. >> BTW I'm inclined to have a regression test case where doing 'make >> check' to the streaming replication environment with >> wal_consistency_check on standby server so that we can detect a bug >> around the wal. > > This would be very costly. A single run of the main regression tests > generates 15 to 20GB of FPWs if I recall correctly. Tiny buildfarm > members would suffer on that. *cough* Initially, I had naively set wal_consistency_check = all before running make installcheck and then had to wait for a long time to confirm that WAL generated by the gin test indeed caused consistency check failure on the standby with the v1 patch. But I can see Sawada-san's point that there should be some way for developers writing code that better had gone through WAL consistency checking facility to do it without much hassle. But then again, it may not be that frequent to need that. Thanks, Amit
On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Initially, I had naively set wal_consistency_check = all before running > make installcheck and then had to wait for a long time to confirm that WAL > generated by the gin test indeed caused consistency check failure on the > standby with the v1 patch. wal_consistency_check = gin would have saved you a lot of I/O. > But I can see Sawada-san's point that there should be some way for > developers writing code that better had gone through WAL consistency > checking facility to do it without much hassle. But then again, it may > not be that frequent to need that. Yeah, I have my own set of generic scripts for that. There could be a set of modules out of the main check-world, the risk that those finish rotting is high though... -- Michael
On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Initially, I had naively set wal_consistency_check = all before running >> make installcheck and then had to wait for a long time to confirm that WAL >> generated by the gin test indeed caused consistency check failure on the >> standby with the v1 patch. > > wal_consistency_check = gin would have saved you a lot of I/O. > >> But I can see Sawada-san's point that there should be some way for >> developers writing code that better had gone through WAL consistency >> checking facility to do it without much hassle. But then again, it may >> not be that frequent to need that. Yeah, it should be optional. I imagined providing such an option of pg_regress or TAP test for the developers. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada wrote: > On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote > > <Langote_Amit_f8@lab.ntt.co.jp> wrote: > >> Initially, I had naively set wal_consistency_check = all before running > >> make installcheck and then had to wait for a long time to confirm that WAL > >> generated by the gin test indeed caused consistency check failure on the > >> standby with the v1 patch. > > > > wal_consistency_check = gin would have saved you a lot of I/O. > > > >> But I can see Sawada-san's point that there should be some way for > >> developers writing code that better had gone through WAL consistency > >> checking facility to do it without much hassle. But then again, it may > >> not be that frequent to need that. > > Yeah, it should be optional. I imagined providing such an option of > pg_regress or TAP test for the developers. As far as I know it is possible to have third-party modules that extend the buildfarm client script so that it runs additional tests that the standard ones. You could have a custom module installed in some powerful machine of yours that runs the WAL consistency check and report the results to the buildfarm. A single animal running that test should be enough, right? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > That was it, thanks for the pointer. GinInitMetabuffer() sets up pd_lower and pd_upper anyway using PageInit so the check of PageIsVerified is guaranteed to work in any case. Upgraded pages will still have their pd_lower set to the previous values, and new pages will have the optimization. So this patch is actually harmless for past pages, while newer ones are seen as more compressible. > Attached updated patch, which I confirmed, passes wal_consistency_check = gin. I have spent some time looking at this patch, playing with pg_upgrade to check the state of the page upgraded. And this looks good to me. One thing that I noticed is that this optimization could as well happen for spgist meta pages. What do others think? -- Michael
On Sat, Jun 24, 2017 at 1:35 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Masahiko Sawada wrote: >> On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >> > On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote >> > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >> Initially, I had naively set wal_consistency_check = all before running >> >> make installcheck and then had to wait for a long time to confirm that WAL >> >> generated by the gin test indeed caused consistency check failure on the >> >> standby with the v1 patch. >> > >> > wal_consistency_check = gin would have saved you a lot of I/O. >> > >> >> But I can see Sawada-san's point that there should be some way for >> >> developers writing code that better had gone through WAL consistency >> >> checking facility to do it without much hassle. But then again, it may >> >> not be that frequent to need that. >> >> Yeah, it should be optional. I imagined providing such an option of >> pg_regress or TAP test for the developers. > > As far as I know it is possible to have third-party modules that extend > the buildfarm client script so that it runs additional tests that the > standard ones. You could have a custom module installed in some > powerful machine of yours that runs the WAL consistency check and report > the results to the buildfarm. A single animal running that test should > be enough, right? > Yes, thank you for the information. It's a good idea. I'll try it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jun 26, 2017 at 10:54 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> That was it, thanks for the pointer. > > GinInitMetabuffer() sets up pd_lower and pd_upper anyway using > PageInit so the check of PageIsVerified is guaranteed to work in any > case. Upgraded pages will still have their pd_lower set to the > previous values, and new pages will have the optimization. So this > patch is actually harmless for past pages, while newer ones are seen > as more compressible. > >> Attached updated patch, which I confirmed, passes wal_consistency_check = gin. > > I have spent some time looking at this patch, playing with pg_upgrade > to check the state of the page upgraded. And this looks good to me. > One thing that I noticed is that this optimization could as well > happen for spgist meta pages. What do others think? Good point. I think it could happen for brin meta page as well. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 2017/06/26 10:54, Michael Paquier wrote: > On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> That was it, thanks for the pointer. > > GinInitMetabuffer() sets up pd_lower and pd_upper anyway using > PageInit so the check of PageIsVerified is guaranteed to work in any > case. Upgraded pages will still have their pd_lower set to the > previous values, and new pages will have the optimization. So this > patch is actually harmless for past pages, while newer ones are seen > as more compressible. Right. >> Attached updated patch, which I confirmed, passes wal_consistency_check = gin. > > I have spent some time looking at this patch, playing with pg_upgrade > to check the state of the page upgraded. And this looks good to me. Thanks for the review. > One thing that I noticed is that this optimization could as well > happen for spgist meta pages. What do others think? I agree. As Sawada-san mentioned, brin metapage code can use a similar patch. So attached are three patches for gin, brin, and sp-gist respectively. Both brin and sp-gist cases didn't require any special consideration for passing wal_consistency_checking, as the patch didn't cause brin and sp-gist metapages to become invalid when recreated on standby (remember that patch 0001 needed to be updated for that). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Jun 26, 2017 at 3:54 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/26 10:54, Michael Paquier wrote: >> On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> That was it, thanks for the pointer. >> >> GinInitMetabuffer() sets up pd_lower and pd_upper anyway using >> PageInit so the check of PageIsVerified is guaranteed to work in any >> case. Upgraded pages will still have their pd_lower set to the >> previous values, and new pages will have the optimization. So this >> patch is actually harmless for past pages, while newer ones are seen >> as more compressible. > > Right. > >>> Attached updated patch, which I confirmed, passes wal_consistency_check = gin. >> >> I have spent some time looking at this patch, playing with pg_upgrade >> to check the state of the page upgraded. And this looks good to me. > > Thanks for the review. > >> One thing that I noticed is that this optimization could as well >> happen for spgist meta pages. What do others think? > > I agree. As Sawada-san mentioned, brin metapage code can use a similar patch. > > So attached are three patches for gin, brin, and sp-gist respectively. > Both brin and sp-gist cases didn't require any special consideration for > passing wal_consistency_checking, as the patch didn't cause brin and > sp-gist metapages to become invalid when recreated on standby (remember > that patch 0001 needed to be updated for that). > Thank you for the patches! I checked additional patches for brin and spgist. They look good to me. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for the patches! I checked additional patches for brin and > spgist. They look good to me. Last versions are still missing something: brin_mask() and spg_mask() can be updated so as mask_unused_space() is called for meta pages. Except that the patches look to be on the right track. By the way, as this is an optimization and not an actual bug, could you add this patch to the next commit fest? I don't think that this should get into 10. The focus is to stabilize things lately. -- Michael
On 2017/06/27 10:22, Michael Paquier wrote: > On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Thank you for the patches! I checked additional patches for brin and >> spgist. They look good to me. > > Last versions are still missing something: brin_mask() and spg_mask() > can be updated so as mask_unused_space() is called for meta pages. > Except that the patches look to be on the right track. Thanks for the review. I updated brin_mask() and spg_mask() in the attached updated patches so that they consider meta pages as containing unused space. > By the way, as this is an optimization and not an actual bug, could > you add this patch to the next commit fest? I don't think that this > should get into 10. The focus is to stabilize things lately. Sure, done. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/27 10:22, Michael Paquier wrote: >> On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Thank you for the patches! I checked additional patches for brin and >>> spgist. They look good to me. >> >> Last versions are still missing something: brin_mask() and spg_mask() >> can be updated so as mask_unused_space() is called for meta pages. >> Except that the patches look to be on the right track. > > Thanks for the review. > > I updated brin_mask() and spg_mask() in the attached updated patches so > that they consider meta pages as containing unused space. Thanks for the new version. I had an extra look at those patches, and I am happy with its shape. I also have been doing more testing with pg_upgrade and wal_consistency_checking, and found no issues. So switched status as ready for committer. Everything could be put into a single commit. -- Michael
On 2017/08/22 9:39, Michael Paquier wrote: > On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/06/27 10:22, Michael Paquier wrote: >>> On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> Thank you for the patches! I checked additional patches for brin and >>>> spgist. They look good to me. >>> >>> Last versions are still missing something: brin_mask() and spg_mask() >>> can be updated so as mask_unused_space() is called for meta pages. >>> Except that the patches look to be on the right track. >> >> Thanks for the review. >> >> I updated brin_mask() and spg_mask() in the attached updated patches so >> that they consider meta pages as containing unused space. > > Thanks for the new version. I had an extra look at those patches, and > I am happy with its shape. I also have been doing more testing with > pg_upgrade and wal_consistency_checking, and found no issues. So > switched status as ready for committer. Everything could be put into a > single commit. Thanks for the review. Agreed about committing these together. Regards, Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2017/08/22 9:39, Michael Paquier wrote: >> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> I updated brin_mask() and spg_mask() in the attached updated patches so >>> that they consider meta pages as containing unused space. I looked briefly at these patches. I'm not sure that it's safe for the mask functions to assume that meta pages always have valid pd_lower. What happens when replaying log data concerning an old index that doesn't have that field filled? regards, tom lane
On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2017/08/22 9:39, Michael Paquier wrote: >>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote >>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>> I updated brin_mask() and spg_mask() in the attached updated patches so >>>> that they consider meta pages as containing unused space. > > I looked briefly at these patches. I'm not sure that it's safe for the > mask functions to assume that meta pages always have valid pd_lower. > What happens when replaying log data concerning an old index that doesn't > have that field filled? There will be inconsistency between the pages, and the masking check will complain. My point here is that wal_consistency_checking is primarily used by developers on newly-deployed clusters to check WAL consistency by using installcheck. So an upgraded cluster would see diffs because of that, but I would think that nobody would really face them. Perhaps we should document this point for wal_consistency_check? Getting the patch discussed on this thread into v10 would have saved the day here, but this boat has sailed already. -- Michael
On 2017/09/07 8:51, Michael Paquier wrote: > On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >>> On 2017/08/22 9:39, Michael Paquier wrote: >>>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote >>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>>> I updated brin_mask() and spg_mask() in the attached updated patches so >>>>> that they consider meta pages as containing unused space. >> >> I looked briefly at these patches. I'm not sure that it's safe for the >> mask functions to assume that meta pages always have valid pd_lower. >> What happens when replaying log data concerning an old index that doesn't >> have that field filled? > > There will be inconsistency between the pages, and the masking check > will complain. My point here is that wal_consistency_checking is > primarily used by developers on newly-deployed clusters to check WAL > consistency by using installcheck. So an upgraded cluster would see > diffs because of that, but I would think that nobody would really face > them. I too tend to think that any users who use this masking facility would know to expect to get these failures on upgraded clusters with invalid pd_lower in meta pages. (PS: I wonder if it is reasonable to allow configuring the error level used when a masking failure occurs? Currently, checkXLogConsistency() will abort the process (FATAL)) > Perhaps we should document this point for wal_consistency_check? Do you mean permanently under wal_consistency_check parameter documentation or in the release notes under incompatibilities for the affected index types? Thanks, Amit
On Thu, Sep 7, 2017 at 10:42 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > I too tend to think that any users who use this masking facility would > know to expect to get these failures on upgraded clusters with invalid > pd_lower in meta pages. Yes, I don't think that an optimization reducing WAL that impacts all users should be stopped by a small set of users who use an option for development purposes. > (PS: I wonder if it is reasonable to allow configuring the error level > used when a masking failure occurs? Currently, checkXLogConsistency() > will abort the process (FATAL)) It definitely is worth it in my opinion, perhaps with an on/off switch to trigger a warning instead. The reason why we use FATAL now is to trigger more easily red flags for any potential buildfarm runs: a startup process facing FATAL takes down the standby. >> Perhaps we should document this point for wal_consistency_check? > > Do you mean permanently under wal_consistency_check parameter > documentation or in the release notes under incompatibilities for the > affected index types? Under the parameter itself, in the spirit of a don't-do-that from an upgraded instance. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I looked briefly at these patches. I'm not sure that it's safe for the >> mask functions to assume that meta pages always have valid pd_lower. >> What happens when replaying log data concerning an old index that doesn't >> have that field filled? > There will be inconsistency between the pages, and the masking check > will complain. That doesn't seem like a pleasant outcome to me. The WAL consistency check code is supposed to complain if there's some kind of replication or replay failure, and this cannot be categorized as either. The idea I'd had was to apply the masking only if pd_lower >= SizeOfPageHeaderData, or if you wanted to be stricter, only if pd_lower != 0. regards, tom lane
On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The idea I'd had was to apply the masking only if pd_lower >= > SizeOfPageHeaderData, or if you wanted to be stricter, only if > pd_lower != 0. If putting a check, it seems to me that the stricter one makes the most sense. pd_lower should remain at 0 on pre-10 servers. -- Michael
On 2017/09/07 13:09, Michael Paquier wrote: > On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The idea I'd had was to apply the masking only if pd_lower >= >> SizeOfPageHeaderData, or if you wanted to be stricter, only if >> pd_lower != 0. > > If putting a check, it seems to me that the stricter one makes the > most sense. OK, patches updated that way. > pd_lower should remain at 0 on pre-10 servers. Doesn't PageInit(), which is where any page gets initialized, has always set pd_lower to SizeOfPageHeaderData? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Sep 7, 2017 at 2:40 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/09/07 13:09, Michael Paquier wrote: >> pd_lower should remain at 0 on pre-10 servers. > > Doesn't PageInit(), which is where any page gets initialized, has always > set pd_lower to SizeOfPageHeaderData? Yes, sorry. I had a mind slippery here.. -- Michael
On Wed, Sep 6, 2017 at 9:42 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > I too tend to think that any users who use this masking facility would > know to expect to get these failures on upgraded clusters with invalid > pd_lower in meta pages. I strongly disagree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Michael Paquier <michael.paquier@gmail.com> writes: > On Thu, Sep 7, 2017 at 2:40 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/09/07 13:09, Michael Paquier wrote: >>> pd_lower should remain at 0 on pre-10 servers. >> Doesn't PageInit(), which is where any page gets initialized, has always >> set pd_lower to SizeOfPageHeaderData? > Yes, sorry. I had a mind slippery here.. Indeed, which raises the question of how did any of this code work at all? Up to now, these pages have been initialized with pd_lower = SizeOfPageHeaderData, *not* zero. Which means that the FPI code would have considered the metadata to be part of a valid "hole" and not stored it, if we'd ever told the FPI code that the metadata page was of standard format. I've just looked through the code for these three index types and can't find any place where they do that --- for instance, they don't pass REGBUF_STANDARD to XLogRegisterBuffer when writing their metapages, and they pass page_std = false to log_newpage when using that for metapages. Good thing, because they'd be completely broken otherwise. This means that the premise of this patch is wrong. Adjusting pd_lower, by itself, would accomplish precisely zip for WAL compression, because we'd still not be telling the WAL code to compress out the hole. To get any benefit, I think we'd need to do all of the following: 1. Initialize pd_lower correctly in the metapage init functions, as here. 2. Any place we are about to write the metapage, set its pd_lower to the correct value, in case it's an old index that doesn't have that set correctly. Fortunately this is cheap enough that we might as well just do it unconditionally. 3. Adjust all the xlog calls to tell them the page is of standard format. Now, one advantage we'd get from this is that we could say confidently that any index metapage appearing in a WAL stream generated by v11 or later has got the right pd_lower; therefore we could dispense with checking for wrong pd_lower in the mask functions (unless they're used in some way I don't know about, which is surely possible). BTW, while nbtree correctly initializes pd_lower, it looks to me like it is not exploiting that, because it seems never to pass REGBUF_STANDARD for the metapage anyway. I think this doesn't matter performance-wise for nbtree, because it seems to always pass REGBUF_WILL_INIT instead. But if we do likewise in other index types then they aren't really going to win anyway. GIN at least looks like it might do that; I have not gone through any of the index types in detail. In short, this patch needs a significant rewrite, and more analysis than you've done so far on whether there's actually any benefit to be gained. It might not be worth messing with. I'll set the CF entry back to Waiting on Author. regards, tom lane
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > This means that the premise of this patch is wrong. Adjusting pd_lower, > by itself, would accomplish precisely zip for WAL compression, because > we'd still not be telling the WAL code to compress out the hole. > > To get any benefit, I think we'd need to do all of the following: > > 1. Initialize pd_lower correctly in the metapage init functions, as here. > [...] > In short, this patch needs a significant rewrite, and more analysis than > you've done so far on whether there's actually any benefit to be gained. > It might not be worth messing with. > > I'll set the CF entry back to Waiting on Author. I did some measurements of the compressibility of the GIN meta page, looking at its FPWs with and without wal_compression and you are right: there is no direct compressibility effect when setting pd_lower on the meta page. However, it seems to me that there is an argument still pleading on favor of this patch for wal_consistency_checking. On HEAD pd_lower gets set to 24 and pd_upper to 8184 for GIN meta pages. With the patch, it gets at 80. On top of cleaning up the masking functions GIN, BRIN and SpGist by removing some exceptions in their handling, we are able to get a better masked page because it is possible to mask a portion that we *know* is unused. So even if there are no compressibility benefits, I think that it actually helps in tracking down inconsistencies in meta pages by having a better precision lookup. So I would still vote for integrating the patch as-is, with the addition of a comment to mention that the compressibility optimization is not used yet, though this is helpful when masking the page. The same comment ought to be mentioned for btree. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes: > On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In short, this patch needs a significant rewrite, and more analysis than >> you've done so far on whether there's actually any benefit to be gained. >> It might not be worth messing with. > I did some measurements of the compressibility of the GIN meta page, > looking at its FPWs with and without wal_compression and you are > right: there is no direct compressibility effect when setting pd_lower > on the meta page. However, it seems to me that there is an argument > still pleading on favor of this patch for wal_consistency_checking. I think that would be true if we did both my point 1 and 2, so that the wal replay functions could trust pd_lower to be sane in all cases. But really, if you have to touch all the places that write these metapages, you might as well mark them REGBUF_STANDARD while at it. > The same comment ought to be mentioned for btree. Yeah, I was wondering if we ought not clean up btree/hash while at it. At the very least, their existing comments saying that it's inessential to set pd_lower could use some more detail about why or why not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 8, 2017 at 1:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> On Thu, Sep 7, 2017 at 2:40 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> On 2017/09/07 13:09, Michael Paquier wrote: >>>> pd_lower should remain at 0 on pre-10 servers. > >>> Doesn't PageInit(), which is where any page gets initialized, has always >>> set pd_lower to SizeOfPageHeaderData? > >> Yes, sorry. I had a mind slippery here.. > > Indeed, which raises the question of how did any of this code work at all? > Up to now, these pages have been initialized with pd_lower = > SizeOfPageHeaderData, *not* zero. Which means that the FPI code would > have considered the metadata to be part of a valid "hole" and not stored > it, if we'd ever told the FPI code that the metadata page was of standard > format. I've just looked through the code for these three index types and > can't find any place where they do that --- for instance, they don't pass > REGBUF_STANDARD to XLogRegisterBuffer when writing their metapages, and > they pass page_std = false to log_newpage when using that for metapages. > Good thing, because they'd be completely broken otherwise. > > This means that the premise of this patch is wrong. Adjusting pd_lower, > by itself, would accomplish precisely zip for WAL compression, because > we'd still not be telling the WAL code to compress out the hole. > > To get any benefit, I think we'd need to do all of the following: > > 1. Initialize pd_lower correctly in the metapage init functions, as here. > > 2. Any place we are about to write the metapage, set its pd_lower to the > correct value, in case it's an old index that doesn't have that set > correctly. Fortunately this is cheap enough that we might as well just > do it unconditionally. > > 3. Adjust all the xlog calls to tell them the page is of standard format. > > Now, one advantage we'd get from this is that we could say confidently > that any index metapage appearing in a WAL stream generated by v11 or > later has got the right pd_lower; therefore we could dispense with > checking for wrong pd_lower in the mask functions (unless they're used > in some way I don't know about, which is surely possible). > > BTW, while nbtree correctly initializes pd_lower, it looks to me like it > is not exploiting that, because it seems never to pass REGBUF_STANDARD for > the metapage anyway. > During the creation of the index, it uses log_newpage to log metapage and there it uses REGBUF_STANDARD. So, there seems to be some use of it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> In short, this patch needs a significant rewrite, and more analysis than >>> you've done so far on whether there's actually any benefit to be gained. >>> It might not be worth messing with. > >> I did some measurements of the compressibility of the GIN meta page, >> looking at its FPWs with and without wal_compression and you are >> right: there is no direct compressibility effect when setting pd_lower >> on the meta page. However, it seems to me that there is an argument >> still pleading on favor of this patch for wal_consistency_checking. > > I think that would be true if we did both my point 1 and 2, so that > the wal replay functions could trust pd_lower to be sane in all cases. > But really, if you have to touch all the places that write these > metapages, you might as well mark them REGBUF_STANDARD while at it. > >> The same comment ought to be mentioned for btree. > > Yeah, I was wondering if we ought not clean up btree/hash while at it. > At the very least, their existing comments saying that it's inessential > to set pd_lower could use some more detail about why or why not. > +1. I think we can even use REGBUF_STANDARD in the hash for metapage where currently it is not used. I can give a try to write a patch for hash/btree part if you want. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Michael Paquier <michael.paquier@gmail.com> writes: >>> On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> In short, this patch needs a significant rewrite, and more analysis than >>>> you've done so far on whether there's actually any benefit to be gained. >>>> It might not be worth messing with. >> >>> I did some measurements of the compressibility of the GIN meta page, >>> looking at its FPWs with and without wal_compression and you are >>> right: there is no direct compressibility effect when setting pd_lower >>> on the meta page. However, it seems to me that there is an argument >>> still pleading on favor of this patch for wal_consistency_checking. >> >> I think that would be true if we did both my point 1 and 2, so that >> the wal replay functions could trust pd_lower to be sane in all cases. >> But really, if you have to touch all the places that write these >> metapages, you might as well mark them REGBUF_STANDARD while at it. >> >>> The same comment ought to be mentioned for btree. >> >> Yeah, I was wondering if we ought not clean up btree/hash while at it. >> At the very least, their existing comments saying that it's inessential >> to set pd_lower could use some more detail about why or why not. >> > > +1. I think we can even use REGBUF_STANDARD in the hash for metapage > where currently it is not used. I can give a try to write a patch for > hash/btree part if you want. Coordinating efforts here would be nice. If you, Amit K, are taking care of a patch for btree and hash, would you, Amit L, write the part for GIN, BRIN and SpGist? This needs a careful lookup as many code paths need a lookup so it may take time. Please note that I don't mind double-checking this part if you don't have enough room to do so. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Michael Paquier <michael.paquier@gmail.com> writes: >>>> On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> In short, this patch needs a significant rewrite, and more analysis than >>>>> you've done so far on whether there's actually any benefit to be gained. >>>>> It might not be worth messing with. >>> >>>> I did some measurements of the compressibility of the GIN meta page, >>>> looking at its FPWs with and without wal_compression and you are >>>> right: there is no direct compressibility effect when setting pd_lower >>>> on the meta page. However, it seems to me that there is an argument >>>> still pleading on favor of this patch for wal_consistency_checking. >>> >>> I think that would be true if we did both my point 1 and 2, so that >>> the wal replay functions could trust pd_lower to be sane in all cases. >>> But really, if you have to touch all the places that write these >>> metapages, you might as well mark them REGBUF_STANDARD while at it. >>> >>>> The same comment ought to be mentioned for btree. >>> >>> Yeah, I was wondering if we ought not clean up btree/hash while at it. >>> At the very least, their existing comments saying that it's inessential >>> to set pd_lower could use some more detail about why or why not. >>> >> >> +1. I think we can even use REGBUF_STANDARD in the hash for metapage >> where currently it is not used. I can give a try to write a patch for >> hash/btree part if you want. > > Coordinating efforts here would be nice. If you, Amit K, are taking > care of a patch for btree and hash > I think here we should first agree on what we want to do. Based on Tom's comment, I was thinking of changing comments in btree/hash part and additionally for hash indexes, I can see if we can pass REGBUF_STANDARD for all usages of metapage. I am not sure if we want similar exercise for btree as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 3:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > I think here we should first agree on what we want to do. Based on > Tom's comment, I was thinking of changing comments in btree/hash part > and additionally for hash indexes, I can see if we can pass > REGBUF_STANDARD for all usages of metapage. I am not sure if we want > similar exercise for btree as well. Changing only comments for now looks like a good idea to me. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes: > On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Coordinating efforts here would be nice. If you, Amit K, are taking >> care of a patch for btree and hash > I think here we should first agree on what we want to do. Based on > Tom's comment, I was thinking of changing comments in btree/hash part > and additionally for hash indexes, I can see if we can pass > REGBUF_STANDARD for all usages of metapage. I am not sure if we want > similar exercise for btree as well. FWIW, now that we've noticed the discrepancy, I'm for using REGBUF_STANDARD or equivalent for all metapage calls. Even if it saves no space, inconsistency is bad because it's confusing. And Michael is correct to point out that we can exploit this to improve WAL consistency checking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 1:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > FWIW, now that we've noticed the discrepancy, I'm for using > REGBUF_STANDARD or equivalent for all metapage calls. Even if it > saves no space, inconsistency is bad because it's confusing. OK, I don't mind having a more aggressive approach, but it would definitely be nicer if the REGBUF additions are done in a second patch that can be applied on top of the one setting pd_lower where needed. Both things are linked, still are aimed to solve different problems. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> Coordinating efforts here would be nice. If you, Amit K, are taking >>> care of a patch for btree and hash > >> I think here we should first agree on what we want to do. Based on >> Tom's comment, I was thinking of changing comments in btree/hash part >> and additionally for hash indexes, I can see if we can pass >> REGBUF_STANDARD for all usages of metapage. I am not sure if we want >> similar exercise for btree as well. > > FWIW, now that we've noticed the discrepancy, I'm for using > REGBUF_STANDARD or equivalent for all metapage calls. Even if it > saves no space, inconsistency is bad because it's confusing. > Agreed. However, I feel there is no harm in doing in two patches, one for hash/btree and second for all other indexes (or maybe separate patches for them as well; I haven't yet looked into the work involved for other indexes) unless you prefer to do it all at a one-shot. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 7:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sun, Sep 10, 2017 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Amit Kapila <amit.kapila16@gmail.com> writes: >>> On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> Coordinating efforts here would be nice. If you, Amit K, are taking >>>> care of a patch for btree and hash >> >>> I think here we should first agree on what we want to do. Based on >>> Tom's comment, I was thinking of changing comments in btree/hash part >>> and additionally for hash indexes, I can see if we can pass >>> REGBUF_STANDARD for all usages of metapage. I am not sure if we want >>> similar exercise for btree as well. >> >> FWIW, now that we've noticed the discrepancy, I'm for using >> REGBUF_STANDARD or equivalent for all metapage calls. Even if it >> saves no space, inconsistency is bad because it's confusing. >> > > Agreed. However, I feel there is no harm in doing in two patches, one > for hash/btree and second for all other indexes (or maybe separate > patches for them as well; I haven't yet looked into the work involved > for other indexes) unless you prefer to do it all at a one-shot. > I have prepared separate patches for hash and btree index. I think for another type of indexes, it is better to first fix the pd_lower issue. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > I have prepared separate patches for hash and btree index. I think > for another type of indexes, it is better to first fix the pd_lower > issue. Just wondering (sorry I have not looked at your patch in details)... Have you tested the compressibility effects of this patch on FPWs with and without wal_compression? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/10 15:22, Michael Paquier wrote: > On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Michael Paquier <michael.paquier@gmail.com> writes: >>>> On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> In short, this patch needs a significant rewrite, and more analysis than >>>>> you've done so far on whether there's actually any benefit to be gained. >>>>> It might not be worth messing with. >>> >>>> I did some measurements of the compressibility of the GIN meta page, >>>> looking at its FPWs with and without wal_compression and you are >>>> right: there is no direct compressibility effect when setting pd_lower >>>> on the meta page. However, it seems to me that there is an argument >>>> still pleading on favor of this patch for wal_consistency_checking. >>> >>> I think that would be true if we did both my point 1 and 2, so that >>> the wal replay functions could trust pd_lower to be sane in all cases. >>> But really, if you have to touch all the places that write these >>> metapages, you might as well mark them REGBUF_STANDARD while at it. >>> >>>> The same comment ought to be mentioned for btree. >>> >>> Yeah, I was wondering if we ought not clean up btree/hash while at it. >>> At the very least, their existing comments saying that it's inessential >>> to set pd_lower could use some more detail about why or why not. >>> >> >> +1. I think we can even use REGBUF_STANDARD in the hash for metapage >> where currently it is not used. I can give a try to write a patch for >> hash/btree part if you want. > > Coordinating efforts here would be nice. If you, Amit K, are taking > care of a patch for btree and hash, would you, Amit L, write the part > for GIN, BRIN and SpGist? This needs a careful lookup as many code > paths need a lookup so it may take time. Please note that I don't mind > double-checking this part if you don't have enough room to do so. Sorry, I didn't have time today to carefully go through the recent discussion on this thread (starting with Tom's email wherein he said he set the status of the patch to Waiting on Author). I will try tomorrow. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/09/10 15:22, Michael Paquier wrote: >> On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Michael Paquier <michael.paquier@gmail.com> writes: >>>>> On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>> In short, this patch needs a significant rewrite, and more analysis than >>>>>> you've done so far on whether there's actually any benefit to be gained. >>>>>> It might not be worth messing with. >>>> >>>>> I did some measurements of the compressibility of the GIN meta page, >>>>> looking at its FPWs with and without wal_compression and you are >>>>> right: there is no direct compressibility effect when setting pd_lower >>>>> on the meta page. However, it seems to me that there is an argument >>>>> still pleading on favor of this patch for wal_consistency_checking. >>>> >>>> I think that would be true if we did both my point 1 and 2, so that >>>> the wal replay functions could trust pd_lower to be sane in all cases. >>>> But really, if you have to touch all the places that write these >>>> metapages, you might as well mark them REGBUF_STANDARD while at it. >>>> >>>>> The same comment ought to be mentioned for btree. >>>> >>>> Yeah, I was wondering if we ought not clean up btree/hash while at it. >>>> At the very least, their existing comments saying that it's inessential >>>> to set pd_lower could use some more detail about why or why not. >>>> >>> >>> +1. I think we can even use REGBUF_STANDARD in the hash for metapage >>> where currently it is not used. I can give a try to write a patch for >>> hash/btree part if you want. >> >> Coordinating efforts here would be nice. If you, Amit K, are taking >> care of a patch for btree and hash, would you, Amit L, write the part >> for GIN, BRIN and SpGist? This needs a careful lookup as many code >> paths need a lookup so it may take time. Please note that I don't mind >> double-checking this part if you don't have enough room to do so. > > Sorry, I didn't have time today to carefully go through the recent > discussion on this thread (starting with Tom's email wherein he said he > set the status of the patch to Waiting on Author). I will try tomorrow. Thanks for the update! Once you get to this point, please let me know if you would like to work on a more complete patch for brin, gin and spgist. If you don't have enough room, I am fine to produce something. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 12:43 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> I have prepared separate patches for hash and btree index. I think >> for another type of indexes, it is better to first fix the pd_lower >> issue. > > Just wondering (sorry I have not looked at your patch in details)... > Have you tested the compressibility effects of this patch on FPWs with > and without wal_compression? > I have debugged it to see that it is executing the code path to eliminate the hole for the hash index. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/11 18:13, Michael Paquier wrote: > On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote wrote: >> On 2017/09/10 15:22, Michael Paquier wrote: >>> Coordinating efforts here would be nice. If you, Amit K, are taking >>> care of a patch for btree and hash, would you, Amit L, write the part >>> for GIN, BRIN and SpGist? This needs a careful lookup as many code >>> paths need a lookup so it may take time. Please note that I don't mind >>> double-checking this part if you don't have enough room to do so. >> >> Sorry, I didn't have time today to carefully go through the recent >> discussion on this thread (starting with Tom's email wherein he said he >> set the status of the patch to Waiting on Author). I will try tomorrow. > > Thanks for the update! Once you get to this point, please let me know > if you would like to work on a more complete patch for brin, gin and > spgist. If you don't have enough room, I am fine to produce something. I updated the patches for GIN, BRIN, and SP-GiST to include the following changes: 1. Pass REGBUF_STNADARD flag when registering the metapage buffer 2. Pass true for page_std argument of log_newpage() or log_newpage_buffer() when using it to log the metapage 3. Update comment near where pd_lower is set in the respective metapages, similar to what Amit K did in his patches for btree and hash metapages. Did I miss something from the discussion? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Sep 12, 2017 at 3:51 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/09/11 18:13, Michael Paquier wrote: >> On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote wrote: >>> On 2017/09/10 15:22, Michael Paquier wrote: >>>> Coordinating efforts here would be nice. If you, Amit K, are taking >>>> care of a patch for btree and hash, would you, Amit L, write the part >>>> for GIN, BRIN and SpGist? This needs a careful lookup as many code >>>> paths need a lookup so it may take time. Please note that I don't mind >>>> double-checking this part if you don't have enough room to do so. >>> >>> Sorry, I didn't have time today to carefully go through the recent >>> discussion on this thread (starting with Tom's email wherein he said he >>> set the status of the patch to Waiting on Author). I will try tomorrow. >> >> Thanks for the update! Once you get to this point, please let me know >> if you would like to work on a more complete patch for brin, gin and >> spgist. If you don't have enough room, I am fine to produce something. > > I updated the patches for GIN, BRIN, and SP-GiST to include the following > changes: > > 1. Pass REGBUF_STNADARD flag when registering the metapage buffer > I have looked into brin patch and it seems you have not considered all usages of meta page. The structure BrinRevmap also contains a reference to meta page buffer and when that is modified (ex. in revmap_physical_extend), then also I think you need to consider using REGBUF_STNADARD flag. > > Did I miss something from the discussion? > I think one point which might be missed is that the patch needs to modify pd_lower for all usages of metapage, not only when it is first time initialized. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the review. On 2017/09/12 23:27, Amit Kapila wrote: > On Tue, Sep 12, 2017 at 3:51 PM, Amit Langote wrote: >> I updated the patches for GIN, BRIN, and SP-GiST to include the following >> changes: >> >> 1. Pass REGBUF_STNADARD flag when registering the metapage buffer >> > > I have looked into brin patch and it seems you have not considered all > usages of meta page. The structure BrinRevmap also contains a > reference to meta page buffer and when that is modified (ex. in > revmap_physical_extend), then also I think you need to consider using > REGBUF_STNADARD flag. Fixed. >> Did I miss something from the discussion? >> > > I think one point which might be missed is that the patch needs to > modify pd_lower for all usages of metapage, not only when it is first > time initialized. Maybe I'm missing something, but isn't the metadata size fixed and hence pd_lower won't change once it's initialized? Maybe, it's not true for all index types? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2017/09/12 23:27, Amit Kapila wrote: >> I think one point which might be missed is that the patch needs to >> modify pd_lower for all usages of metapage, not only when it is first >> time initialized. > Maybe I'm missing something, but isn't the metadata size fixed and hence > pd_lower won't change once it's initialized? Maybe, it's not true for all > index types? No, the point is that you might be dealing with an index recently pg_upgraded from v10 or before, which does not have the correct value for pd_lower on that page. This has to be coped with. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/13 13:05, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2017/09/12 23:27, Amit Kapila wrote: >>> I think one point which might be missed is that the patch needs to >>> modify pd_lower for all usages of metapage, not only when it is first >>> time initialized. > >> Maybe I'm missing something, but isn't the metadata size fixed and hence >> pd_lower won't change once it's initialized? Maybe, it's not true for all >> index types? > > No, the point is that you might be dealing with an index recently > pg_upgraded from v10 or before, which does not have the correct > value for pd_lower on that page. This has to be coped with. Ah, got it. Thanks for the explanation. I updated the patches so that the metapage's pd_lower is set to the correct value just before *every* point where we are about to insert a full page image of the metapage into WAL. That's in addition to doing the same in various metapage init routines, which the original patch did already anyway. I guess this now ensures that wal_consistency_checking masking of these metapages as standard layout pages always works, even for pre-v11 indexes that were upgraded. Also, we now pass the metapage buffer as containing a page of standard layout to XLogRegisterBuffer(), so that any hole in it is compressed when actually writing to WAL. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Sep 13, 2017 at 2:48 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > I updated the patches so that the metapage's pd_lower is set to the > correct value just before *every* point where we are about to insert a > full page image of the metapage into WAL. That's in addition to doing the > same in various metapage init routines, which the original patch did > already anyway. I guess this now ensures that wal_consistency_checking > masking of these metapages as standard layout pages always works, even for > pre-v11 indexes that were upgraded. Please note that I do have plans to look at all the patches proposed on this thread for all the indexes next. No report for today though as those deal with many code paths so it requires some attention. I think I'll group the review for all index AMs into the same email if you don't mind, each patch deals with its own thing in its own src/backend/access/ path. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/13 16:20, Michael Paquier wrote: > On Wed, Sep 13, 2017 at 2:48 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> I updated the patches so that the metapage's pd_lower is set to the >> correct value just before *every* point where we are about to insert a >> full page image of the metapage into WAL. That's in addition to doing the >> same in various metapage init routines, which the original patch did >> already anyway. I guess this now ensures that wal_consistency_checking >> masking of these metapages as standard layout pages always works, even for >> pre-v11 indexes that were upgraded. > > Please note that I do have plans to look at all the patches proposed > on this thread for all the indexes next. No report for today though as > those deal with many code paths so it requires some attention. I think > I'll group the review for all index AMs into the same email if you > don't mind, each patch deals with its own thing in its own > src/backend/access/ path. Sure, no problem. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Sure, no problem. OK, I took a closer look at all patches, but did not run any manual tests to test the compression except some stuff with wal_consistency_checking. + if (opaque->flags & GIN_DELETED) + mask_page_content(page); + else if (pagehdr->pd_lower != 0) + mask_unused_space(page); [...] + /* Mask the unused space, provided the page's pd_lower is set. */ + if (pagehdr->pd_lower != 0) mask_unused_space(page); For the masking functions, shouldn't those check use (pd_lower > SizeOfPageHeaderData) instead of 0? pd_lower gets set to a non-zero value on HEAD, so you would apply the masking even if the meta page is upgraded from an instance that did not enforce the value of pd_lower later on. Those conditions also definitely need comments. That will be a good reminder so as why it needs to be kept. + * + * This won't be of any help unless we use option like REGBUF_STANDARD + * while registering the buffer for metapage as otherwise, it won't try to + * remove the hole while logging the full page image. */ This comment is in the btree code. But you actually add REGBUF_STANDARD. So I think that this could be just removed. * Set pd_lower just past the end of the metadata. This is not essential - * but it makes the page look compressible to xlog.c. + * but it makes the page look compressible to xlog.c. See + * _bt_initmetapage. This reference could be removed as well as _bt_initmetapage does not provide any information, the existing comment is wrong without your patch, and then becomes right with this patch. After that I have spotted a couple of places for btree, hash and SpGist where the updates of pd_lower are not happening. Let's keep in mind that updated meta pages could come from an upgraded version, so we need to be careful here about all code paths updating meta pages, and WAL-logging them. It seems to me that an update of pd_lower is missing in _bt_getroot(), just before marking the buffer as dirty I think. And there is a second in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally one in _bt_newroot(). For SpGist, I think that there are two missing: spgbuild() and spgGetCache(). For hash, hashbulkdelete(), _hash_vacuum_one_page(), _hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are missing the shot, no? We could have a meta page of a hash index upgraded from PG10. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 14, 2017 at 12:30 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Sure, no problem. > > > + * > + * This won't be of any help unless we use option like REGBUF_STANDARD > + * while registering the buffer for metapage as otherwise, it won't try to > + * remove the hole while logging the full page image. > */ > This comment is in the btree code. But you actually add > REGBUF_STANDARD. So I think that this could be just removed. > > * Set pd_lower just past the end of the metadata. This is not essential > - * but it makes the page look compressible to xlog.c. > + * but it makes the page look compressible to xlog.c. See > + * _bt_initmetapage. > This reference could be removed as well as _bt_initmetapage does not > provide any information, the existing comment is wrong without your > patch, and then becomes right with this patch. > I have added this comment just to add some explanation as to why we are setting pd_lower and what makes it useful. We can change it or remove it, but I am not sure what is the right thing to do here, may be we can defer this to the committer. > > It seems to me that an update of pd_lower is missing in _bt_getroot(), > just before marking the buffer as dirty I think. And there is a second > in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally > one in _bt_newroot(). > > > For hash, hashbulkdelete(), _hash_vacuum_one_page(), > _hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are > missing the shot, no? We could have a meta page of a hash index > upgraded from PG10. > Why do we need to change metapage at every place for btree or hash? Any index that is upgraded should have pd_lower set, do you have any case in mind where it won't be set? For hash, if someone upgrades from a version lower than 9.6, it might not have set, but we already give warning to reindex the hash indexes upgraded from a version lower than 10. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 14, 2017 at 6:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > Why do we need to change metapage at every place for btree ... I have been hunting for some time places where meta buffers were marked as dirtied and logged. So in the effort, I think that my hands and mind got hotter, forgetting that pd_lower is set there for ages. Of course feel free to ignore that. > ... or hash? > Any index that is upgraded should have pd_lower set, do you have any > case in mind where it won't be set? For hash, if someone upgrades > from a version lower than 9.6, it might not have set, but we already > give warning to reindex the hash indexes upgraded from a version lower > than 10. Ah yes. You do set pd_lower in 10 as well for hash... So that will be fine. So remains SpGist as a slacking AM based on the current patches. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/14 16:00, Michael Paquier wrote: > On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Sure, no problem. > > OK, I took a closer look at all patches, but did not run any manual > tests to test the compression except some stuff with > wal_consistency_checking. Thanks for the review. > + if (opaque->flags & GIN_DELETED) > + mask_page_content(page); > + else if (pagehdr->pd_lower != 0) > + mask_unused_space(page); > [...] > + /* Mask the unused space, provided the page's pd_lower is set. */ > + if (pagehdr->pd_lower != 0) > mask_unused_space(page); > > For the masking functions, shouldn't those check use (pd_lower > > SizeOfPageHeaderData) instead of 0? pd_lower gets set to a non-zero > value on HEAD, so you would apply the masking even if the meta page is > upgraded from an instance that did not enforce the value of pd_lower > later on. Those conditions also definitely need comments. That will be > a good reminder so as why it needs to be kept. Agreed, done. > + * > + * This won't be of any help unless we use option like REGBUF_STANDARD > + * while registering the buffer for metapage as otherwise, it won't try to > + * remove the hole while logging the full page image. > */ > This comment is in the btree code. But you actually add > REGBUF_STANDARD. So I think that this could be just removed. > > * Set pd_lower just past the end of the metadata. This is not essential > - * but it makes the page look compressible to xlog.c. > + * but it makes the page look compressible to xlog.c. See > + * _bt_initmetapage. > This reference could be removed as well as _bt_initmetapage does not > provide any information, the existing comment is wrong without your > patch, and then becomes right with this patch. Amit K's reply may have addressed these comments. > After that I have spotted a couple of places for btree, hash and > SpGist where the updates of pd_lower are not happening. Let's keep in > mind that updated meta pages could come from an upgraded version, so > we need to be careful here about all code paths updating meta pages, > and WAL-logging them. > > It seems to me that an update of pd_lower is missing in _bt_getroot(), > just before marking the buffer as dirty I think. And there is a second > in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally > one in _bt_newroot(). Amit K's reply about btree and hash should've resolved any doubts for those index types. About SP-Gist, see the comment below. > For SpGist, I think that there are two missing: spgbuild() and spgGetCache(). spgbuild() calls SpGistInitMetapage() before marking the metapage buffer dirty. The latter already sets pd_lower correctly, so we don't need to do it explicitly in spgbuild() itself. spgGetCache() doesn't write the metapage, only reads it: /* Last, get the lastUsedPages data from the metapage */ metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); LockBuffer(metabuffer, BUFFER_LOCK_SHARE); metadata = SpGistPageGetMeta(BufferGetPage(metabuffer)); if (metadata->magicNumber != SPGIST_MAGIC_NUMBER) elog(ERROR, "index \"%s\" is not an SP-GiST index", RelationGetRelationName(index)); cache->lastUsedPages = metadata->lastUsedPages; UnlockReleaseBuffer(metabuffer); So, I think it won't be correct to set pd_lower here, no? > For hash, hashbulkdelete(), _hash_vacuum_one_page(), > _hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are > missing the shot, no? We could have a meta page of a hash index > upgraded from PG10. Amit K's reply. :) Updated patch attached, which implements your suggested changes to the masking functions. By the way, as I noted on another unrelated thread, I will not be able to respond to emails from tomorrow until 9/23. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/09/14 16:00, Michael Paquier wrote: >> On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> Sure, no problem. >> >> OK, I took a closer look at all patches, but did not run any manual >> tests to test the compression except some stuff with >> wal_consistency_checking. > > Thanks for the review. > >> For SpGist, I think that there are two missing: spgbuild() and spgGetCache(). > > spgbuild() calls SpGistInitMetapage() before marking the metapage buffer > dirty. The latter already sets pd_lower correctly, so we don't need to do > it explicitly in spgbuild() itself. Check. > spgGetCache() doesn't write the metapage, only reads it: > > /* Last, get the lastUsedPages data from the metapage */ > metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); > LockBuffer(metabuffer, BUFFER_LOCK_SHARE); > > metadata = SpGistPageGetMeta(BufferGetPage(metabuffer)); > > if (metadata->magicNumber != SPGIST_MAGIC_NUMBER) > elog(ERROR, "index \"%s\" is not an SP-GiST index", > RelationGetRelationName(index)); > > cache->lastUsedPages = metadata->lastUsedPages; > > UnlockReleaseBuffer(metabuffer); > > So, I think it won't be correct to set pd_lower here, no? Yeah, I am just reading the code again and there is no alarm here. > Updated patch attached, which implements your suggested changes to the > masking functions. > > By the way, as I noted on another unrelated thread, I will not be able to > respond to emails from tomorrow until 9/23. No problem. Enjoy your vacations. I have spent some time looking at the XLOG insert code, and tested the compressibility of the meta pages. And I have noticed that all pages registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not take a FPW of the block registered because the page will be reinitialized at replay, and in such cases the zero'ed page is filled with the data from the record. log_newpage is used to initialize init forks for unlogged relations, which is where this patch allows page compression to take effect correctly. Still setting REGBUF_STANDARD with REGBUF_WILL_INIT is actually a no-op, except if wal_checking_consistency is used when registering a buffer for WAL insertion. There is one code path though where things are useful all the time: revmap_physical_extend for BRIN. The patch is using the correct logic, still such comments are in my opinion incorrect because of the reason written above: + * This won't be of any help unless we use option like REGBUF_STANDARD + * while registering the buffer for metapage as otherwise, it won't try to + * remove the hole while logging the full page image. Here REGBUF_STANDARD is actually a no-op for btree. + /* + * Set pd_lower just past the end of the metadata. This is not + * essential but it makes the page look compressible to xlog.c, + * because we pass the buffer containing this page to + * XLogRegisterBuffer() as page with standard layout. + */ And here as well because of REGBUF_WILL_INIT is used. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/09/14 16:00, Michael Paquier wrote: >>> On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote >>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>> Sure, no problem. >>> >>> OK, I took a closer look at all patches, but did not run any manual >>> tests to test the compression except some stuff with >>> wal_consistency_checking. >> >> Thanks for the review. >> >>> For SpGist, I think that there are two missing: spgbuild() and spgGetCache(). >> >> spgbuild() calls SpGistInitMetapage() before marking the metapage buffer >> dirty. The latter already sets pd_lower correctly, so we don't need to do >> it explicitly in spgbuild() itself. > > Check. > >> spgGetCache() doesn't write the metapage, only reads it: >> >> /* Last, get the lastUsedPages data from the metapage */ >> metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); >> LockBuffer(metabuffer, BUFFER_LOCK_SHARE); >> >> metadata = SpGistPageGetMeta(BufferGetPage(metabuffer)); >> >> if (metadata->magicNumber != SPGIST_MAGIC_NUMBER) >> elog(ERROR, "index \"%s\" is not an SP-GiST index", >> RelationGetRelationName(index)); >> >> cache->lastUsedPages = metadata->lastUsedPages; >> >> UnlockReleaseBuffer(metabuffer); >> >> So, I think it won't be correct to set pd_lower here, no? > > Yeah, I am just reading the code again and there is no alarm here. > >> Updated patch attached, which implements your suggested changes to the >> masking functions. >> >> By the way, as I noted on another unrelated thread, I will not be able to >> respond to emails from tomorrow until 9/23. > > No problem. Enjoy your vacations. > > I have spent some time looking at the XLOG insert code, and tested the > compressibility of the meta pages. And I have noticed that all pages > registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not > take a FPW of the block registered because the page will be > reinitialized at replay, and in such cases the zero'ed page is filled > with the data from the record. log_newpage is used to initialize init > forks for unlogged relations, which is where this patch allows page > compression to take effect correctly. Still setting REGBUF_STANDARD > with REGBUF_WILL_INIT is actually a no-op, except if > wal_checking_consistency is used when registering a buffer for WAL > insertion. There is one code path though where things are useful all > the time: revmap_physical_extend for BRIN. > > The patch is using the correct logic, still such comments are in my > opinion incorrect because of the reason written above: > + * This won't be of any help unless we use option like REGBUF_STANDARD > + * while registering the buffer for metapage as otherwise, it won't try to > + * remove the hole while logging the full page image. > Here REGBUF_STANDARD is actually a no-op for btree. > You have already noticed above that it will help when wal_checking_consistency is used and that was the main motivation to pass REGBUF_STANDARD apart from maintaining consistency. It is not clear to me what is bothering you. If your only worry about these patches is that you want this sentence to be removed from the comment because you think it is obvious or doesn't make much sense, then I think we can leave this decision to committer. I have added it based on Tom's suggestion above thread about explaining why it is inessential or essential to set pd_lower. I think Amit Langote just tried to mimic what I have done in hash and btree patches to maintain consistency. I am also not very sure if we should write some detailed comment or leave the existing comment as it is. I think it is just a matter of different perspective. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> On 2017/09/14 16:00, Michael Paquier wrote: >>>> On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote >>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>>>> Sure, no problem. >>>> >>>> OK, I took a closer look at all patches, but did not run any manual >>>> tests to test the compression except some stuff with >>>> wal_consistency_checking. >>> >>> Thanks for the review. >>> >>>> For SpGist, I think that there are two missing: spgbuild() and spgGetCache(). >>> >>> spgbuild() calls SpGistInitMetapage() before marking the metapage buffer >>> dirty. The latter already sets pd_lower correctly, so we don't need to do >>> it explicitly in spgbuild() itself. >> >> Check. >> >>> spgGetCache() doesn't write the metapage, only reads it: >>> >>> /* Last, get the lastUsedPages data from the metapage */ >>> metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); >>> LockBuffer(metabuffer, BUFFER_LOCK_SHARE); >>> >>> metadata = SpGistPageGetMeta(BufferGetPage(metabuffer)); >>> >>> if (metadata->magicNumber != SPGIST_MAGIC_NUMBER) >>> elog(ERROR, "index \"%s\" is not an SP-GiST index", >>> RelationGetRelationName(index)); >>> >>> cache->lastUsedPages = metadata->lastUsedPages; >>> >>> UnlockReleaseBuffer(metabuffer); >>> >>> So, I think it won't be correct to set pd_lower here, no? >> >> Yeah, I am just reading the code again and there is no alarm here. >> >>> Updated patch attached, which implements your suggested changes to the >>> masking functions. >>> >>> By the way, as I noted on another unrelated thread, I will not be able to >>> respond to emails from tomorrow until 9/23. >> >> No problem. Enjoy your vacations. >> >> I have spent some time looking at the XLOG insert code, and tested the >> compressibility of the meta pages. And I have noticed that all pages >> registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not >> take a FPW of the block registered because the page will be >> reinitialized at replay, and in such cases the zero'ed page is filled >> with the data from the record. log_newpage is used to initialize init >> forks for unlogged relations, which is where this patch allows page >> compression to take effect correctly. Still setting REGBUF_STANDARD >> with REGBUF_WILL_INIT is actually a no-op, except if >> wal_checking_consistency is used when registering a buffer for WAL >> insertion. There is one code path though where things are useful all >> the time: revmap_physical_extend for BRIN. >> >> The patch is using the correct logic, still such comments are in my >> opinion incorrect because of the reason written above: >> + * This won't be of any help unless we use option like REGBUF_STANDARD >> + * while registering the buffer for metapage as otherwise, it won't try to >> + * remove the hole while logging the full page image. >> Here REGBUF_STANDARD is actually a no-op for btree. >> > > You have already noticed above that it will help when > wal_checking_consistency is used and that was the main motivation to > pass REGBUF_STANDARD apart from maintaining consistency. It is not > clear to me what is bothering you. If your only worry about these > patches is that you want this sentence to be removed from the comment > because you think it is obvious or doesn't make much sense, then I > think we can leave this decision to committer. I have added it based > on Tom's suggestion above thread about explaining why it is > inessential or essential to set pd_lower. I think Amit Langote just > tried to mimic what I have done in hash and btree patches to maintain > consistency. I am also not very sure if we should write some detailed > comment or leave the existing comment as it is. I think it is just a > matter of different perspective. What is disturbing me a bit is that the existing comments mention something that could be supported (the compression of pages), but that's actually not done and this is unlikely to happen because the number of bytes associated to a meta page is going to be always cheaper than a FPW, which would cost in CPU to store it for compression is enabled. So I think that we should switch comments to mention that pd_lower is set so as this helps with page masking, but we don't take advantage of XLOG compression in the code. I agree that this is a minor point, so if the wave of this thread is that I am too noisy, please feel free to ignore me: the logic of the patches is still correct, still having those comments feels like cheating a bit ;p -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> >> >> You have already noticed above that it will help when >> wal_checking_consistency is used and that was the main motivation to >> pass REGBUF_STANDARD apart from maintaining consistency. It is not >> clear to me what is bothering you. If your only worry about these >> patches is that you want this sentence to be removed from the comment >> because you think it is obvious or doesn't make much sense, then I >> think we can leave this decision to committer. I have added it based >> on Tom's suggestion above thread about explaining why it is >> inessential or essential to set pd_lower. I think Amit Langote just >> tried to mimic what I have done in hash and btree patches to maintain >> consistency. I am also not very sure if we should write some detailed >> comment or leave the existing comment as it is. I think it is just a >> matter of different perspective. > > What is disturbing me a bit is that the existing comments mention > something that could be supported (the compression of pages), but > that's actually not done and this is unlikely to happen because the > number of bytes associated to a meta page is going to be always > cheaper than a FPW, which would cost in CPU to store it for > compression is enabled. So I think that we should switch comments to > mention that pd_lower is set so as this helps with page masking, but > we don't take advantage of XLOG compression in the code. > I think that is not true because we do need FPW for certain usages of metapage. Consider a case in _hash_doinsert where register metabuf with just REGBUF_STANDARD, it can take advantage of removing the hole if pd_lower is set to its correct position. There are other similar usages in hash index. For other indexes like btree, there is no such usage currently, but it can also take advantage for wal_consistency_checking. Now, probably there is an argument that we use different comments for different indexes as the usage varies, but I think someone looking at code after reading the comments can differentiate such cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 12:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> >>> >>> You have already noticed above that it will help when >>> wal_checking_consistency is used and that was the main motivation to >>> pass REGBUF_STANDARD apart from maintaining consistency. It is not >>> clear to me what is bothering you. If your only worry about these >>> patches is that you want this sentence to be removed from the comment >>> because you think it is obvious or doesn't make much sense, then I >>> think we can leave this decision to committer. I have added it based >>> on Tom's suggestion above thread about explaining why it is >>> inessential or essential to set pd_lower. I think Amit Langote just >>> tried to mimic what I have done in hash and btree patches to maintain >>> consistency. I am also not very sure if we should write some detailed >>> comment or leave the existing comment as it is. I think it is just a >>> matter of different perspective. >> >> What is disturbing me a bit is that the existing comments mention >> something that could be supported (the compression of pages), but >> that's actually not done and this is unlikely to happen because the >> number of bytes associated to a meta page is going to be always >> cheaper than a FPW, which would cost in CPU to store it for >> compression is enabled. So I think that we should switch comments to >> mention that pd_lower is set so as this helps with page masking, but >> we don't take advantage of XLOG compression in the code. >> > > I think that is not true because we do need FPW for certain usages of > metapage. Consider a case in _hash_doinsert where register metabuf > with just > REGBUF_STANDARD, it can take advantage of removing the hole if > pd_lower is set to its correct position. I am not saying that no index AMs take advantage FPW compressibility for their meta pages. There are cases like this one, as well as one code path in BRIN where this is useful, and it is useful as well when logging FPWs of the init forks for unlogged relations. > There are other similar > usages in hash index. For other indexes like btree, there is no such > usage currently, but it can also take advantage for > wal_consistency_checking. Now, probably there is an argument that we > use different comments for different indexes as the usage varies, but > I think someone looking at code after reading the comments can > differentiate such cases. I'd think about adjusting the comments the proper way for each AM so as one can read those comments and catch any limitation immediately. The fact this has not been pointed out on this thread before any checks and the many mails exchanged on the matter on this thread make me think that the current code does not outline the current code properties appropriately. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 12:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > I'd think about adjusting the comments the proper way for each AM so > as one can read those comments and catch any limitation immediately. > The fact this has not been pointed out on this thread before any > checks and the many mails exchanged on the matter on this thread make > me think that the current code does not outline the current code > properties appropriately. Another thing that we could consider as well is adding an assertion in XLogRegisterBuffer & friends so as the combination of REGBUF_STANDARD and REGBUF_NO_IMAGE is forbidden. That's bugging me as well. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Sep 19, 2017 at 12:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >>>> <michael.paquier@gmail.com> wrote: >>>>> >>>> >>>> You have already noticed above that it will help when >>>> wal_checking_consistency is used and that was the main motivation to >>>> pass REGBUF_STANDARD apart from maintaining consistency. It is not >>>> clear to me what is bothering you. If your only worry about these >>>> patches is that you want this sentence to be removed from the comment >>>> because you think it is obvious or doesn't make much sense, then I >>>> think we can leave this decision to committer. I have added it based >>>> on Tom's suggestion above thread about explaining why it is >>>> inessential or essential to set pd_lower. I think Amit Langote just >>>> tried to mimic what I have done in hash and btree patches to maintain >>>> consistency. I am also not very sure if we should write some detailed >>>> comment or leave the existing comment as it is. I think it is just a >>>> matter of different perspective. >>> >>> What is disturbing me a bit is that the existing comments mention >>> something that could be supported (the compression of pages), but >>> that's actually not done and this is unlikely to happen because the >>> number of bytes associated to a meta page is going to be always >>> cheaper than a FPW, which would cost in CPU to store it for >>> compression is enabled. So I think that we should switch comments to >>> mention that pd_lower is set so as this helps with page masking, but >>> we don't take advantage of XLOG compression in the code. >>> >> >> I think that is not true because we do need FPW for certain usages of >> metapage. Consider a case in _hash_doinsert where register metabuf >> with just >> REGBUF_STANDARD, it can take advantage of removing the hole if >> pd_lower is set to its correct position. > > I am not saying that no index AMs take advantage FPW compressibility > for their meta pages. There are cases like this one, as well as one > code path in BRIN where this is useful, and it is useful as well when > logging FPWs of the init forks for unlogged relations. > Hmm, why is it useful for logging FPWs of the init forks for unlogged relations? We don't use REGBUF_STANDARD in those cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 9:33 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Sep 19, 2017 at 12:57 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> I'd think about adjusting the comments the proper way for each AM so >> as one can read those comments and catch any limitation immediately. >> The fact this has not been pointed out on this thread before any >> checks and the many mails exchanged on the matter on this thread make >> me think that the current code does not outline the current code >> properties appropriately. > > Another thing that we could consider as well is adding an assertion in > XLogRegisterBuffer & friends so as the combination of REGBUF_STANDARD > and REGBUF_NO_IMAGE is forbidden. That's bugging me as well. > Is that related to this patch? If not, then maybe we can discuss it in a separate thread. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes: > On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> I am not saying that no index AMs take advantage FPW compressibility >> for their meta pages. There are cases like this one, as well as one >> code path in BRIN where this is useful, and it is useful as well when >> logging FPWs of the init forks for unlogged relations. > Hmm, why is it useful for logging FPWs of the init forks for unlogged > relations? We don't use REGBUF_STANDARD in those cases. But if we started to do so, that would be a concrete benefit of this patch ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> I am not saying that no index AMs take advantage FPW compressibility >>> for their meta pages. There are cases like this one, as well as one >>> code path in BRIN where this is useful, and it is useful as well when >>> logging FPWs of the init forks for unlogged relations. > >> Hmm, why is it useful for logging FPWs of the init forks for unlogged >> relations? We don't use REGBUF_STANDARD in those cases. > > But if we started to do so, that would be a concrete benefit of this > patch ... In the proposed set of patches, all the empty() routines part of index AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the right thing by updating log_newpage_buffer(). btree also should have its call to log_newpage updated in btbuildempty(), and your patch is missing that. Also, _hash_init() would need some extra work to generate FPWs, but I don't think that it is necessary per its handling of a per-record meta data either. So REGBUF_STANDARD could be just removed from there, and there is actually no need to patch src/backend/access/hash at all. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Amit Kapila <amit.kapila16@gmail.com> writes: >>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> I am not saying that no index AMs take advantage FPW compressibility >>>> for their meta pages. There are cases like this one, as well as one >>>> code path in BRIN where this is useful, and it is useful as well when >>>> logging FPWs of the init forks for unlogged relations. >> >>> Hmm, why is it useful for logging FPWs of the init forks for unlogged >>> relations? We don't use REGBUF_STANDARD in those cases. >> >> But if we started to do so, that would be a concrete benefit of this >> patch ... > > In the proposed set of patches, all the empty() routines part of index > AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the > right thing by updating log_newpage_buffer(). btree also should have > its call to log_newpage updated in btbuildempty(), and your patch is > missing that. > We can add that for btree patch. > Also, _hash_init() would need some extra work to > generate FPWs, but I don't think that it is necessary per its handling > of a per-record meta data either. So REGBUF_STANDARD could be just > removed from there, and there is actually no need to patch > src/backend/access/hash at all. > I think there is no need to remove it. As per discussion above, we want to keep REGBUF_STANDARD for all metapage initializations for the matter of consistency and that will be useful for wal_consistency_checking in which case we anyway need full page image. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 12:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Also, _hash_init() would need some extra work to >> generate FPWs, but I don't think that it is necessary per its handling >> of a per-record meta data either. So REGBUF_STANDARD could be just >> removed from there, and there is actually no need to patch >> src/backend/access/hash at all. >> > > I think there is no need to remove it. As per discussion above, we > want to keep REGBUF_STANDARD for all metapage initializations for the > matter of consistency and that will be useful for > wal_consistency_checking in which case we anyway need full page image. Arf, yes. You are indeed right. I misread that you still need that anyway in XLogRecordAssemble. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 9:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Amit Kapila <amit.kapila16@gmail.com> writes: >>>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >>>> <michael.paquier@gmail.com> wrote: >>>>> I am not saying that no index AMs take advantage FPW compressibility >>>>> for their meta pages. There are cases like this one, as well as one >>>>> code path in BRIN where this is useful, and it is useful as well when >>>>> logging FPWs of the init forks for unlogged relations. >>> >>>> Hmm, why is it useful for logging FPWs of the init forks for unlogged >>>> relations? We don't use REGBUF_STANDARD in those cases. >>> >>> But if we started to do so, that would be a concrete benefit of this >>> patch ... >> >> In the proposed set of patches, all the empty() routines part of index >> AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the >> right thing by updating log_newpage_buffer(). btree also should have >> its call to log_newpage updated in btbuildempty(), and your patch is >> missing that. >> > > We can add that for btree patch. > Added and updated the comments for both btree and hash index patches. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > Added and updated the comments for both btree and hash index patches. I don't have real complaints about this patch, this looks fine to me. + * Currently, the advantage of setting pd_lower is in limited cases like + * during wal_consistency_checking or while logging for unlogged relation + * as for all other purposes, we initialize the metapage. Note, it also + * helps in page masking by allowing to mask unused space. I would have reworked this comment a bit, say like that: Setting pd_lower is useful for two cases which make use of WAL compressibility even if the meta page is initialized at replay: - Logging of init forks for unlogged relations. - wal_consistency_checking logs extra full-page writes, and this allows masking of the unused space of the page. Now I often get complains that I suck at this exercise ;) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 10:13 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Added and updated the comments for both btree and hash index patches. > > I don't have real complaints about this patch, this looks fine to me. > > + * Currently, the advantage of setting pd_lower is in limited cases like > + * during wal_consistency_checking or while logging for unlogged relation > + * as for all other purposes, we initialize the metapage. Note, it also > + * helps in page masking by allowing to mask unused space. > I would have reworked this comment a bit, say like that: > Setting pd_lower is useful for two cases which make use of WAL > compressibility even if the meta page is initialized at replay: > - Logging of init forks for unlogged relations. > - wal_consistency_checking logs extra full-page writes, and this > allows masking of the unused space of the page. > > Now I often get complains that I suck at this exercise ;) > I understand that and I think there are always multiple ways to write same information. It might be better to pass this patch series to committer if you don't see any mistake because he might anyway change some comments before committing. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > I understand that and I think there are always multiple ways to write > same information. It might be better to pass this patch series to > committer if you don't see any mistake because he might anyway change > some comments before committing. Yeah, agreed. I don't mind doing that as well honestly. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi. Trying to catch up. On 2017/09/25 13:43, Michael Paquier wrote: > On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Added and updated the comments for both btree and hash index patches. > > I don't have real complaints about this patch, this looks fine to me. > > + * Currently, the advantage of setting pd_lower is in limited cases like > + * during wal_consistency_checking or while logging for unlogged relation > + * as for all other purposes, we initialize the metapage. Note, it also > + * helps in page masking by allowing to mask unused space. > I would have reworked this comment a bit, say like that: > Setting pd_lower is useful for two cases which make use of WAL > compressibility even if the meta page is initialized at replay: > - Logging of init forks for unlogged relations. > - wal_consistency_checking logs extra full-page writes, and this > allows masking of the unused space of the page. > > Now I often get complains that I suck at this exercise ;) So, I think I understand the discussion so far and the arguments about what we should write to explain why we're setting pd_lower to the correct value. Just to remind, I didn't actually start this thread [1] to address the issue that the FPWs of meta pages written to WAL are not getting compressed. An external backup tool relies on pd_lower to give the correct starting offset of the hole to compress, provided the page's other fields suggest it has the standard page layout. Since we discovered that pd_lower is not set correctly in gin, brin, and spgist meta pages, I created patches to do the same. You (Michael) pointed out that that actually helps compress their FPW images in WAL as well, so we began considering that. Also, you pointed out that WAL checking code masks pages based on the respective masking functions' assumptions about the page's layout properties, which the original patches forgot to consider. So, we updated the patches to change the respective masking functions to mask meta pages as pages with standard page layout, too. But as Tom pointed out [2], WAL compressibility benefit cannot be obtained unless we change how the meta page is passed to xlog.c to be written to WAL. So, we found out all the places that write/register the meta page to WAL and changed the respective XLogRegisterBuffer calls to include the REGBUG_STANDARD flag. Some of these calls already passed REGBUF_WILL_INIT, which would result in no FPW image to be written to the WAL and so there would be no question of compressibility. But, passing REGBUF_STANDARD would still help the case where WAL checking is enabled, because FPW image *is* written in that case. So, ISTM, comments that the patches add should all say that setting the meta pages' pd_lower to the correct value helps to pass those pages to xlog.c as compressible standard layout pages, regardless of whether they are actually passed that way. Even if the patches do take care of the latter as well. Did I miss something? Looking at Amit K's updated patches for btree and hash, it seems that he updated the comment to say that setting pd_lower to the correct value is *essential*, because those pages are passed as REGBUF_STANDARD pages and hence will be compressed. That seems to contradict what I wrote above, but I think that's not wrong, too. So, I updated the gin, brin, and spgist patches to say the same, viz. the following: /* * Set pd_lower just past the end of the metadata. This is essential, * because without doing so, metadata will be lost if xlog.c compresses * the page. */ Attached updated patches. Thanks, Amit [1] https://www.postgresql.org/message-id/0d273805-0e9e-ec1a-cb84-d4da400b8f85%40lab.ntt.co.jp [2] https://www.postgresql.org/message-id/22268.1504815869%40sss.pgh.pa.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Sep 25, 2017 at 12:18 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Hi. > > Trying to catch up. > > On 2017/09/25 13:43, Michael Paquier wrote: >> On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> Added and updated the comments for both btree and hash index patches. >> >> I don't have real complaints about this patch, this looks fine to me. >> >> + * Currently, the advantage of setting pd_lower is in limited cases like >> + * during wal_consistency_checking or while logging for unlogged relation >> + * as for all other purposes, we initialize the metapage. Note, it also >> + * helps in page masking by allowing to mask unused space. >> I would have reworked this comment a bit, say like that: >> Setting pd_lower is useful for two cases which make use of WAL >> compressibility even if the meta page is initialized at replay: >> - Logging of init forks for unlogged relations. >> - wal_consistency_checking logs extra full-page writes, and this >> allows masking of the unused space of the page. >> >> Now I often get complains that I suck at this exercise ;) > > So, I think I understand the discussion so far and the arguments about > what we should write to explain why we're setting pd_lower to the correct > value. > > Just to remind, I didn't actually start this thread [1] to address the > issue that the FPWs of meta pages written to WAL are not getting > compressed. An external backup tool relies on pd_lower to give the > correct starting offset of the hole to compress, provided the page's other > fields suggest it has the standard page layout. Since we discovered that > pd_lower is not set correctly in gin, brin, and spgist meta pages, I > created patches to do the same. You (Michael) pointed out that that > actually helps compress their FPW images in WAL as well, so we began > considering that. Also, you pointed out that WAL checking code masks > pages based on the respective masking functions' assumptions about the > page's layout properties, which the original patches forgot to consider. > So, we updated the patches to change the respective masking functions to > mask meta pages as pages with standard page layout, too. > > But as Tom pointed out [2], WAL compressibility benefit cannot be obtained > unless we change how the meta page is passed to xlog.c to be written to > WAL. So, we found out all the places that write/register the meta page to > WAL and changed the respective XLogRegisterBuffer calls to include the > REGBUG_STANDARD flag. Some of these calls already passed > REGBUF_WILL_INIT, which would result in no FPW image to be written to the > WAL and so there would be no question of compressibility. But, passing > REGBUF_STANDARD would still help the case where WAL checking is enabled, > because FPW image *is* written in that case. > > So, ISTM, comments that the patches add should all say that setting the > meta pages' pd_lower to the correct value helps to pass those pages to > xlog.c as compressible standard layout pages, regardless of whether they > are actually passed that way. Even if the patches do take care of the > latter as well. > > Did I miss something? > > Looking at Amit K's updated patches for btree and hash, it seems that he > updated the comment to say that setting pd_lower to the correct value is > *essential*, because those pages are passed as REGBUF_STANDARD pages and > hence will be compressed. That seems to contradict what I wrote above, > but I think that's not wrong, too. > I think you are missing that there are many cases where we use only REGBUF_STANDARD for meta-pages (cf. hash index). For btree, where the advantage is in fewer cases, I have explicitly stated those cases. Do you still have confusion? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 3:48 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Trying to catch up. Glad to see you back. > Just to remind, I didn't actually start this thread [1] to address the > issue that the FPWs of meta pages written to WAL are not getting > compressed. An external backup tool relies on pd_lower to give the > correct starting offset of the hole to compress, provided the page's other > fields suggest it has the standard page layout. Since we discovered that > pd_lower is not set correctly in gin, brin, and spgist meta pages, I > created patches to do the same. You (Michael) pointed out that that > actually helps compress their FPW images in WAL as well, so we began > considering that. Also, you pointed out that WAL checking code masks > pages based on the respective masking functions' assumptions about the > page's layout properties, which the original patches forgot to consider. > So, we updated the patches to change the respective masking functions to > mask meta pages as pages with standard page layout, too. Thanks for summarizing all that happened. > But as Tom pointed out [2], WAL compressibility benefit cannot be obtained > unless we change how the meta page is passed to xlog.c to be written to > WAL. So, we found out all the places that write/register the meta page to > WAL and changed the respective XLogRegisterBuffer calls to include the > REGBUG_STANDARD flag. Some of these calls already passed > REGBUF_WILL_INIT, which would result in no FPW image to be written to the > WAL and so there would be no question of compressibility. But, passing > REGBUF_STANDARD would still help the case where WAL checking is enabled, > because FPW image *is* written in that case. Yep. > So, ISTM, comments that the patches add should all say that setting the > meta pages' pd_lower to the correct value helps to pass those pages to > xlog.c as compressible standard layout pages, regardless of whether they > are actually passed that way. Even if the patches do take care of the > latter as well. > > Did I miss something? Not that I think of. Buffer metabuffer; + Page metapage; SpGistMetaPageData *metadata; metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); + metapage = BufferGetPage(metabuffer); No need to define metapage here and to call BufferGetPage() as long as the lock on the buffer is not taken. Except that small thing, the patches do their duty. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/26 11:34, Amit Kapila wrote: > On Mon, Sep 25, 2017 at 12:18 PM, Amit Langote wrote: >> So, ISTM, comments that the patches add should all say that setting the >> meta pages' pd_lower to the correct value helps to pass those pages to >> xlog.c as compressible standard layout pages, regardless of whether they >> are actually passed that way. Even if the patches do take care of the >> latter as well. >> >> Did I miss something? >> >> Looking at Amit K's updated patches for btree and hash, it seems that he >> updated the comment to say that setting pd_lower to the correct value is >> *essential*, because those pages are passed as REGBUF_STANDARD pages and >> hence will be compressed. That seems to contradict what I wrote above, >> but I think that's not wrong, too. >> > > I think you are missing that there are many cases where we use only > REGBUF_STANDARD for meta-pages (cf. hash index). For btree, where the > advantage is in fewer cases, I have explicitly stated those cases. I do see that there are some places that use only REGBUF_STANDARD. I also understand that specifying this flag is necessary condition for XLogRecordAssemble() to perform the hole compression, if it is to be performed at all. ISTM, the hole is compressed only if we write the FP image. However, reasons for why FP image needs to be written, AFAICS, are independent of whether the hole is (should be) compressed or not. Whether compression should occur or not depends on whether the REGBUF_STANDARD flag is passed, that is, whether a caller is sure that the page is of standard layout. The last part is only true if page's pd_lower is always set correctly. Perhaps, we should focus on just writing exactly why setting pd_lower to the correct value is essential. I think that's a good change, so I adopted it from your patch. The *only* reason why it's essential to set pd_lower to the correct value, as I see it, is that xloginsert.c *might* compress the hole in the page and its pd_lower better be correct if we want compression to preserve the metadata content (in meta pages). OTOH, it's not clear to me why we should write in that comment *why* the compression would occur or why the FP image would be written in the first place. Whether or not FP image is written has nothing to do with whether we should set pd_lower correctly, does it? Also, since we want to repeat the same comment in multiple places where we now set pd_lower (gin, brin, spgist patches have many more new places where meta page's pd_lower is set), keeping it to the point would be good. But as you said a few times, we should leave it to the committer to decide the exact text to write in these comment, which I think is fine. > Do you still have confusion? This is an area of PG code I don't have much experience with and is also a bit complex, so sorry if I'm saying things that are not quite right. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/26 12:17, Michael Paquier wrote: > On Mon, Sep 25, 2017 at 3:48 PM, Amit Langote wrote: >> So, ISTM, comments that the patches add should all say that setting the >> meta pages' pd_lower to the correct value helps to pass those pages to >> xlog.c as compressible standard layout pages, regardless of whether they >> are actually passed that way. Even if the patches do take care of the >> latter as well. >> >> Did I miss something? > > Not that I think of. Thanks. > Buffer metabuffer; > + Page metapage; > SpGistMetaPageData *metadata; > > metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); > + metapage = BufferGetPage(metabuffer); > No need to define metapage here and to call BufferGetPage() as long as > the lock on the buffer is not taken. Ah, okay. Moved those additions inside the if (ConditionalLockBuffer(metabuffer)) block. > Except that small thing, the patches do their duty. Thanks, revised patches attached. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Sep 26, 2017 at 4:22 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Except that small thing, the patches do their duty. > > Thanks, revised patches attached. Cool, let's switch it back to a ready for committer status then. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/26 16:30, Michael Paquier wrote: > On Tue, Sep 26, 2017 at 4:22 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> Except that small thing, the patches do their duty. >> >> Thanks, revised patches attached. > > Cool, let's switch it back to a ready for committer status then. Sure, thanks. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2017/09/26 16:30, Michael Paquier wrote: >> Cool, let's switch it back to a ready for committer status then. > Sure, thanks. Pushed with some cosmetic adjustments --- mostly, making the comments more explicit about why we need the apparently-redundant assignments to pd_lower. I've marked the CF entry closed. However, I'm not sure if we're quite done with this thread. Weren't we going to adjust nbtree and hash to be more aggressive about labeling their metapages as REGBUF_STANDARD? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2017/09/26 16:30, Michael Paquier wrote: >>> Cool, let's switch it back to a ready for committer status then. > >> Sure, thanks. > > Pushed with some cosmetic adjustments --- mostly, making the comments more > explicit about why we need the apparently-redundant assignments to > pd_lower. > > I've marked the CF entry closed. However, I'm not sure if we're quite > done with this thread. Weren't we going to adjust nbtree and hash to > be more aggressive about labeling their metapages as REGBUF_STANDARD? > I have already posted the patches [1] for the same in this thread and those are reviewed [2][3] as well. I have adjusted the comments as per latest commit. Please find updated patches attached. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BTKg6ZK%3DmF14x_wf2KrmOxoMJ6z7YUK3-78acaYLwQ8Q%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAB7nPqTE4-GCaLtDh%3DJBcgUKR6B5WkvRLC-NpOqkgybi4FhHPw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAB7nPqQBtDW43ABnWEdoHP6A2ToedzDFdpykbGjpO2wuZNiQnw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've marked the CF entry closed. However, I'm not sure if we're quite >> done with this thread. Weren't we going to adjust nbtree and hash to >> be more aggressive about labeling their metapages as REGBUF_STANDARD? > > I have already posted the patches [1] for the same in this thread and > those are reviewed [2][3] as well. I have adjusted the comments as per > latest commit. Please find updated patches attached. Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set, at least for page masking. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes: > On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I've marked the CF entry closed. However, I'm not sure if we're quite >>> done with this thread. Weren't we going to adjust nbtree and hash to >>> be more aggressive about labeling their metapages as REGBUF_STANDARD? >> I have already posted the patches [1] for the same in this thread and >> those are reviewed [2][3] as well. I have adjusted the comments as per >> latest commit. Please find updated patches attached. > Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set, > at least for page masking. Thanks, I'd forgotten those patches were already posted. Looks good, so pushed. Looking around, I noted that contrib/bloom also had the disease of not telling log_newpage it was writing a standard-format metapage, so I fixed that too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/11/03 6:24, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2017/09/26 16:30, Michael Paquier wrote: >>> Cool, let's switch it back to a ready for committer status then. > >> Sure, thanks. > > Pushed with some cosmetic adjustments --- mostly, making the comments more > explicit about why we need the apparently-redundant assignments to > pd_lower. Thank you. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Nov 4, 2017 at 2:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> I've marked the CF entry closed. However, I'm not sure if we're quite >>>> done with this thread. Weren't we going to adjust nbtree and hash to >>>> be more aggressive about labeling their metapages as REGBUF_STANDARD? > >>> I have already posted the patches [1] for the same in this thread and >>> those are reviewed [2][3] as well. I have adjusted the comments as per >>> latest commit. Please find updated patches attached. > >> Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set, >> at least for page masking. > > Thanks, I'd forgotten those patches were already posted. Looks good, > so pushed. > > Looking around, I noted that contrib/bloom also had the disease of > not telling log_newpage it was writing a standard-format metapage, > so I fixed that too. > Thanks, Michael and Tom for reviewing and committing the work. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers