Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Minimal logical decoding on standbys |
Date | |
Msg-id | CA+TgmoaVcu_mbxbH=EccvKG6u8+MdQf9zx98uAL9zsStFwrYsQ@mail.gmail.com Whole thread Raw |
In response to | Re: Minimal logical decoding on standbys ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Minimal logical decoding on standbys
Re: Minimal logical decoding on standbys |
List | pgsql-hackers |
On Wed, Dec 14, 2022 at 8:06 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > Please find attached v31 with the changes mentioned above (except that I put your wording into the commit message insteadof a README: I think it helps to make > clear what the "design" for the patch series is). Thanks, I think that's good clarification. I read through 0001 again and I noticed this: typedef struct xl_heap_prune { TransactionId snapshotConflictHorizon; uint16 nredirected; uint16 ndead; + bool onCatalogAccessibleInLogicalDecoding; /* OFFSET NUMBERS are in the block reference 0 */ } xl_heap_prune; I think this is unsafe on alignment-picky machines. I think it will cause the offset numbers to be aligned at an odd address. heap_xlog_prune() doesn't copy the data into aligned memory, so I think this will result in a misaligned pointer being passed down to heap_page_prune_execute. I wonder what the best fix is here. We could (1) have heap_page_prune_execute copy the data into a newly-palloc'd chunk, which seems kind of sad on performance grounds, or we could (2) just make the field here two bytes, or add a second byte as padding, but that bloats the WAL slightly, or we could (3) try to steal a bit from ndirected or ndead, if we think that we don't need all the bits. It seems like the maximum block size is 32kB right now, which means MaxOffsetNumber can't, I think, be more than 16kB. So maybe we could think of replacing nredirected and ndead with uint32 flags and then have accessor macros. But it looks like we also have a bunch of similar issues elsewhere. gistxlogDelete looks like it has the same problem. gistxlogPageReuse is OK because there's no data afterwards. xl_hash_vacuum_one_page looks like it has the same problem. So does xl_heap_prune. xl_heap_freeze_page also has the issue: heap_xlog_freeze_page does memcpy the plans, but not the offsets, and even for the plans, I think for correctness we would need to treat the "plans" pointer as a void * or char * because the pointer might be unaligned and the compiler, not knowing that, could do bad things. xl_btree_reuse_page is OK because no data follows the main record. xl_btree_delete appears to have this problem if you just look at the comments, because it says that offset numbers follow, and thus are probably presumed aligned. However, they don't really follow, because commit d2e5e20e57111cca9e14f6e5a99a186d4c66a5b7 moved the data from the main data to the registered buffer data. However, AIUI, you're not really supposed to assume that the registered buffer data is aligned. I think this just happens to work because the length of the main record is a multiple of the relevant small integer, and the size of the block header is 4, so the buffer data ends up being accidentally aligned. That might be worth fixing somehow independently of this issue. spgxlogVacuumRedirect is OK because the offsets array is part of the struct, using FLEXIBLE_ARRAY_MEMBER, which will cause the offsets field to be aligned properly. It means inserting a padding byte, but it's not broken. If we don't mind adding padding bytes in some of the other cases, we could potentially make use of this technique elsewhere, I think. Other comments: + if RelationIsAccessibleInLogicalDecoding(rel) + xlrec.flags |= VISIBILITYMAP_ON_CATALOG_ACCESSIBLE_IN_LOGICAL_DECODING; This is a few parentheses short of where it should be. Hilariously it still compiles because there are parentheses in the macro definition. + xlrec.onCatalogAccessibleInLogicalDecoding = RelationIsAccessibleInLogicalDecoding(relation); These lines are quite long. I think we should consider (1) picking a shorter name for the xlrec field and, if it's such lines are going to still routinely exceed 80 characters, (2) splitting them into two lines, with the second one indented to match pgindent's preferences in such cases, which I think is something like this: xlrec.onCatalogAccessibleInLogicalDecoding = RelationIsAccessibleInLogicalDecoding(relation); As far as renaming, I think we could at least remove onCatalog part from the identifier, as that doesn't seem to be adding much. And maybe we could even think of changing it to something like logicalDecodingConflict or even decodingConflict, which would shave off a bunch more characters. + if (heapRelation->rd_options) + isusercatalog = ((StdRdOptions *) (heapRelation)->rd_options)->user_catalog_table; Couldn't you get rid of the if statement here and also the initialization at the top of the function and just write isusercatalog = RelationIsUsedAsCatalogTable(heapRelation)? Or even just get rid of the variable entirely and pass RelationIsUsedAsCatalogTable(heapRelation) as the argument to UpdateIndexRelation directly? I think this could use some test cases demonstrating that indisusercatalog gets set correctly in all the relevant cases: table is created with user_catalog_table = true/false, reloption is changed, reloptions are reset, new index is added later, etc. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: