Thread: Separate HEAP WAL replay logic into its own file
Hi PostgreSQL hackers, For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WAL replaylogic are organized in separate source files. However, the HEAP access method is an exception. Both the access methodand the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other access methodsand to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. Thechanges are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the common heap_execute_freeze_tuple()helper function into the heapam.h header, and adjust the build files. I hope people find this straightforward refactoring helpful. Yong
Attachment
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong <yoli@ebay.com> wrote: > > Hi PostgreSQL hackers, > > For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WALreplay logic are organized in separate source files. However, the HEAP access method is an exception. Both the accessmethod and the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other accessmethods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. The changes are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the commonheap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files. I'm not against this change, but I am curious at what inspired this. Were you looking at Postgres code and simply noticed that there isn't a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you wanted to change that? Or is there some specific reason this would help you as a Postgres developer, user, or ecosystem member? - Melanie
> On Jun 17, 2024, at 23:01, Melanie Plageman <melanieplageman@gmail.com> wrote: > > External Email > > On Mon, Jun 17, 2024 at 2:20 AM Li, Yong <yoli@ebay.com> wrote: >> >> Hi PostgreSQL hackers, >> >> For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WALreplay logic are organized in separate source files. However, the HEAP access method is an exception. Both the accessmethod and the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other accessmethods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. The changes are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the commonheap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files. > > I'm not against this change, but I am curious at what inspired this. > Were you looking at Postgres code and simply noticed that there isn't > a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you > wanted to change that? Or is there some specific reason this would > help you as a Postgres developer, user, or ecosystem member? > > - Melanie As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me tofind them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the samefor the heap access method. When I finally found them (via text search), it was a small surprise. Having different fileorganizations for different access methods gives me this urge to make everything consistent. I think it will make iteasier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the heapam.cnot some “???xlog.c”. Yong
On Mon, Jun 17, 2024 at 9:12 PM Li, Yong <yoli@ebay.com> wrote: > > As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me tofind them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the samefor the heap access method. When I finally found them (via text search), it was a small surprise. Having different fileorganizations for different access methods gives me this urge to make everything consistent. I think it will make iteasier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the heapam.cnot some “???xlog.c”. That makes sense. The branch for PG18 has not been cut yet, so I recommend registering this patch for the July commitfest [1] so it doesn't get lost. - Melanie [1] https://commitfest.postgresql.org/48/
> On Jun 18, 2024, at 20:42, Melanie Plageman <melanieplageman@gmail.com> wrote: > > External Email > > On Mon, Jun 17, 2024 at 9:12 PM Li, Yong <yoli@ebay.com> wrote: >> >> As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for meto find them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find thesame for the heap access method. When I finally found them (via text search), it was a small surprise. Having differentfile organizations for different access methods gives me this urge to make everything consistent. I think it willmake it easier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is insidethe heapam.c not some “???xlog.c”. > > That makes sense. The branch for PG18 has not been cut yet, so I > recommend registering this patch for the July commitfest [1] so it > doesn't get lost. > > - Melanie > Thanks for the positive feedback. I’ve added the patch to the July CF. Yong
Hi, I'm reviewing patches in commitfest 2024-07: https://commitfest.postgresql.org/48/ This is the 5th patch: https://commitfest.postgresql.org/48/5054/ FYI: https://commitfest.postgresql.org/48/4681/ is my patch. In <DAB53475-2470-4431-A3D1-A2A855530EAB@ebay.com> "Re: Separate HEAP WAL replay logic into its own file" on Tue, 18 Jun 2024 01:12:42 +0000, "Li, Yong" <yoli@ebay.com> wrote: > As a newcomer, when I was walking through the code looking > for WAL replay related code, it was relatively easy for me > to find them for the B-Tree access method because of the > “xlog” hint in the file names. It took me a while to > find the same for the heap access method. When I finally > found them (via text search), it was a small > surprise. Having different file organizations for > different access methods gives me this urge to make > everything consistent. I think it will make it easier for > newcomers, and it will reduce the mental load for everyone > to remember that heap replay is inside the heapam.c not > some “???xlog.c”. It makes sense. Here are my comments for your patch: 1. Could you create your patch by "git format-patch -vN master" or something? If you create your patch by "git format-patch", we can apply your patch by "git am XXX.patch". 2. I confirmed that all heapam.c -> heapam_xlog.c/heapam.h moves don't change implementations. I re-moved moved codes to heapam.c and there is no diff in heapam.c. 3. Could you remove '#include "access/heapam_xlog.h"' from heapam.c because it's needless now. BTW, it seems that we can remove more includes from heapam.c: ---- diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index bc6d4868975..f1671072576 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -31,42 +31,24 @@ */ #include "postgres.h" -#include "access/bufmask.h" #include "access/heapam.h" -#include "access/heapam_xlog.h" #include "access/heaptoast.h" #include "access/hio.h" #include "access/multixact.h" -#include "access/parallel.h" -#include "access/relscan.h" #include "access/subtrans.h" #include "access/syncscan.h" -#include "access/sysattr.h" -#include "access/tableam.h" -#include "access/transam.h" #include "access/valid.h" #include "access/visibilitymap.h" -#include "access/xact.h" -#include "access/xlog.h" #include "access/xloginsert.h" -#include "access/xlogutils.h" -#include "catalog/catalog.h" #include "commands/vacuum.h" -#include "miscadmin.h" #include "pgstat.h" -#include "port/atomics.h" #include "port/pg_bitutils.h" -#include "storage/bufmgr.h" -#include "storage/freespace.h" #include "storage/lmgr.h" #include "storage/predicate.h" #include "storage/procarray.h" -#include "storage/standby.h" #include "utils/datum.h" #include "utils/injection_point.h" #include "utils/inval.h" -#include "utils/relcache.h" -#include "utils/snapmgr.h" #include "utils/spccache.h" --- We may want to work on removing needless includes as a separated cleanup task. 4. Could you remove needless includes from heapam_xlog.c? It seems that we can remove the following includes: ---- diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c index b372f2b4afc..af4976f382d 100644 --- a/src/backend/access/heap/heapam_xlog.c +++ b/src/backend/access/heap/heapam_xlog.c @@ -16,16 +16,11 @@ #include "access/bufmask.h" #include "access/heapam.h" -#include "access/heapam_xlog.h" -#include "access/transam.h" #include "access/visibilitymap.h" #include "access/xlog.h" #include "access/xlogutils.h" -#include "port/atomics.h" -#include "storage/bufmgr.h" #include "storage/freespace.h" #include "storage/standby.h" -#include "utils/relcache.h" ---- 5. There are still WAL related codes in heapam.c: 4.1. log_heap_update() 4.2. log_heap_new_cid() 4.3. if (RelationNeedsWAL()) {...} in heap_insert() 4.4. if (needwal) {...} in heap_multi_insert() 4.5. if (RelationNeedsWAL()) {...} in heap_delete() 4.6. if (RelationNeedsWAL()) {...}s in heap_update() 4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple() 4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec() 4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative() 4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative() 4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update() 4.12. log_heap_visible() Should we move them to head_xlog.c too? If we should do it, separated commits will be easy to review. For example, the 0001 patch moves existing codes to head_xlog.c as-is. The 0002 patch extracts WAL related codes in heap_insert() to heap_xlog.c and so on. Thanks, -- kou
> On Jul 23, 2024, at 09:54, Sutou Kouhei <kou@clear-code.com> wrote:
>
>
> Here are my comments for your patch:
>
> 1. Could you create your patch by "git format-patch -vN master"
> or something? If you create your patch by "git format-patch",
> we can apply your patch by "git am XXX.patch".
>
Thanks for your review. I’ve updated the patch to follow your
suggested format.
>
> 3. Could you remove '#include "access/heapam_xlog.h"' from
> heapam.c because it's needless now.
>
> BTW, it seems that we can remove more includes from
> heapam.c:
>
> 4. Could you remove needless includes from heapam_xlog.c? It
> seems that we can remove the following includes:
I have removed the redundant includes in the latest patch.
>
> 5. There are still WAL related codes in heapam.c:
>
> 4.1. log_heap_update()
> 4.2. log_heap_new_cid()
> 4.3. if (RelationNeedsWAL()) {...} in heap_insert()
> 4.4. if (needwal) {...} in heap_multi_insert()
> 4.5. if (RelationNeedsWAL()) {...} in heap_delete()
> 4.6. if (RelationNeedsWAL()) {...}s in heap_update()
> 4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
> 4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
> 4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
> 4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
> 4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
> 4.12. log_heap_visible()
>
> Should we move them to head_xlog.c too?
>
> If we should do it, separated commits will be easy to
> review. For example, the 0001 patch moves existing codes
> to head_xlog.c as-is. The 0002 patch extracts WAL related
> codes in heap_insert() to heap_xlog.c and so on.
>
>
> Thanks,
> --
> kou
I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example, you can also find
WAL generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.
Please help review the updated file again. Thanks in advance!
Yong
Attachment
Hi, In <599E67D2-2929-4858-B8BC-F9C4AE889BF6@ebay.com> "Re: Separate HEAP WAL replay logic into its own file" on Fri, 26 Jul 2024 07:56:12 +0000, "Li, Yong" <yoli@ebay.com> wrote: >> 1. Could you create your patch by "git format-patch -vN master" >> or something? If you create your patch by "git format-patch", >> we can apply your patch by "git am XXX.patch". >> > > Thanks for your review. I’ve updated the patch to follow your > suggested format. Thanks. I could apply your patch by "git am v2-0001-heapam_refactor.patch". Could you use the following format for the commit message next time? ---- ${TITLE} ${DESCRIPTION} ---- For example: ---- Separate HEAP WAL replay logic into its own file Most access methods (i.e. nbtree and hash) use a separate file with "xlog" in its name for its WAL replay logic. Heap is one exception of this convention. To make it easier for newcomers to find the WAL replay logic for the heap access method, this patch isolates heap's replay logic in a new heapam_xlog.c file. This patch is a pure refactoring with no change to the logic. ---- This is a commonly used Git's commit message format. See also other commit messages by "git log". >> 5. There are still WAL related codes in heapam.c: >> >> 4.1. log_heap_update() >> 4.2. log_heap_new_cid() >> 4.3. if (RelationNeedsWAL()) {...} in heap_insert() >> 4.4. if (needwal) {...} in heap_multi_insert() >> 4.5. if (RelationNeedsWAL()) {...} in heap_delete() >> 4.6. if (RelationNeedsWAL()) {...}s in heap_update() >> 4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple() >> 4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec() >> 4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative() >> 4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative() >> 4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update() >> 4.12. log_heap_visible() >> >> Should we move them to head_xlog.c too? >> >> If we should do it, separated commits will be easy to >> review. For example, the 0001 patch moves existing codes >> to head_xlog.c as-is. The 0002 patch extracts WAL related >> codes in heap_insert() to heap_xlog.c and so on. > > I followed the convention of most access methods. The “xlog” > file includes the WAL replay logic only. The logic that generates > the log records themselves stays with the code that performs > the changes. Take nbtree as an example, you can also find > WAL generating code in several _bt_insertxxx() functions inside > the nbtinsert.c file. You're right. Sorry. I think that this proposal is reasonable but we need to get attention from a committer to move forward this proposal. Thanks, -- kou
> I think that this proposal is reasonable but we need to get > attention from a committer to move forward this proposal. > > > Thanks, > — > kou Thank you Kou for your review. I will move the CF to the next phase and see what happens. Regards, Yong
> On Sep 12, 2024, at 13:39, Michael Paquier <michael@paquier.xyz> wrote: > > External Email > > > I was looking at all that, and this is only moving code around. While > the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad > but historical, the header cleanup in heapam.c is nice. > > Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to > XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the > opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h > was less interesting. Just to say that I am agreeing with you here > and I have let this part as you suggested originally. > > I was wondering for a bit about the order of the functions for heap > and heap, but these are ordered in their own, which is also OK. I > have added a few more comments at the top of each subroutine for the > records to be more consistent, and applied the result. > — > Michael > I am so glad to see that my patch got committed. Thank you a lot for it! This is my first accepted patch. It really means a lot to me. Yong