Thread: Separate HEAP WAL replay logic into its own file

Separate HEAP WAL replay logic into its own file

From
"Li, Yong"
Date:
Hi PostgreSQL hackers,

For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WAL
replaylogic are organized in separate source files.  However, the HEAP access method is an exception.  Both the access
methodand the WAL replay logic are collocated in the same heapam.c.  To follow the pattern established by other access
methodsand to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file.
Thechanges are straightforward.  Move the replay related functions into the new heapam_xlog.c file, push the common
heap_execute_freeze_tuple()helper function into the heapam.h header, and adjust the build files.
 

I hope people find this straightforward refactoring helpful.


Yong




Attachment

Re: Separate HEAP WAL replay logic into its own file

From
Melanie Plageman
Date:
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong <yoli@ebay.com> wrote:
>
> Hi PostgreSQL hackers,
>
> For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its
WALreplay logic are organized in separate source files.  However, the HEAP access method is an exception.  Both the
accessmethod and the WAL replay logic are collocated in the same heapam.c.  To follow the pattern established by other
accessmethods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own
file. The changes are straightforward.  Move the replay related functions into the new heapam_xlog.c file, push the
commonheap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files. 

I'm not against this change, but I am curious at what inspired this.
Were you looking at Postgres code and simply noticed that there isn't
a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
wanted to change that? Or is there some specific reason this would
help you as a Postgres developer, user, or ecosystem member?

- Melanie



Re: Separate HEAP WAL replay logic into its own file

From
"Li, Yong"
Date:

> On Jun 17, 2024, at 23:01, Melanie Plageman <melanieplageman@gmail.com> wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 2:20 AM Li, Yong <yoli@ebay.com> wrote:
>> 
>> Hi PostgreSQL hackers,
>> 
>> For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its
WALreplay logic are organized in separate source files.  However, the HEAP access method is an exception.  Both the
accessmethod and the WAL replay logic are collocated in the same heapam.c.  To follow the pattern established by other
accessmethods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own
file. The changes are straightforward.  Move the replay related functions into the new heapam_xlog.c file, push the
commonheap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files.
 
> 
> I'm not against this change, but I am curious at what inspired this.
> Were you looking at Postgres code and simply noticed that there isn't
> a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
> wanted to change that? Or is there some specific reason this would
> help you as a Postgres developer, user, or ecosystem member?
> 
> - Melanie

As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me
tofind them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the
samefor the heap access method. When I finally found them (via text search), it was a small surprise. Having different
fileorganizations for different access methods gives me this urge to make everything consistent. I think it will make
iteasier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the
heapam.cnot some “???xlog.c”.
 

Yong

Re: Separate HEAP WAL replay logic into its own file

From
Melanie Plageman
Date:
On Mon, Jun 17, 2024 at 9:12 PM Li, Yong <yoli@ebay.com> wrote:
>
> As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me
tofind them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the
samefor the heap access method. When I finally found them (via text search), it was a small surprise. Having different
fileorganizations for different access methods gives me this urge to make everything consistent. I think it will make
iteasier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the
heapam.cnot some “???xlog.c”. 

That makes sense. The branch for PG18 has not been cut yet, so I
recommend registering this patch for the July commitfest [1] so it
doesn't get lost.

- Melanie

[1] https://commitfest.postgresql.org/48/



Re: Separate HEAP WAL replay logic into its own file

From
"Li, Yong"
Date:

> On Jun 18, 2024, at 20:42, Melanie Plageman <melanieplageman@gmail.com> wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 9:12 PM Li, Yong <yoli@ebay.com> wrote:
>> 
>> As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for
meto find them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find
thesame for the heap access method. When I finally found them (via text search), it was a small surprise. Having
differentfile organizations for different access methods gives me this urge to make everything consistent. I think it
willmake it easier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is
insidethe heapam.c not some “???xlog.c”.
 
> 
> That makes sense. The branch for PG18 has not been cut yet, so I
> recommend registering this patch for the July commitfest [1] so it
> doesn't get lost.
> 
> - Melanie
> 

Thanks for the positive feedback. I’ve added the patch to the July CF.

Yong


Re: Separate HEAP WAL replay logic into its own file

From
Sutou Kouhei
Date:
Hi,

I'm reviewing patches in commitfest 2024-07:
https://commitfest.postgresql.org/48/

This is the 5th patch:
https://commitfest.postgresql.org/48/5054/

FYI: https://commitfest.postgresql.org/48/4681/ is my patch.

In <DAB53475-2470-4431-A3D1-A2A855530EAB@ebay.com>
  "Re: Separate HEAP WAL replay logic into its own file" on Tue, 18 Jun 2024 01:12:42 +0000,
  "Li, Yong" <yoli@ebay.com> wrote:

> As a newcomer, when I was walking through the code looking
> for WAL replay related code, it was relatively easy for me
> to find them for the B-Tree access method because of the
> “xlog” hint in the file names. It took me a while to
> find the same for the heap access method. When I finally
> found them (via text search), it was a small
> surprise. Having different file organizations for
> different access methods gives me this urge to make
> everything consistent. I think it will make it easier for
> newcomers, and it will reduce the mental load for everyone
> to remember that heap replay is inside the heapam.c not
> some “???xlog.c”.

It makes sense.


Here are my comments for your patch:

1. Could you create your patch by "git format-patch -vN master"
   or something? If you create your patch by "git format-patch",
   we can apply your patch by "git am XXX.patch".

2. I confirmed that all heapam.c -> heapam_xlog.c/heapam.h
   moves don't change implementations. I re-moved moved
   codes to heapam.c and there is no diff in heapam.c.

3. Could you remove '#include "access/heapam_xlog.h"' from
   heapam.c because it's needless now.

   BTW, it seems that we can remove more includes from
   heapam.c:

----
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bc6d4868975..f1671072576 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -31,42 +31,24 @@
  */
 #include "postgres.h"
 
-#include "access/bufmask.h"
 #include "access/heapam.h"
-#include "access/heapam_xlog.h"
 #include "access/heaptoast.h"
 #include "access/hio.h"
 #include "access/multixact.h"
-#include "access/parallel.h"
-#include "access/relscan.h"
 #include "access/subtrans.h"
 #include "access/syncscan.h"
-#include "access/sysattr.h"
-#include "access/tableam.h"
-#include "access/transam.h"
 #include "access/valid.h"
 #include "access/visibilitymap.h"
-#include "access/xact.h"
-#include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/xlogutils.h"
-#include "catalog/catalog.h"
 #include "commands/vacuum.h"
-#include "miscadmin.h"
 #include "pgstat.h"
-#include "port/atomics.h"
 #include "port/pg_bitutils.h"
-#include "storage/bufmgr.h"
-#include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/procarray.h"
-#include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/injection_point.h"
 #include "utils/inval.h"
-#include "utils/relcache.h"
-#include "utils/snapmgr.h"
 #include "utils/spccache.h"
---

   We may want to work on removing needless includes as a
   separated cleanup task.

4. Could you remove needless includes from heapam_xlog.c? It
   seems that we can remove the following includes:

----
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b372f2b4afc..af4976f382d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -16,16 +16,11 @@
 
 #include "access/bufmask.h"
 #include "access/heapam.h"
-#include "access/heapam_xlog.h"
-#include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xlogutils.h"
-#include "port/atomics.h"
-#include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "storage/standby.h"
-#include "utils/relcache.h"
----

5. There are still WAL related codes in heapam.c:

   4.1. log_heap_update()
   4.2. log_heap_new_cid()
   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
   4.4. if (needwal) {...} in heap_multi_insert()
   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
   4.12. log_heap_visible()

   Should we move them to head_xlog.c too?

   If we should do it, separated commits will be easy to
   review. For example, the 0001 patch moves existing codes
   to head_xlog.c as-is. The 0002 patch extracts WAL related
   codes in heap_insert() to heap_xlog.c and so on.


Thanks,
-- 
kou



Re: Separate HEAP WAL replay logic into its own file

From
"Li, Yong"
Date:


> On Jul 23, 2024, at 09:54, Sutou Kouhei <kou@clear-code.com> wrote:
>
>
> Here are my comments for your patch:
>
> 1. Could you create your patch by "git format-patch -vN master"
>   or something? If you create your patch by "git format-patch",
>   we can apply your patch by "git am XXX.patch".
>

Thanks for your review. I’ve updated the patch to follow your
suggested format.

>
> 3. Could you remove '#include "access/heapam_xlog.h"' from
>   heapam.c because it's needless now.
>
>   BTW, it seems that we can remove more includes from
>   heapam.c:
>
> 4. Could you remove needless includes from heapam_xlog.c? It
>   seems that we can remove the following includes:

I have removed the redundant includes in the latest patch.

>
> 5. There are still WAL related codes in heapam.c:
>
>   4.1. log_heap_update()
>   4.2. log_heap_new_cid()
>   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
>   4.4. if (needwal) {...} in heap_multi_insert()
>   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
>   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
>   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
>   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
>   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
>   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
>   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
>   4.12. log_heap_visible()
>
>   Should we move them to head_xlog.c too?
>
>   If we should do it, separated commits will be easy to
>   review. For example, the 0001 patch moves existing codes
>   to head_xlog.c as-is. The 0002 patch extracts WAL related
>   codes in heap_insert() to heap_xlog.c and so on.
>
>
> Thanks,
> --
> kou

I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example, you can also find
WAL generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.

Please help review the updated file again. Thanks in advance!


Yong




Attachment

Re: Separate HEAP WAL replay logic into its own file

From
Sutou Kouhei
Date:
Hi,

In <599E67D2-2929-4858-B8BC-F9C4AE889BF6@ebay.com>
  "Re: Separate HEAP WAL replay logic into its own file" on Fri, 26 Jul 2024 07:56:12 +0000,
  "Li, Yong" <yoli@ebay.com> wrote:

>> 1. Could you create your patch by "git format-patch -vN master"
>>   or something? If you create your patch by "git format-patch",
>>   we can apply your patch by "git am XXX.patch".
>>
> 
> Thanks for your review. I’ve updated the patch to follow your
> suggested format.

Thanks. I could apply your patch by "git am
v2-0001-heapam_refactor.patch".

Could you use the following format for the commit message
next time?

----
${TITLE}

${DESCRIPTION}
----

For example:

----
Separate HEAP WAL replay logic into its own file

Most access methods (i.e. nbtree and hash) use a separate
file with "xlog" in its name for its WAL replay logic. Heap
is one exception of this convention. To make it easier for
newcomers to find the WAL replay logic for the heap access
method, this patch isolates heap's replay logic in a new
heapam_xlog.c file. This patch is a pure refactoring with no
change to the logic.
----

This is a commonly used Git's commit message format. See
also other commit messages by "git log".

>> 5. There are still WAL related codes in heapam.c:
>>
>>   4.1. log_heap_update()
>>   4.2. log_heap_new_cid()
>>   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
>>   4.4. if (needwal) {...} in heap_multi_insert()
>>   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
>>   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
>>   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
>>   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
>>   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
>>   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
>>   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
>>   4.12. log_heap_visible()
>>
>>   Should we move them to head_xlog.c too?
>>
>>   If we should do it, separated commits will be easy to
>>   review. For example, the 0001 patch moves existing codes
>>   to head_xlog.c as-is. The 0002 patch extracts WAL related
>>   codes in heap_insert() to heap_xlog.c and so on.
> 
> I followed the convention of most access methods. The “xlog”
> file includes the WAL replay logic only. The logic that generates
> the log records themselves stays with the code that performs
> the changes. Take nbtree as an example, you can also find
> WAL generating code in several _bt_insertxxx() functions inside
> the nbtinsert.c file.

You're right. Sorry.


I think that this proposal is reasonable but we need to get
attention from a committer to move forward this proposal.


Thanks,
-- 
kou



Re: Separate HEAP WAL replay logic into its own file

From
"Li, Yong"
Date:
> I think that this proposal is reasonable but we need to get
> attention from a committer to move forward this proposal.
> 
> 
> Thanks,
> —
> kou

Thank you Kou for your review. I will move the CF to the next
phase and see what happens.


Regards,
Yong



Re: Separate HEAP WAL replay logic into its own file

From
"Li, Yong"
Date:

> On Sep 12, 2024, at 13:39, Michael Paquier <michael@paquier.xyz> wrote:
> 
> External Email
> 
> 
> I was looking at all that, and this is only moving code around.  While
> the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad
> but historical, the header cleanup in heapam.c is nice.
> 
> Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to
> XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the
> opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h
> was less interesting.  Just to say that I am agreeing with you here
> and I have let this part as you suggested originally.
> 
> I was wondering for a bit about the order of the functions for heap
> and heap, but these are ordered in their own, which is also OK.  I
> have added a few more comments at the top of each subroutine for the
> records to be more consistent, and applied the result.
> —
> Michael
> 

I am so glad to see that my patch got committed. Thank you a lot for it!
This is my first accepted patch. It really means a lot to me.

Yong