Thread: SLRUs in the main buffer pool - Page Header definitions

SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
Hi all,

PostgreSQL currently maintains several data structures in the SLRU
cache. The SLRU cache has scaling and sizing challenges because of its
simple implementation. The goal is to move these caches to the common
buffer cache to benefit from the stronger capabilities of the common
buffer cache code. At AWS, we are building on the patch shared by Thomas
Munro [1], which treats the SLRU pages as part of a pseudo-database
with ID 9. We will refer to the pages belonging to SLRU components as
BufferedObject pages going forward.

The current SLRU pages do not have any header, so there is a need to
create a new page header format for these. Our investigations revealed
that we need to:

1. track LSN to ensure durability and consistency of all pages (for redo
   and full page write purposes)
2. have a checksum (for page correctness verification).
3. A flag to identify if the page is a relational or BufferedObject
4. Track version information.

We are suggesting a minimal BufferedObject page header
to be the following, overlapping with the key fields near the beginning
of the regular PageHeaderData:

typedef struct BufferedObjectPageHeaderData
{
    PageXLogRecPtr pd_lsn;
    uint16_t       pd_checksum;
    uint16_t       pd_flags;
    uint16_t       pd_pagesize_version;
} BufferedObjectPageHeaderData;

For reference, the regular page header looks like the following:
typedef struct PageHeaderData
{
    PageXLogRecPtr    pd_lsn;
    uint16_t    pd_checksum;
    uint16_t    pd_flags;
    LocationIndex   pd_lower;
    LocationIndex   pd_upper;
    LocationIndex   pd_special;
    uint16_t           pd_pagesize_version;
    TransactionId   pd_prune_xid;
    ItemIdDataCommon  pd_linp[];
} PageHeaderData;

After careful review, we have trimmed out the heap and index specific
fields from the suggested header that do not add any value to SLRU
components. We plan to use pd_lsn, pd_checksum, and pd_pagesize_version
in the same way that they are in relational pages. These fields are
needed to ensure consistency, durability and page correctness.

We will use the 4th bit of pd_flags to identify a BufferedObject page.
If the bit is set then this denotes a BufferedObject page. Today, bits
1 - 3 are used for determining if there are any free line pointers, if
the page is full, and if all tuples on the page are visible to
everyone, respectively. We will use this information accordingly in the
storage manager to determine which callback functions to use for file
I/O operations. This approach gives the buffer cache a universal way
to quickly determine what type of page it is dealing with at any time.
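
For illustration only, here is a minimal sketch of how the flag could look
(the existing bit values are the ones in bufpage.h; PD_BUFFERED_OBJECT and
the test macro are hypothetical names, not from any committed patch):

#define PD_HAS_FREE_LINES   0x0001  /* existing: unused line pointers exist */
#define PD_PAGE_FULL        0x0002  /* existing: no room for a new tuple */
#define PD_ALL_VISIBLE      0x0004  /* existing: all tuples visible to everyone */
#define PD_BUFFERED_OBJECT  0x0008  /* proposed: header is BufferedObjectPageHeaderData */

#define PageIsBufferedObject(page) \
    ((((PageHeader) (page))->pd_flags & PD_BUFFERED_OBJECT) != 0)

Because pd_flags sits at the same offset in both header layouts, such a test
could be applied before deciding how to interpret the rest of the page.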

Using the new BufferedObject page header will be space efficient but
introduces a significant change in the codebase to now track two types
of page header data. During upgrade, all SLRU files that exist on the
system must be converted to the new format with page header. This will
require rewriting all the SLRU pages with the page header as part of
pg_upgrade.

We believe that this is the correct approach for the long run. We would
love feedback if there are additional items of data that should be
tracked as well. Alternatively, we could re-use the existing page
header and leave the unused fields as padding. This feels
like an unclean approach but would avoid having two page header types
in the database.



[1] - https://www.postgresql.org/message-id/flat/CA+hUKGKAYze99B-jk9NoMp-2BDqAgiRC4oJv+bFxghNgdieq8Q@mail.gmail.com



Discussed with: Joe Conway, Nathan Bossart, Shawn Debnath


Rishu Bagga

Amazon Web Services (AWS)




Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2022-06-22 21:06:29 +0000, Bagga, Rishu wrote:
> 3. A flag to identify if the page is a relational or BufferedObject

Why is this needed in the page header?


> Using the new BufferedObject page header will be space efficient but
> introduces a significant change in the codebase to now track two types
> of page header data. During upgrade, all SLRU files that exist on the
> system must be converted to the new format with page header. This will
> require rewriting all the SLRU pages with the page header as part of
> pg_upgrade.

How are you proposing to deal with this in the "key" to "offset in SLRU"
mapping? E.g. converting a xid to an offset in the pg_xact SLRU. I assume
you're thinking to deal with this by making the conversion math a bit more
complicated?

Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
Tom Lane
Date:
"Bagga, Rishu" <bagrishu@amazon.com> writes:
> The current SLRU pages do not have any header, so there is a need to
> create a new page header format for these. Our investigations revealed
> that we need to:

> 1. track LSN to ensure durability and consistency of all pages (for redo
>    and full page write purposes)
> 2. have a checksum (for page correctness verification).
> 3. A flag to identify if the page is a relational or BufferedObject
> 4. Track version information.

Isn't this a nonstarter from the standpoint of pg_upgrade?

            regards, tom lane



Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2022-06-22 19:12:14 -0400, Tom Lane wrote:
> "Bagga, Rishu" <bagrishu@amazon.com> writes:
> > The current SLRU pages do not have any header, so there is a need to
> > create a new page header format for these. Our investigations revealed
> > that we need to:
> 
> > 1. track LSN to ensure durability and consistency of all pages (for redo
> >    and full page write purposes)
> > 2. have a checksum (for page correctness verification).
> > 3. A flag to identify if the page is a relational or BufferedObject
> > 4. Track version information.
> 
> Isn't this a nonstarter from the standpoint of pg_upgrade?

We're rewriting some relation forks as part of pg_upgrade (visibility map
IIRC?), so rewriting an SLRU is likely not prohibitive - there's much more of
a limit to the SLRU sizes than the number and aggregate size of relation
forks.

Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
Hi Andres,
Thanks for your response.

To answer your questions:

>> 3. A flag to identify if the page is a relational or BufferedObject
>Why is this needed in the page header?

Now that we are dealing with two different types of page headers, we need
to know how to interpret any given page. We need to use pd_flags to
determine this.
 


>How are you proposing to deal with this in the "key" to "offset in SLRU"
>mapping? E.g. converting a xid to an offset in the pg_xact SLRU. I assume
>you're thinking to deal with this by making the conversion math a bit more
>complicated?

You’re right; we would have to account for this in the conversion math
between the ‘key’ and ‘offset’. The change to the macros would be as
follows:
 

#define MULTIXACT_OFFSETS_PER_PAGE ((BLCKSZ - SizeOfBufferedObjectPageHeaderData) / sizeof(MultiXactOffset))

Additionally, we need to account for the size of the page header when
reading and writing multixacts in memory based off of the entryno, as
sketched below.
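
For illustration, a sketch of how the surrounding multixact.c conversion
macros and the in-page lookup might change (MultiXactIdToOffsetPage and
MultiXactIdToOffsetEntry are the existing macro names; the inline helper
is hypothetical):

#define MULTIXACT_OFFSETS_PER_PAGE \
    ((BLCKSZ - SizeOfBufferedObjectPageHeaderData) / sizeof(MultiXactOffset))

#define MultiXactIdToOffsetPage(xid) \
    ((xid) / (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetEntry(xid) \
    ((xid) % (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)

/* locating an entry within an in-memory page now has to skip the header */
static inline MultiXactOffset *
MultiXactOffsetEntryPtr(Page page, int entryno)
{
    return (MultiXactOffset *)
        ((char *) page + SizeOfBufferedObjectPageHeaderData) + entryno;
}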
 

Rishu Bagga

Amazon Web Services (AWS)






Re: SLRUs in the main buffer pool - Page Header definitions

From
Robert Haas
Date:
On Wed, Jun 22, 2022 at 5:06 PM Bagga, Rishu <bagrishu@amazon.com> wrote:
> We are suggesting a minimal BufferedObject page header
> to be the following, overlapping with the key fields near the beginning
> of the regular PageHeaderData:
>
> typedef struct BufferedObjectPageHeaderData
> {
>     PageXLogRecPtr pd_lsn;
>     uint16_t       pd_checksum;
>     uint16_t       pd_flags;
>     uint16_t       pd_pagesize_version;
> } BufferedObjectPageHeaderData;
>
> For reference, the regular page header looks like the following:
> typedef struct PageHeaderData
> {
>     PageXLogRecPtr    pd_lsn;
>     uint16_t    pd_checksum;
>     uint16_t    pd_flags;
>     LocationIndex   pd_lower;
>     LocationIndex   pd_upper;
>     LocationIndex   pd_special;
>     uint16_t           pd_pagesize_version;
>     TransactionId   pd_prune_xid;
>     ItemIdDataCommon  pd_linp[];
> } PageHeaderData;
>
> After careful review, we have trimmed out the heap and index specific
> fields from the suggested header that do not add any value to SLRU
> components. We plan to use pd_lsn, pd_checksum, and pd_pagesize_version
> in the same way that they are in relational pages. These fields are
> needed to ensure consistency, durability and page correctness

I think that it's not worth introducing a new page header format to
save 10 bytes per page. Keeping things on the same format is likely to
save more than the minor waste of space costs.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2022-06-23 20:25:21 +0000, Bagga, Rishu wrote:
> >> 3. A flag to identify if the page is a relational or BufferedObject
> >Why is this needed in the page header?
> 
> Now that we are dealing with two different type of page headers, we need to
> know how to interpret any given page. We need to use pd_flags to determine
> this.

When do we need to do so? We should never need to figure out whether an
on-disk block is for an SLRU or something else, without also knowing which
relation / SLRU it is in.

Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
Hi Andres,


>When do we need to do so? We should never need to figure out whether an
>on-disk block is for an SLRU or something else, without also knowing which
>relation / SLRU it is in.

You are correct that we wouldn’t need to rely on the pd_flag bit to determine page type for any access to a page where
we come top down following the hierarchy. However, for the purpose of debugging “from the bottom up” it would be
critical to know what type of page is being read in a system with multiple page header types.
 



Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2022-06-24 00:39:41 +0000, Bagga, Rishu wrote:
> >When do we need to do so? We should never need to figure out whether an
> >on-disk block is for an SLRU or something else, without also knowing >which
> >relation / SLRU it is in.
> 
> You are correct that we wouldn’t need to rely on the pd_flag bit to
> determine page type for any access to a page where we come top down
> following the hierarchy. However, for the purpose of debugging “from the
> bottom up” it would be critical to know what type of page is being read in a
> system with multiple page header types.

That doesn't seem to justify using a bit on the page. Wouldn't it suffice to
add such information to the BufferDesc?

Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
Shawn Debnath
Date:
On Thu, Jun 23, 2022 at 06:06:48PM -0700, Andres Freund wrote:

> > You are correct that we wouldn’t need to rely on the pd_flag bit to
> > determine page type for any access to a page where we come top down
> > following the hierarchy. However, for the purpose of debugging “from the
> > bottom up” it would be critical to know what type of page is being read in a
> > system with multiple page header types.
> 
> That doesn't seem to justify using a bit on the page. Wouldn't it suffice to
> add such information to the BufferDesc?

The goal for the bit in pd_flags is to help identify what page header 
should be present if one were to be looking at the physical page outside 
of the database. One example that comes to mind is pg_upgrade.  There 
are other use cases where physical consistency checks could be applied, 
again outside of a running database.

On Thu, Jun 23, 2022 at 04:27:33PM -0400, Robert Haas wrote:

> I think that it's not worth introducing a new page header format to
> save 10 bytes per page. Keeping things on the same format is likely to
> save more than the minor waste of space costs.

Yeah, I think we are open to both approaches, though we believe it would 
be cleaner to get started with a targeted page header for the new code.  
But do understand having to understand/translate/deal with two page 
header types might not be worth the savings in space.

If we stick with the current page header, of course, changes to pd_flag 
won't be necessary anymore.

Stepping back, it seems like folks are okay with introducing a page 
header to current SLRU components and that we are leaning towards 
sticking with the default one for now. We can proceed with this 
approach, and if needed, change it later if more folks chime in.

Cheers.

-- 
Shawn Debnath
Amazon Web Services (AWS)



Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2022-06-24 22:19:33 +0000, Shawn Debnath wrote:
> On Thu, Jun 23, 2022 at 06:06:48PM -0700, Andres Freund wrote:
>
> > > You are correct that we wouldn’t need to rely on the pd_flag bit to
> > > determine page type for any access to a page where we come top down
> > > following the hierarchy. However, for the purpose of debugging “from the
> > > bottom up” it would be critical to know what type of page is being read in a
> > > system with multiple page header types.
> >
> > That doesn't seem to justify using a bit on the page. Wouldn't it suffice to
> > add such information to the BufferDesc?
>
> The goal for the bit in pd_flags is to help identify what page header
> should be present if one were to be looking at the physical page outside
> of the database. One example that comes to mind is pg_upgrade.  There
> are other use cases where physical consistency checks could be applied,
> again outside of a running database.

Outside the database you'll know the path to the file, which will tell you
it's not another kind of relation.

This really makes no sense to me. We don't have page flags indicating whether
a page is a heap, btree, visibility, fms whatever kind of page either. On a
green field, it'd make sense to have such information in a metapage at the
start of each relation - but not on each page.


> On Thu, Jun 23, 2022 at 04:27:33PM -0400, Robert Haas wrote:
>
> > I think that it's not worth introducing a new page header format to
> > save 10 bytes per page. Keeping things on the same format is likely to
> > save more than the minor waste of space costs.
>
> Yeah, I think we are open to both approaches, though we believe it would
> be cleaner to get started with a targeted page header for the new code.
> But do understand having to understand/translate/deal with two page
> header types might not be worth the savings in space.

Not sure if that changes anything, but it's maybe worth noting that we already
have some types of pages that don't use the full page header (at least
freespace.c, c.f. buffer_std argument to MarkBufferDirtyHint()). I can see an
argument for shrinking the "uniform" part of the page header, and pushing more
things down into AMs. But I think that'd need to change the existing code, not
just introduce something new for new code.


> Stepping back, it seems like folks are okay with introducing a page
> header to current SLRU components and that we are leaning towards
> sticking with the default one for now. We can proceed with this
> approach, and if needed, change it later if more folks chime in.

I think we're clearly going to have to do that at some point not too far
away. There's just too many capabilities that are made harder by not having
that information for SLRU pages. That's not to say that it's a prerequisite to
moving SLRUs into the buffer pool (using a hack like Thomas did until the page
header is introduced). Both are complicated enough projects on their own. I
also could see adding the page header before moving SLRUs in the buffer pool,
there isn't really a hard dependency.

Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
Shawn Debnath
Date:
On Fri, Jun 24, 2022 at 03:45:34PM -0700, Andres Freund wrote:

> Outside the database you'll know the path to the file, which will tell you
> it's not another kind of relation.
> 
> This really makes no sense to me. We don't have page flags indicating whether
> a page is a heap, btree, visibility, fms whatever kind of page either. On a
> green field, it'd make sense to have such information in a metapage at the
> start of each relation - but not on each page.

Alright, I agree with the metapage having the necessary info. In
this case, we can rely on the hierarchy to determine the type of header.
Given we do not have a use case requiring the flag, we should not
consume it at this point.


> > On Thu, Jun 23, 2022 at 04:27:33PM -0400, Robert Haas wrote:
> >
> > > I think that it's not worth introducing a new page header format to
> > > save 10 bytes per page. Keeping things on the same format is likely to
> > > save more than the minor waste of space costs.
> >
> > Yeah, I think we are open to both approaches, though we believe it would
> > be cleaner to get started with a targeted page header for the new code.
> > But do understand having to understand/translate/deal with two page
> > header types might not be worth the savings in space.
> 
> Not sure if that changes anything, but it's maybe worth noting that we already
> have some types of pages that don't use the full page header (at least
> freespace.c, c.f. buffer_std argument to MarkBufferDirtyHint()). I can see an
> argument for shrinking the "uniform" part of the page header, and pushing more
> things down into AMs. But I think that'd need to change the existing code, not
> just introduce something new for new code.

We did think through a universal page header concept that included just
pd_lsn, pd_checksum, and pd_flags, pulling in pd_pagesize_version and other
fields as the non-uniform members for SLRU. Unfortunately, pd_pagesize_version
sits 48 bits after pd_flags (behind pd_lower, pd_upper, and pd_special), which
makes that layout challenging with today's header. I am leaning towards the
standard page header for now and revisiting the universal/uniform page header
and AM changes in a separate effort. The push down to AMs is an interesting
concept and should be worthwhile following up on.


> > Stepping back, it seems like folks are okay with introducing a page
> > header to current SLRU components and that we are leaning towards
> > sticking with the default one for now. We can proceed with this
> > approach, and if needed, change it later if more folks chime in.
> 
> I think we're clearly going to have to do that at some point not too far
> away. There's just too many capabilities that are made harder by not having
> that information for SLRU pages. That's not to say that it's a prerequisite to
> moving SLRUs into the buffer pool (using a hack like Thomas did until the page
> header is introduced). Both are complicated enough projects on their own. I
> also could see adding the page header before moving SLRUs in the buffer pool,
> there isn't really a hard dependency.

To be honest, given the nature of the changes, I would prefer to have it
done in one major version release rather than split it into multiple
efforts. The value add really comes from the consistency checks that can
be done, which are non-existent for SLRUs today.





Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
Hi, 

We have been working on adding page headers to the SLRU pages, as part of the migration of SLRUs to the buffer cache.
We’ve incorporated Thomas Munro’s patch [2] and Heikki’s storage manager changes [1], and have a patch for early feedback.
 

As part of our changes we have:

1. Added page headers to the following:
   * Commit_TS
   * CLOG
   * MultiXact
   * Subtrans
   * Serial (predicate.c)
   * Notify (async.c)

For commit_ts, clog and MultiXact, the PageXLogRecPtr field is populated with the LSN returned during the creation of a
new page; as there is no WAL record for the rest, PageXLogRecPtr is set to “InvalidXLogRecPtr”.
 

There is one failing assert in predicate.c for SerialPagePrecedes with the page header changes; we are looking into
this issue.
 

The page_version is set to PG_METAPAGE_LAYOUT_VERSION (which is 1)


2. Changed the block number passed into ReadSlruBuffer from relative to absolute, and accounted for SLRU’s 256 kB
segment size in md.c (a sketch of the translation follows below).
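
A sketch of the relative-to-absolute translation being described (the helper
name is illustrative; SLRU_PAGES_PER_SEGMENT is the existing slru.h constant,
32 pages of BLCKSZ = 256 kB per segment file):

#define SLRU_PAGES_PER_SEGMENT  32

static inline BlockNumber
SlruAbsoluteBlockNumber(int segno, int rpageno)
{
    /* page 'rpageno' within segment file 'segno', as one flat block number */
    return (BlockNumber) segno * SLRU_PAGES_PER_SEGMENT + rpageno;
}

md.c then needs to know that these files cap out at 32 blocks, rather than
the usual RELSEG_SIZE blocks, when mapping an absolute block number back to a
segment file and an offset within it.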
 



The changes pass the regression tests. We are still working on handling the upgrade scenario and should have a patch
out for that soon.
 

Attached is the patch with all changes (Heikki’s and Munro’s patches and the page headers) consolidated.


Thanks,
Rishu Bagga, Amazon Web Services (AWS)

[1] https://www.postgresql.org/message-id/128709bc-992c-b57a-7174-098433b7faa4@iki.fi

[2] https://www.postgresql.org/message-id/CA+hUKG+02ZF-vjtUG4pH8bx+2Dn=eMh8GsT6jasiXZPgVxUXLw@mail.gmail.com












Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Ian Lawrence Barwick
Date:
On Wed, 28 Sep 2022 at 10:57, Bagga, Rishu <bagrishu@amazon.com> wrote:
>
> Hi,
>
> We have been working on adding page headers to the SLRU pages, as part of the migration for SLRU to buffer cache.
We’veincorporated Thomas Munro’s patch and Heikki’s Storage manager changes[1] and have a patch for early feedback. 
>
> As part of our changes we have:
>
> 1. Added page headers to the following
>
>                *Commit_TS
>         *CLOG
>         *MultiXact
>         *Subtrans
>         *Serial (predicate.c)
>         *Notify (async.c)
>
> For commit_ts, clog and MultiXact, the PageXLogRecPtr field is populated with the LSN returned during the creation of
anew page; as there is no WAL record for the rest, PageXLogRecPtr is set to “InvalidXlogRecPtr”. 
>
> There is one failing assert in predicate.c for SerialPagePrecedes with the page header changes; we are looking into
thisissue. 
>
> The page_version is set to PG_METAPAGE_LAYOUT_VERSION (which is 1)
>
>
> 2. Change block number passed into ReadSlruBuffer from relative to absolute, and account for SLRU’s 256kb segment
sizein md.c. 
>
>
>
> The changes pass the regression tests. We are still working on handling the upgrade scenario and should have a patch
outfor that soon. 
>
> Attached is the patch with all changes (Heikki and Munro’s patch and page headers) consolidated

Hi

cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Thanks

Ian Barwick



Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
Hi Heikki and Thomas,


I’ve reworked your patches for moving SLRUs to the buffer cache to add page headers to the SLRUs. Additionally, I’ve
rebased this patch on top of the latest commit.
 


Changes in this patch include:

1. page headers on SLRU pages
2. pg_upgrade support to add page headers to existing on-disk SLRU pages
3. a new ReadBuffer mode RBM_TRIM for TrimCLOG and TrimMultiXact
4. Removing the concept of external LSNs introduced in Heikki’s patch, as page headers can now store LSNs normally for
SLRUs.
5. Addressed Serial SLRU asserts in previous patch

We still need to fix asserts triggered from memory allocation in the critical section in Munro’s patch in
RecordNewMultiXact.
 

Currently, in GetNewMultiXactId we enter the critical section and end it only after we finish our write, after doing
RecordNewMultiXact in MultiXactIdCreateFromMembers. Now that we’re making ReadBuffer calls in RecordNewMultiXact, we
allocate memory in the critical section, but this isn’t allowed.
 

For now, to avoid triggering asserts, I moved the end of the critical section to before the ReadBuffer calls, but this
could cause potential data corruption for multixacts in case ReadBuffer fails.
 

A potential fix for this issue is to hold on to MultiXactGenLock until we successfully read and write to the buffer, to
ensure no corruption, but this would cause ReadBuffer to become a bottleneck, as no other backends could access the
MultiXact state data.

We should figure out a way to allow ReadBuffer calls in critical sections specifically for multixacts, as the current
behavior is to panic when multixact data write operations fail.
 

I would appreciate your thoughts on how we could proceed here.


P.S. Ian, thanks for reminding me to rebase the patch!

Sincerely,

Rishu Bagga, 

Amazon Web Services (AWS)


Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:

Hi,

 

 

Rebased and updated a new patch addressing the critical section issue in
RecordNewMultiXact. In GetNewMultiXactId, we now make our ReadBuffer
calls before starting the critical section, but while holding the
MultiXactGenLock, so we always fetch the correct buffers. We store them
in an array that is accessed later in RecordNewMultiXact. This way we can
keep the existing behavior of only holding the MultiXactGenLock while
reading in buffers, and let go of it when we are writing, to preserve the
existing concurrency paradigm.
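
A rough sketch of the flow described above, with illustrative names (the
actual patch may structure this differently; the array parameter and the
function name here are placeholders, while LWLockAcquire, ReadBuffer,
MultiXactGenLock and the critical-section handling are existing APIs):

static MultiXactId
GetNewMultiXactIdSketch(Relation members_rel, BlockNumber first_block,
                        int npages, Buffer *pinned_bufs)
{
    MultiXactId multi = InvalidMultiXactId;     /* placeholder result */

    LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
    /* ... advance the shared nextMXact / nextOffset counters as today ... */

    /* pin every offsets/members page the new multixact will touch */
    for (int i = 0; i < npages; i++)
        pinned_bufs[i] = ReadBuffer(members_rel, first_block + i);

    LWLockRelease(MultiXactGenLock);

    /*
     * The caller later enters its critical section; RecordNewMultiXact()
     * reuses pinned_bufs[] instead of calling ReadBuffer itself, so no
     * memory allocation happens inside the critical section.
     */
    return multi;
}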

 

 

Let me know your thoughts on this approach.

 

 

 

Sincerely,

 

Rishu Bagga, Amazon Web Services (AWS)

 

 

Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
vignesh C
Date:
On Fri, 16 Dec 2022 at 04:47, Bagga, Rishu <bagrishu@amazon.com> wrote:
> Rebased and updated a new patch addressing the critical section issue in
> RecordNewMultliXact.In GetNewMultiXactId, we now make our ReadBuffer
> calls before starting the critical section, but while holding the
> MultiXactGenLock, so we always fetch the correct buffers. We store them
> in an array that is accessed later in RecordNewMultiXact.
> This way we can keep the existing functionality of only holding the MultiXactGenLock while reading in buffers, but
canlet go when we are writing,
 
> to preserve the existing concurrency paradigm.
> Let me know your thoughts on this approach.
>

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
92957ed98c5c565362ce665266132a7f08f6b0c0 ===
=== applying patch ./slru_to_buffer_cache_with_page_headers_v3.patch
...
patching file src/include/catalog/catversion.h
Hunk #1 FAILED at 57.
1 out of 1 hunk FAILED -- saving rejects to file
src/include/catalog/catversion.h.rej

[1] - http://cfbot.cputube.org/patch_41_3514.log

Regards,
Vignesh



Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
>On 1/3/23, 5:06 AM, "vignesh C" <vignesh21@gmail.com> wrote:
>>On Fri, 16 Dec 2022 at 04:47, Bagga, Rishu <bagrishu@amazon.com> 
>>wrote:
>>Rebased and updated a new patch addressing the critical section issue in
>>RecordNewMultliXact.In GetNewMultiXactId, we now make our ReadBuffer
>>calls before starting the critical section, but while holding the
>>MultiXactGenLock, so we always fetch the correct buffers. We store 
>>them in an array that is accessed later in RecordNewMultiXact.
>>This way we can keep the existing functionality of only holding the 
>>MultiXactGenLock while reading in buffers, but can let go when we are 
>>writing, to preserve the existing concurrency paradigm.
>>Let me know your thoughts on this approach.


> The patch does not apply on top of HEAD as in [1], please post a 
>rebased patch:
>=== Applying patches on top of PostgreSQL commit ID
>92957ed98c5c565362ce665266132a7f08f6b0c0 ===
>=== applying patch ./slru_to_buffer_cache_with_page_headers_v3.patch
...
>patching file src/include/catalog/catversion.h
>Hunk #1 FAILED at 57.
>1 out of 1 hunk FAILED -- saving rejects to file
>src/include/catalog/catversion.h.rej

>[1] - http://cfbot.cputube.org/patch _41_3514.log

>Regards,
>Vignesh

Hi all,

Rebased the patch, and fixed a bug I introduced in the previous patch in 
TrimCLOG. 

We ran a quick set of pgbench tests and observed no regressions. Here 
are the numbers:

3 trials, with scale 10,000, 350 clients, 350 threads, over 30 minutes:

Median TPS:

Control

Trial 1: 58331.0
Trial 2: 57191.0
Trial 3: 57101.3

Average of Medians: 57541.1

SLRUs to BufferCache + Page Headers:

Trial 1: 62605.0
Trial 2: 62891.2
Trial 3: 59906.3

Average of Medians: 61800.8

Machine Specs:

Driver

Instance: m5d.metal
Architecture x86_64
CPUs: 96
RAM: 384 GiB
OS: Amazon Linux 2


Server

Instance: r5dn.metal
Architecture x86_64
CPUs: 64
RAM: 500GiB
OS: Amazon Linux 2


Looking forward to your feedback on this.

Sincerely,
Rishu Bagga, Amazon Web Services (AWS)






Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
> Hi all,

> Rebased the patch, and fixed a bug I introduced in the previous patch 
> in 
> TrimCLOG. 

> Looking forward to your feedback on this.


Hi all, 

Rebased patch as per latest community changes since last email. 


Sincerely,

Rishu Bagga, Amazon Web Services (AWS)



Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

> From 098f37c0a17fc32a94bff43817414e01fcfb234f Mon Sep 17 00:00:00 2001
> From: Rishu Bagga <bagrishu@amazon.com>
> Date: Thu, 15 Sep 2022 00:55:25 +0000
> Subject: [PATCH] slru to buffercache + page headers + upgrade
> 
> ---
>  contrib/amcheck/verify_nbtree.c             |    2 +-
>  [...]
>  65 files changed, 2176 insertions(+), 3258 deletions(-)

Unfortunately a quite large patch, with this little explanation, is hard to
review. I could read through the entire thread to try to figure out what this
is doing, but I really shouldn't have to.

You're changing quite fundamental APIs across the tree. Why is that required
for the topic at hand? Why is it worth doing that, given it'll break loads of
extensions?

Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2023-02-06 19:12:47 +0000, Bagga, Rishu wrote:
> Rebased patch as per latest community changes since last email. 

This version doesn't actually build.

https://cirrus-ci.com/task/4512310190931968

[19:43:20.131] FAILED: src/test/modules/test_slru/test_slru.so.p/test_slru.c.o 
[19:43:20.131] ccache cc -Isrc/test/modules/test_slru/test_slru.so.p -Isrc/include -I../src/include
-Isrc/include/catalog -Isrc/include/nodes -Isrc/include/utils -Isrc/include/storage -fdiagnostics-color=always -pipe
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE
-Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3
-Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation
-Wno-stringop-truncation -fPIC -pthread -fvisibility=hidden -MD -MQ
src/test/modules/test_slru/test_slru.so.p/test_slru.c.o -MF src/test/modules/test_slru/test_slru.so.p/test_slru.c.o.d -o
src/test/modules/test_slru/test_slru.so.p/test_slru.c.o -c ../src/test/modules/test_slru/test_slru.c
 
[19:43:20.131] ../src/test/modules/test_slru/test_slru.c:47:8: error: unknown type name ‘SlruCtlData’
[19:43:20.131]    47 | static SlruCtlData TestSlruCtlData;
[19:43:20.131]       |        ^~~~~~~~~~~
[19:43:20.131] ../src/test/modules/test_slru/test_slru.c:57:19: error: unknown type name ‘SlruCtl’
[19:43:20.131]    57 | test_slru_scan_cb(SlruCtl ctl, char *filename, int segpage, void *data)
[19:43:20.131]       |                   ^~~~~~~

...

Andres



Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
> Unfortunately a quite large patch, with this little explanation, is
> hard to review. I could read through the entire thread to try to 
> figure out what this is doing, but I really shouldn't have to.
 
> You're changing quite fundamental APIs across the tree. Why is that 
> required for the topic at hand? Why is it worth doing that, given 
> it'll break loads of extensions?
 
Hi Andres, 

Thanks for your response.

To summarize, our underlying effort is to move the SLRUs to the buffer 
cache. We were working with Thomas Munro off a patch he introduced here
[1]. Munro’s patch moves SLRUs to the buffer cache by introducing the
pseudo db id 9 to denote SLRU pages, but maintains the current “raw”
data format of SLRU pages. The addition of page headers in our patch
resolves the issue [2] that Munro mentions in his email [3].

Heikki Linnakangas then introduced a patch on top of Munro’s patch that
modularizes the storage manager, allowing SLRUs to use it [4]. Instead
of using db id 9, SLRUs use spcOid 9, and each SLRU is its own relation.
Here, Heikki simplifies the storage manager by having each struct be 
responsible for just one fork of a relation; thus increasing
extensibility of the smgr API, including for SLRUs. [5] We integrated
our changes introducing page headers for SLRU pages, and upgrade logic
to Heikki’s latest patch. 
 
> > Rebased patch as per latest community changes since last email.

> This version doesn't actually build.
> https://cirrus-ci.com/task/4512310190931968
> [19:43:20.131] FAILED: 
> src/test/modules/test_slru/test_slru.so.p/test_slru.c.o


As for the build failures, I was using make install to build, and make
check for regression tests. The build failures in this link  are from
unit tests dependent on custom SLRUs, which would no longer apply. I’ve
attached another patch that removes these tests.

[1] https://www.postgresql.org/message-id/CA%2BhUKGKCkbtOutcz5M8Z%3D0pgAkwdiR57Lxk7803rGsgiBNST6w%40mail.gmail.com

[2] “I needed to disable checksums and in-page LSNs, since SLRU pages
hold raw data with no header. We'd probably eventually want regular
(standard? formatted?) pages (the real work here may be implementing
FPI for SLRUs so that checksums don't break your database on torn
writes).”

[3] https://www.postgresql.org/message-id/CA%2BhUKGKAYze99B-jk9NoMp-2BDqAgiRC4oJv%2BbFxghNgdieq8Q%40mail.gmail.com

[4] https://www.postgresql.org/message-id/flat/128709bc-992c-b57a-7174-098433b7faa4%40iki.fi#a78d6250327795e95b02e9305e2d153e

[5] “I think things would be more clear if we 
unbundled the forks at the SMGR level, so that we would have a separate 
SMgrRelation struct for each fork. And let's rename it to SMgrFile to 
make the role more clear. I think that would reduce the confusion when 
we start using it for SLRUs; an SLRU is not a relation, after all. md.c 
would still segment each logical file into 1 GB segments, but it would 
not need to deal with forks.”






Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Andres Freund
Date:
Hi,

On 2023-02-08 20:04:52 +0000, Bagga, Rishu wrote:
> To summarize, our underlying effort is to move the SLRUs to the buffer
> cache. We were working with Thomas Munro off a patch he introduced here
> [1]. Munro’s patch moves SLRUs to the buffer cache by introducing the
> pseudo db id 9 to denote SLRU pages, but maintains the current “raw”
> data format of SLRU pages. The addition of page headers in our patch
> resolves this issue [2] Munro mentions in this email [3].
>
> Heikki Linnakangas then introduced  patch on top of Munro’s patch that
> modularizes the storage manager, allowing SLRUs to use it [4]. Instead
> of using db id 9, SLRUs use spcOid 9, and each SLRU is its own relation.
> Here, Heikki simplifies the storage manager by having each struct be
> responsible for just one fork of a relation; thus increasing
> extensibility of the smgr API, including for SLRUs. [5] We integrated
> our changes introducing page headers for SLRU pages, and upgrade logic
> to Heikki’s latest patch.

That doesn't explain the bulk of the changes here.  Why e.g. does any of the
above require RelationGetSmgr() to handle the fork as well? Why do we need
smgrtruncate_multi()? And why does all of this happens as one patch?

As is, with a lot of changes mushed together, without distinct explanations
for why is what done, this patch is essentially unreviewable. It'll not make
progress in this form.

It doesn't help that much to reference prior discussions in the email I'm
responding to - the patches need to be mostly understandable on their own,
without reading several threads.  And there needs to be explanations in the
code as well, otherwise we'll have no chance to understand any of this in a
few years.


> > > Rebased patch as per latest community changes since last email.
>
> > This version doesn't actually build.
> > https://cirrus-ci.com/task/4512310190931968
> > [19:43:20.131] FAILED:
> > src/test/modules/test_slru/test_slru.so.p/test_slru.c.o
>
>
> As for the build failures, I was using make install to build, and make
> check for regression tests.

There's a *lot* more tests than the main regression tests. You need to make
sure that they all continue to work. Enable TAP tests and use check-world.


> The build failures in this link are from unit tests dependent on custom
> SLRUs, which would no longer apply. I’ve attached another patch that removes
> these tests.

I think you'd need to fix those tests, rather than just removing them.

I suspect we're going to continue to want SLRU specific stats, but your patch
also seems to remove those.


Greetings,

Andres Freund



Re: SLRUs in the main buffer pool - Page Header definitions

From
Heikki Linnakangas
Date:
On 08/02/2023 22:26, Andres Freund wrote:
> On 2023-02-08 20:04:52 +0000, Bagga, Rishu wrote:
>> To summarize, our underlying effort is to move the SLRUs to the buffer
>> cache. We were working with Thomas Munro off a patch he introduced here
>> [1]. Munro’s patch moves SLRUs to the buffer cache by introducing the
>> pseudo db id 9 to denote SLRU pages, but maintains the current “raw”
>> data format of SLRU pages. The addition of page headers in our patch
>> resolves this issue [2] Munro mentions in this email [3].
>>
>> Heikki Linnakangas then introduced  patch on top of Munro’s patch that
>> modularizes the storage manager, allowing SLRUs to use it [4]. Instead
>> of using db id 9, SLRUs use spcOid 9, and each SLRU is its own relation.
>> Here, Heikki simplifies the storage manager by having each struct be
>> responsible for just one fork of a relation; thus increasing
>> extensibility of the smgr API, including for SLRUs. [5] We integrated
>> our changes introducing page headers for SLRU pages, and upgrade logic
>> to Heikki’s latest patch.
> 
> That doesn't explain the bulk of the changes here.  Why e.g. does any of the
> above require RelationGetSmgr() to handle the fork as well? Why do we need
> smgrtruncate_multi()? And why does all of this happens as one patch?
> 
> As is, with a lot of changes mushed together, without distinct explanations
> for why is what done, this patch is essentially unreviewable. It'll not make
> progress in this form.
> 
> It doesn't help that much to reference prior discussions in the email I'm
> responding to - the patches need to be mostly understandable on their own,
> without reading several threads.  And there needs to be explanations in the
> code as well, otherwise we'll have no chance to understand any of this in a
> few years.

Agreed. I rebased this over my rebased patch set from the other thread 
at 
https://www.postgresql.org/message-id/02825393-615a-ac81-0f05-f3cc2e6f875f%40iki.fi. 
Attached is a new patch with only the changes relative to that patch set.

This is still messy, but now I can see what the point is: make the 
SLRUs, which are tracked in the main buffer pool thanks to the other 
patches, use the standard page header.

I'm not sure if I like that or not. I think we should clean up and 
finish the other patches that this builds on first, and then decide if 
we want to use the standard page header for the SLRUs or not. And if we 
decide that we want the SLRU pages to have a page header, clean this up 
or rewrite it from scratch.

- Heikki

Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Robert Haas
Date:
On Mon, Feb 27, 2023 at 8:56 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm not sure if I like that or not. I think we should clean up and
> finish the other patches that this builds on first, and then decide if
> we want to use the standard page header for the SLRUs or not. And if we
> decide that we want the SLRU pages to have a page header, clean this up
> or rewrite it from scratch.

I'm not entirely sure either, but I think the idea has some potential.
If SLRU pages have headers, that means that they have LSNs, and
perhaps then we could use those LSNs to figure out when they're safe
to write to disk, instead of ad-hoc mechanisms. See SlruSharedData's
group_lsn field.
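
As a hedged sketch of that idea (the function name is illustrative;
PageGetLSN, XLogFlush and smgrwrite are existing APIs), writeback could
then follow the same WAL-before-data rule that bufmgr.c applies to
relation pages, instead of slru.c's group_lsn bookkeeping:

static void
FlushSlruPageSketch(SMgrRelation reln, BlockNumber blocknum, Buffer buf)
{
    Page        page = BufferGetPage(buf);
    XLogRecPtr  lsn = PageGetLSN(page);

    /* make sure WAL up to the page's LSN is durable before writing it out */
    if (!XLogRecPtrIsInvalid(lsn))
        XLogFlush(lsn);

    smgrwrite(reln, MAIN_FORKNUM, blocknum, (char *) page, false);
}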

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: SLRUs in the main buffer pool - Page Header definitions

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Feb 27, 2023 at 8:56 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I'm not sure if I like that or not. I think we should clean up and
> > finish the other patches that this builds on first, and then decide if
> > we want to use the standard page header for the SLRUs or not. And if we
> > decide that we want the SLRU pages to have a page header, clean this up
> > or rewrite it from scratch.
>
> I'm not entirely sure either, but I think the idea has some potential.
> If SLRU pages have headers, that means that they have LSNs, and
> perhaps then we could use those LSNs to figure out when they're safe
> to write to disk, instead of ad-hoc mechanisms. See SlruSharedData's
> group_lsn field.

I agree that it's got potential and seems like the right direction to go
in.  That would also allow for checksums for SLRUs, and possibly support
for encryption which leverages the LSN, and for a dynamic page feature
area which could allow for an extended checksum or perhaps authenticated
encryption with additional authenticated data.

Thanks,

Stephen

Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
>I think you'd need to fix those tests, rather than just removing them.
>I suspect we're going to continue to want SLRU specific stats, but 
>your patch also seems to remove those.

>This is still messy, but now I can see what the point is: make the 
>SLRUs, which are tracked in the main buffer pool thanks to the other 
>patches, use the standard page header.

>I'm not sure if I like that or not. I think we should clean up and 
>finish the other patches that this builds on first, and then decide if 
>we want to use the standard page header for the SLRUs or not. And if 
>we decide that we want the SLRU pages to have a page header, clean 
>this up or rewrite it from scratch.

>I'm not entirely sure either, but I think the idea has some potential.
>If SLRU pages have headers, that means that they have LSNs, and
>perhaps then we could use those LSNs to figure out when they're safe
>to write to disk, instead of ad-hoc mechanisms. See SlruSharedData's
>group_lsn field.

>I agree that it's got potential and seems like the right direction to 
>go in. That would also allow for checksums for SLRUs and possibly 
>support for encryption which leverages the LSN and for a dynamic page 
>feature area which could allow for an extended checksum or perhaps 
>authenticated encryption with additonal authenticated data.



Hi all, 

Thank you for the feedback on the last patch. I have prepared a new set 
of patches here that aim to address these concerns; cleaning the patch 
up, and breaking them into smaller parts. Each functional change is 
broken into its own patch. To keep the change as minimal as possible, I 
have removed the tangential changes from Heikki Linnakangas’ patch 
modifying the storage manager, and kept the changes limited to moving 
SLRU components to buffer cache, and page header changes. 

The first patch is the original patch that moves some of the SLRUs to 
the buffercache, which Thomas Munro posted in [1], rebased to the latest 
commit.

The second patch fixes problems with trying to allocate memory in
critical sections for the commit log and multixacts in Munro’s patch.
There are three places where we may need to allocate memory in a
critical section, which I have addressed (a sketch of the pattern for
the first two follows this list):

1. When recording a transaction status, we may need to allocate memory 
for a buffer pin to bring the clog page into the buffer cache. I added a 
call to ResourceOwnerEnlargeBuffers before entering the critical section 
to resolve this issue.

2. When recording a transaction status, we may need to allocate memory 
for an entry in the storage manager hash table for the commit log in 
smgropen. I added a function “VerifyClogLocatorInHashTable” which forces 
an smgropen call that does this if needed. This function is called 
before entering the Critical Section.

3. When recording a multixact, we enter a critical section while writing
the multixact members. Now that the multixact pages are stored in the
buffer cache, we may need to allocate memory here to retrieve a buffer
page. I modified GetNewMultiXactId to also prefetch all the buffers we
will need before we enter the critical section, so that we do not need to
make ReadBuffer calls while in the multixact critical section.
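
A sketch of the pattern for fixes 1 and 2 (the wrapper name is
illustrative; VerifyClogLocatorInHashTable is the patch's helper described
above, while ResourceOwnerEnlargeBuffers and the critical-section macros
are existing APIs):

static void
SetXidStatusSketch(TransactionId xid, XidStatus status)
{
    /* reserve resource-owner space for one more pinned buffer up front */
    ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

    /* force smgropen() now so the clog's smgr hash entry already exists */
    VerifyClogLocatorInHashTable();

    START_CRIT_SECTION();
    /*
     * ReadBuffer for the clog page and the status-bit update happen here;
     * neither should need to allocate any memory.
     */
    END_CRIT_SECTION();
}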


The third patch brings back the SLRU control structure, to keep it
as an extensible feature for now, and renames the handler for the 
components we are moving into the buffercache to NREL (short for non 
relational). nrel.c is essentially a copy of Munro’s modified slru.c, 
and I have restored the original slru.c. This allows for existing 
extensions utilizing SLRUs to keep working, and the test_slru unit tests 
to pass, as well as introducing a more accurate name for the handling of 
components (CLOG, Multixact Offsets/Members, Async, Serial, Subtrans) 
that are no longer handled by an SLRU, but are still non relational 
components. To address Andres’s concern - I modified the slru stats test 
code to still track all these current components and maintain the 
behavior, and confirmed that those tests pass as well.


The fourth patch adds the page headers to these Non Relational (NREL) 
components, and provides the upgrade story to rewrite the clog and 
multixact files with page headers across upgrades.

With the changes from all four patches, they pass all tests with make 
installcheck-world, as well as test_slru.


I hope these patches are easier to read and review, and would appreciate 
any feedback. 


[1] https://www.postgresql.org/message-id/flat/CA+hUKGKAYze99B-jk9NoMp-2BDqAgiRC4oJv+bFxghNgdieq8Q@mail.gmail.com



Rishu Bagga,

Amazon Web Services (AWS)


Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Stephen Frost
Date:
Greetings,

[snipped quoted bits]

Would really be helpful to keep who the author of each quoted snippet
was when you quote them; dropping that makes it look like one person
wrote all of them and that's confusing.

* Bagga, Rishu (bagrishu@amazon.com) wrote:
> The third patch brings back the the SLRU control structure, to keep it
> as an extensible feature for now, and renames the handler for the
> components we are moving into the buffercache to NREL (short for non
> relational). nrel.c is essentially a copy of Munro’s modified slru.c,
> and I have restored the original slru.c. This allows for existing
> extensions utilizing SLRUs to keep working, and the test_slru unit tests
> to pass, as well as introducing a more accurate name for the handling of
> components (CLOG, Multixact Offsets/Members, Async, Serial, Subtrans)
> that are no longer handled by an SLRU, but are still non relational
> components. To address Andres’s concern - I modified the slru stats test
> code to still track all these current components and maintain the
> behavior, and confirmed as those tests pass as well.

Haven't really looked over the patches yet but I wanted to push back on
this a bit- you're suggesting that we'd continue to maintain and update
slru.c for the benefit of extensions which use it while none of the core
code uses it?  For how long?  For my 2c, at least, I'd rather we tell
extension authors that they need to update their code instead.  There's
reasons why we're moving the SLRUs into the main buffer pool and having
page headers for them and using the existing page code to read/write
them and extension authors should be eager to gain those advantages too.
Not sure how much concern to place on extensions that aren't willing to
adjust to changes like these.

> The fourth patch adds the page headers to these Non Relational (NREL)
> components, and provides the upgrade story to rewrite the clog and
> multixact files with page headers across upgrades.

Nice.

> With the changes from all four patches, they pass all tests with make
> installcheck-world, as well as test_slru.

Awesome, will try to take a look soon.

Thanks,

Stephen

Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
* Frost, Stephen (sfrowt(at)snowman(dot)net) wrote:

> Haven't really looked over the patches yet but I wanted to push back 
> on this a bit- you're suggesting that we'd continue to maintain and 
> update slru.c for the benefit of extensions which use it while none of 
> the core code uses it?  For how long?  For my 2c, at least, I'd rather 
> we tell extension authors that they need to update their code instead.  
> There's reasons why we're moving the SLRUs into the main buffer pool 
> and having page headers for them and using the existing page code to 
> read/write them and extension authors should be eager to gain those 
> advantages too. Not sure how much concern to place on extensions that
> aren't willing to adjust to changes like these.


Hi Stephen,

Thanks for your response. I proposed this version of the patch with the
idea to make the changes gradual, and to minimize disruption of existing
functionality, with the idea of eventually deprecating the SLRUs. If the
community is okay with completely removing the extensible SLRU
mechanism, we don't have any objection to it either.


On another note, I have also attached an updated version of the last
patch-set which is applicable on the latest commits.

Sincerely, 

Rishu Bagga, Amazon Web Services (AWS)


Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Nathan Bossart
Date:
On Fri, Aug 18, 2023 at 08:12:41AM +0000, Bagga, Rishu wrote:
> * Frost, Stephen (sfrowt(at)snowman(dot)net) wrote:
>> Haven't really looked over the patches yet but I wanted to push back 
>> on this a bit- you're suggesting that we'd continue to maintain and 
>> update slru.c for the benefit of extensions which use it while none of 
>> the core code uses it?  For how long?  For my 2c, at least, I'd rather 
>> we tell extension authors that they need to update their code instead.  
>> There's reasons why we're moving the SLRUs into the main buffer pool 
>> and having page headers for them and using the existing page code to 
>> read/write them and extension authors should be eager to gain those 
>> advantages too. Not sure how much concern to place on extensions that
>> aren't willing to adjust to changes like these.
> 
> Thanks for your response. I proposed this version of the patch with the
> idea to make the changes gradual, and to minimize disruption of existing
> functionality, with the idea of eventually deprecating the SLRUs. If the
> community is okay with completely removing the extensible SLRU
> mechanism, we don't have any objection to it either.

I think I agree with Stephen.  We routinely make changes that require
updates to extensions, and I doubt anyone is terribly wild about
maintaining two SLRU systems for several years.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: SLRUs in the main buffer pool - Page Header definitions

From
Robert Haas
Date:
On Fri, Aug 18, 2023 at 12:15 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> I think I agree with Stephen.  We routinely make changes that require
> updates to extensions, and I doubt anyone is terribly wild about
> maintaining two SLRU systems for several years.

Yeah, maintaining two systems doesn't sound like a good idea.

However, this would be a big change. I'm not sure how we validate a
change of this magnitude. There are both correctness and performance
considerations. I saw there had been a few performance results on the
thread from Thomas that spawned this thread; but I guess we'd want to
do more research. One question is: how do you decide how many buffers
to use for each SLRU, and how many to leave available for regular
data?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: SLRUs in the main buffer pool - Page Header definitions

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Aug 18, 2023 at 12:15 PM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
> > I think I agree with Stephen.  We routinely make changes that require
> > updates to extensions, and I doubt anyone is terribly wild about
> > maintaining two SLRU systems for several years.
>
> Yeah, maintaining two systems doesn't sound like a good idea.
>
> However, this would be a big change. I'm not sure how we validate a
> change of this magnitude. There are both correctness and performance
> considerations. I saw there had been a few performance results on the
> thread from Thomas that spawned this thread; but I guess we'd want to
> do more research. One question is: how do you decide how many buffers
> to use for each SLRU, and how many to leave available for regular
> data?

Agreed that we'd certainly want to make sure it's all correct and to do
performance testing but in terms of how many buffers... isn't much of
the point of this that we have data in memory because it's being used
and if it's not then we evict it?  That is, I wouldn't think we'd have
set parts of the buffer pool for SLRUs vs. regular data but would let
the actual demand drive what pages are in the cache and I thought that
was part of this proposal and part of the reasoning behind making this
change.

There's certainly an argument to be made that our current cache
management strategy doesn't account very well for the value of pages
(eg: btree root pages vs. random heap pages, perhaps) and that'd
certainly be a good thing to improve on, but that's independent of this.
If it's not, then that's certainly moving the goal posts a very long way
in terms of this effort.

Thanks,

Stephen

Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Aleksander Alekseev
Date:
Hi,

> Agreed that we'd certainly want to make sure it's all correct and to do
> performance testing but in terms of how many buffers... isn't much of
> the point of this that we have data in memory because it's being used
> and if it's not then we evict it?  That is, I wouldn't think we'd have
> set parts of the buffer pool for SLRUs vs. regular data but would let
> the actual demand drive what pages are in the cache and I thought that
> was part of this proposal and part of the reasoning behind making this
> change.
>
> There's certainly an argument to be made that our current cache
> management strategy doesn't account very well for the value of pages
> (eg: btree root pages vs. random heap pages, perhaps) and that'd
> certainly be a good thing to improve on, but that's independent of this.
> If it's not, then that's certainly moving the goal posts a very long way
> in terms of this effort.

During the triage of the patches submitted for the September CF [1] I
noticed that the corresponding CF entry [2] has two threads linked.
Only the first thread was used by the CF bot [3], and Heikki and Thomas
were listed as the authors. The patch from the first thread rotted and
was not updated for some time, which resulted in the patch being marked
as RwF for now [4].

It looks like the patch in *this* thread was never registered on the
commitfest and/or tested by CF bot, unless I'm missing something.
Unfortunately it's a bit late to register it for the September CF
especially considering the fact that it doesn't apply at the moment.

This being said, please consider submitting the patch for the upcoming
CF. Also, please consider joining the efforts and having one thread
with a single patchset rather than different threads with different
competing patches. This will simplify the work of the reviewers a lot.

Personally I would suggest taking one step back and agree on a
particular RFC first and then continue working on a single patchset
according to this RFC. We did it in the past in similar cases and this
approach proved to be productive.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org
[2]: https://commitfest.postgresql.org/44/3514/
[3]: http://cfbot.cputube.org/
[4]: https://postgr.es/m/CAJ7c6TN%3D1EF1bTA6w8W%3D0e_Bj%2B-jsiHK0ap1uC_ZUGjwu%2B4JGw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev



Re: SLRUs in the main buffer pool - Page Header definitions

From
Robert Haas
Date:
On Thu, Aug 24, 2023 at 3:28 PM Stephen Frost <sfrost@snowman.net> wrote:
> Agreed that we'd certainly want to make sure it's all correct and to do
> performance testing but in terms of how many buffers... isn't much of
> the point of this that we have data in memory because it's being used
> and if it's not then we evict it?  That is, I wouldn't think we'd have
> set parts of the buffer pool for SLRUs vs. regular data but would let
> the actual demand drive what pages are in the cache and I thought that
> was part of this proposal and part of the reasoning behind making this
> change.

I think that it's not quite that simple. In the regular buffer pool,
access to pages is controlled by buffer pins and buffer content locks,
but these mechanisms don't exist in the same way in the SLRU code. But
buffer pins drive usage counts which drive eviction decisions. So if
you move SLRU data into the main buffer pool, you either need to keep
the current locking regime and use some new logic to decide how much
of shared_buffers to bequeath to the SLRU pools, OR you need to make
SLRU access use buffer pins and buffer content locks. If you do the
latter, I think you substantially increase the cost of an uncontended
SLRU buffer access, because you now need to pin the buffer, and
then take and release the content lock, and then release the pin;
whereas today you can do all that by just taking and releasing the
SLRU's lwlock. That's more atomic operations, and hence more costly, I
think. But even if not, it could perform terribly if SLRU buffers
become more vulnerable to eviction than they are at present, or
alternatively if they take over too much of the buffer pool and force
other important data out.
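
To make the cost comparison concrete, here is a rough sketch of the two
access patterns for a clog status lookup. The SLRU half is paraphrased
from today's TransactionIdGetStatus() and slru.c; the buffer-pool half
is hypothetical - "pseudo_rel", the use of ReadBuffer(), and the 0x03
mask (standing in for CLOG_XACT_BITMASK) are assumptions for
illustration only, since the actual lookup path for BufferedObject pages
is exactly what is being designed here:

#include "postgres.h"
#include "access/clog.h"
#include "access/slru.h"
#include "storage/bufmgr.h"
#include "storage/lwlock.h"
#include "utils/rel.h"

/* Today: the whole read happens under the SLRU's control lwlock. */
static XidStatus
clog_read_slru(SlruCtl ctl, int pageno, int byteno, int bshift,
               TransactionId xid)
{
    int         slotno;
    XidStatus   status;

    /* acquires the SLRU's control lwlock in shared mode internally */
    slotno = SimpleLruReadPage_ReadOnly(ctl, pageno, xid);
    status = (*(ctl->shared->page_buffer[slotno] + byteno) >> bshift) & 0x03;
    LWLockRelease(ctl->shared->ControlLock);

    return status;
}

/*
 * With clog pages in the main buffer pool: pin, content-lock, read,
 * unlock, unpin.  Note that byteno would also need to skip the new
 * BufferedObject page header.
 */
static XidStatus
clog_read_bufferpool(Relation pseudo_rel, BlockNumber blkno, int byteno,
                     int bshift)
{
    Buffer      buf;
    Page        page;
    XidStatus   status;

    buf = ReadBuffer(pseudo_rel, blkno);    /* pin: atomics + usage count */
    LockBuffer(buf, BUFFER_LOCK_SHARE);     /* content lock: more atomics */
    page = BufferGetPage(buf);
    status = (*((unsigned char *) page + byteno) >> bshift) & 0x03;
    UnlockReleaseBuffer(buf);               /* unlock + unpin */

    return status;
}

Whether the extra atomic operations matter in practice is exactly the
sort of thing the performance testing would need to show.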

SLRUs are a performance hotspot, so even relatively minor changes to
their performance characteristics can, I believe, have pretty
noticeable effects on performance overall.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: SLRUs in the main buffer pool - Page Header definitions

From
"Bagga, Rishu"
Date:
Alekseev, Aleksander (aleksander@timescale.com) wrote:

> It looks like the patch in *this* thread was never registered on the
> commitfest and/or tested by CF bot, unless I'm missing something.
> Unfortunately it's a bit late to register it for the September CF
> especially considering the fact that it doesn't apply at the moment.

> This being said, please consider submitting the patch for the upcoming
> CF. Also, please consider joining the efforts and having one thread
> with a single patchset rather than different threads with different
> competing patches. This will simplify the work of the reviewers a lot.


Hi Aleksander,

Thank you for letting me know about this. I’ll follow up on the original 
thread within the next couple weeks with a new and updated patch for the 
next commitfest.


Sincerely,

Rishu Bagga, Amazon Web Services (AWS)



Re: SLRUs in the main buffer pool - Page Header definitions

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Aug 24, 2023 at 3:28 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Agreed that we'd certainly want to make sure it's all correct and to do
> > performance testing but in terms of how many buffers... isn't much of
> > the point of this that we have data in memory because it's being used
> > and if it's not then we evict it?  That is, I wouldn't think we'd have
> > set parts of the buffer pool for SLRUs vs. regular data but would let
> > the actual demand drive what pages are in the cache and I thought that
> > was part of this proposal and part of the reasoning behind making this
> > change.
>
> I think that it's not quite that simple. In the regular buffer pool,
> access to pages is controlled by buffer pins and buffer content locks,
> but these mechanisms don't exist in the same way in the SLRU code. But
> buffer pins drive usage counts which drive eviction decisions. So if
> you move SLRU data into the main buffer pool, you either need to keep
> the current locking regime and use some new logic to decide how much
> of shared_buffers to bequeath to the SLRU pools, OR you need to make
> SLRU access use buffer pins and buffer content locks. If you do the
> latter, I think you substantially increase the cost of an uncontended
> SLRU buffer access, because you now need to pin the buffer, and
> then take and release the content lock, and then release the pin;
> whereas today you can do all that by just taking and releasing the
> SLRU's lwlock. That's more atomic operations, and hence more costly, I
> think. But even if not, it could perform terribly if SLRU buffers
> become more vulnerable to eviction than they are at present, or
> alternatively if they take over too much of the buffer pool and force
> other important data out.

An SLRU buffer access also updates the cur_lru_count for the SLRU, along
with the per-page page_lru_count, but those are plain 32-bit counters
and we don't enforce any ordering on their updates, so presumably they
are less expensive than the pinning and usage-count updates.
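
For reference, that bump is roughly what the SlruRecentlyUsed() macro in
slru.c does (paraphrased here, not quoted exactly):

/* paraphrased from SlruRecentlyUsed() in access/transam/slru.c */
#define SlruRecentlyUsed(shared, slotno) \
    do { \
        int     new_lru_count = (shared)->cur_lru_count; \
        if (new_lru_count != (shared)->page_lru_count[slotno]) \
        { \
            (shared)->cur_lru_count = ++new_lru_count; \
            (shared)->page_lru_count[slotno] = new_lru_count; \
        } \
    } while (0)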

This thread started with the issue that our current SLRUs are relatively
small, though, and simply increasing their size would lead to issues
because at times we're just doing simple things like a linear search
through all of the buffers, or at least that's more-or-less what I
understood from [1].  More details on the specific 'scaling and sizing
challenges' would be nice to have.  The current patches were at least
claimed to improve performance while also using ReadBuffer_common [2].
Having an idea of what specifically is leading to that improvement would
be interesting, though with all these changes that's likely non-trivial.
pgbench may not be the best way to measure this, but it's still
interesting to see an improvement like that.
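
To illustrate the linear-search point, the buffer lookup in slru.c today
is roughly the following (paraphrased from SimpleLruReadPage_ReadOnly();
error handling and stats counters omitted).  Every lookup scans all
num_slots entries, which is fine at today's handful-of-buffers sizes but
not at anything close to shared_buffers scale:

    /* paraphrased from SimpleLruReadPage_ReadOnly() in slru.c */
    LWLockAcquire(shared->ControlLock, LW_SHARED);

    for (slotno = 0; slotno < shared->num_slots; slotno++)
    {
        if (shared->page_number[slotno] == pageno &&
            shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
            shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
        {
            SlruRecentlyUsed(shared, slotno);
            return slotno;      /* hit: caller reads, then releases the lock */
        }
    }

    /* miss: retake the lock exclusively and go through SimpleLruReadPage() */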

Certainly one concern about using the regular buffer pool is that
finding a victim page can be expensive and having that as part of an
SLRU access could be pretty painful.  Though we have that problem
elsewhere too.

If we're going to effectively segregate the buffer pool into SLRU parts
vs. everything else and then use the existing strategies for SLRUs and
have that be different from what everything else is using ... then
that doesn't seem like it's really changing things.  What would be the
point of moving the SLRUs into the main buffer pool then?

> SLRUs are a performance hotspot, so even relatively minor changes to
> their performance characteristics can, I believe, have pretty
> noticeable effects on performance overall.

Agreed, we certainly need to have a plan for how to performance test
this change and should try to come up with some 'worst case' tests.

Thanks,

Stephen

[1]: https://postgr.es/m/EFAAC0BE-27E9-4186-B925-79B7C696D5AC%40amazon.com
[2]: https://postgr.es/m/A09EAE0D-0D3F-4A34-ADE9-8AC1DCBE7D57%40amazon.com

Attachment

Re: SLRUs in the main buffer pool - Page Header definitions

From
Robert Haas
Date:
On Fri, Sep 8, 2023 at 8:56 AM Stephen Frost <sfrost@snowman.net> wrote:
> If we're going to effectively segregate the buffer pool into SLRU parts
> vs. everything else and then use the existing strategies for SLRUs and
> have that be different from what everything else is using ... then
> that doesn't seem like it's really changing things.  What would be the
> point of moving the SLRUs into the main buffer pool then?

I think that integrating the SLRUs into the same memory allocation as
shared_buffers could potentially be pretty helpful even if the buffers
are managed differently. For instance, each SLRU could evaluate (using
some algorithm) whether it needs more or fewer buffers than it has
currently and either steal more buffers from shared_buffers or give
some back. That does seem less elegant than having everything running
through a single system, but getting to the point where everything
runs through a single system may be difficult, and such an
intermediate state could have a lot of advantages with, perhaps, less
risk of breaking stuff.

As far as I am aware, the optimal amount of memory for any particular
SLRU is almost always quite small compared to shared_buffers, but it
can be quite large compared to what we allocate currently. To do
anything about that, we clearly need to fix the algorithms to scale
better. But even once we've done that, I don't think we want to
allocate the largest amount of clog buffers that anyone could ever
need in all instances, and the same for every other SLRU. That seems
wasteful. We could expose a separate tunable for each one, but that is
a lot for users to tune correctly, and it implies using the same value
for the whole lifetime of the server. Letting the value float around
dynamically would make a lot of sense especially for things like
pg_multixact -- if there's a lot of multixacts, grab some more
buffers, if there aren't, release some or even all of them. The same
kind of thing can happen with other SLRUs -- e.g. as the oldest
running xact gets older, you need more subxact buffers; when
long-running transactions end, you need fewer.
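
Purely as a sketch of what such a policy might look like - none of these
names or thresholds exist anywhere, and the numbers are invented only to
make the idea concrete - something periodic along these lines could grow
or shrink each SLRU's slice of shared_buffers:

#include "postgres.h"

/* Hypothetical per-SLRU resize decision, run periodically. */
static int
slru_target_nbuffers(uint64 recent_accesses, uint64 recent_misses,
                     int cur_nbuffers, int min_nbuffers, int max_nbuffers)
{
    /*
     * Miss rate above ~5%: the working set doesn't fit, so ask
     * shared_buffers for more slots (e.g. pg_multixact under heavy
     * multixact creation).
     */
    if (recent_misses * 20 > recent_accesses)
        return Min(cur_nbuffers * 2, max_nbuffers);

    /*
     * Nearly idle: give buffers back (e.g. pg_subtrans once long-running
     * transactions have ended).
     */
    if (recent_accesses < (uint64) cur_nbuffers)
        return Max(cur_nbuffers / 2, min_nbuffers);

    return cur_nbuffers;
}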

Again, I'm not sure what the right thing to do here actually is, just
that I wouldn't be too quick to reject a partial integration into
shared_buffers.

--
Robert Haas
EDB: http://www.enterprisedb.com