Re: Initdb-time block size specification - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Initdb-time block size specification
Date
Msg-id d1b4d578-e0f8-49f3-714d-8467d8d25cf6@enterprisedb.com
Whole thread Raw
In response to Initdb-time block size specification  (David Christensen <david.christensen@crunchydata.com>)
Responses Re: Initdb-time block size specification
List pgsql-hackers
On 6/30/23 19:35, David Christensen wrote:
> Hi,
> 
> As discussed somewhat at PgCon, enclosed is v1 of a patch to provide
> variable block sizes; basically instead of BLCKSZ being a compile-time
> constant, a single set of binaries can support all of the block sizes
> Pg can support, using the value stored in pg_control as the basis.
> (Possible future plans would be to make this something even more
> dynamic, such as configured per tablespace, but this is out of scope;
> this just sets up the infrastructure for this.)
> 
> Whereas we had traditionally used BLCKSZ to indicate the compile-time selected
> block size, this commit adjusted things so the cluster block size can be
> selected at initdb time.
> 
> In order to code for this, we introduce a few new defines:
> 
> - CLUSTER_BLOCK_SIZE is the blocksize for this cluster itself.  This is not
> valid until BlockSizeInit() has been called in the given backend, which we do as
> early as possible by parsing the ControlFile and using the blcksz field.
> 
> - MIN_BLOCK_SIZE and MAX_BLOCK_SIZE are the limits for the selectable block
> size. It is required that CLUSTER_BLOCK_SIZE is a power of 2 between these two
> constants.
> 
> - DEFAULT_BLOCK_SIZE is the moral equivalent of BLCKSZ; it is the built-in
> default value.  This is used in a few places that just needed a buffer of an
> arbitrary size, but the dynamic value CLUSTER_BLOCK_SIZE should almost always be
> used instead.
> 
> - CLUSTER_RELSEG_SIZE is used instead of RELSEG_SIZE, since internally we are
> storing the segment size in terms of number of blocks.  RELSEG_SIZE is still
> kept, but is used in terms of the number of blocks of DEFAULT_BLOCK_SIZE;
> CLUSTER_RELSEG_SIZE scales appropriately (and is the only thing used internally)
> to keep the same target total segment size regardless of block size.
> 

Do we really want to prefix the option with CLUSTER_? That seems to just
add a lot of noise into the patch, and I don't see much value in this
rename. I'd prefer keeping BLCKSZ and tweak just the couple places that
need "default" to use BLCKSZ_DEFAULT or something like that.

But more importantly, I'd say we use CAPITALIZED_NAMES for compile-time
values, so after making this initdb-time parameter we should abandon
that (just like fc49e24fa69a did for segment sizes). Perhaps something
like cluste_block_size would work ... (yes, that touches all the places
too).

> This patch uses a precalculated table to store the block size itself, as well as
> additional derived values that have traditionally been compile-time
> constants (example: MaxHeapTuplesPerPage).  The traditional macro names are kept
> so code that doesn't care about it should not need to change, however the
> definition of these has changed (see the CalcXXX() routines in blocksize.h for
> details).
> 
> A new function, BlockSizeInit() populates the appropriate values based on the
> target block size.  This should be called as early as possible in any code that
> utilizes block sizes.  This patch adds this in the appropriate place on the
> handful of src/bin/ programs that used BLCKSZ, so this caveat mainly impacts new
> code.
> 
> Code which had previously used BLCKZ should likely be able to get away with
> changing these instances to CLUSTER_BLOCK_SIZE, unless you're using a structure
> allocated on the stack.  In these cases, the compiler will complain about
> dynamic structure.  The solution is to utilize an expression with MAX_BLOCK_SIZE
> instead of BLCKSZ, ensuring enough stack space is allocated for the maximum
> size. This also does require using CLUSTER_BLOCK_SIZE or an expression based on
> it when actually using this structure, so in practice more stack space may be
> allocated then used in principal; as long as there is plenty of stack this
> should have no specific impacts on code.
> 
> Initial (basic) performance testing shows only minor changes with the pgbench -S
> benchmark, though this is obviously an area that will need considerable
> testing/verification across multiple workloads.
> 

I wonder how to best evaluate/benchmark this. AFAICS what we want to
measure is the extra cost of making the values dynamic (which means the
compiler can't just optimize them out). I'd say a "pgbench -S" seems
like a good test.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Cary Huang
Date:
Subject: Re: sslinfo extension - add notbefore and notafter timestamps
Next
From: Nathan Bossart
Date:
Subject: Re: Move un-parenthesized syntax docs to "compatibility" for few SQL commands