Re: Initdb-time block size specification - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Initdb-time block size specification |
Date | |
Msg-id | d1b4d578-e0f8-49f3-714d-8467d8d25cf6@enterprisedb.com Whole thread Raw |
In response to | Initdb-time block size specification (David Christensen <david.christensen@crunchydata.com>) |
Responses |
Re: Initdb-time block size specification
|
List | pgsql-hackers |
On 6/30/23 19:35, David Christensen wrote: > Hi, > > As discussed somewhat at PgCon, enclosed is v1 of a patch to provide > variable block sizes; basically instead of BLCKSZ being a compile-time > constant, a single set of binaries can support all of the block sizes > Pg can support, using the value stored in pg_control as the basis. > (Possible future plans would be to make this something even more > dynamic, such as configured per tablespace, but this is out of scope; > this just sets up the infrastructure for this.) > > Whereas we had traditionally used BLCKSZ to indicate the compile-time selected > block size, this commit adjusted things so the cluster block size can be > selected at initdb time. > > In order to code for this, we introduce a few new defines: > > - CLUSTER_BLOCK_SIZE is the blocksize for this cluster itself. This is not > valid until BlockSizeInit() has been called in the given backend, which we do as > early as possible by parsing the ControlFile and using the blcksz field. > > - MIN_BLOCK_SIZE and MAX_BLOCK_SIZE are the limits for the selectable block > size. It is required that CLUSTER_BLOCK_SIZE is a power of 2 between these two > constants. > > - DEFAULT_BLOCK_SIZE is the moral equivalent of BLCKSZ; it is the built-in > default value. This is used in a few places that just needed a buffer of an > arbitrary size, but the dynamic value CLUSTER_BLOCK_SIZE should almost always be > used instead. > > - CLUSTER_RELSEG_SIZE is used instead of RELSEG_SIZE, since internally we are > storing the segment size in terms of number of blocks. RELSEG_SIZE is still > kept, but is used in terms of the number of blocks of DEFAULT_BLOCK_SIZE; > CLUSTER_RELSEG_SIZE scales appropriately (and is the only thing used internally) > to keep the same target total segment size regardless of block size. > Do we really want to prefix the option with CLUSTER_? That seems to just add a lot of noise into the patch, and I don't see much value in this rename. I'd prefer keeping BLCKSZ and tweak just the couple places that need "default" to use BLCKSZ_DEFAULT or something like that. But more importantly, I'd say we use CAPITALIZED_NAMES for compile-time values, so after making this initdb-time parameter we should abandon that (just like fc49e24fa69a did for segment sizes). Perhaps something like cluste_block_size would work ... (yes, that touches all the places too). > This patch uses a precalculated table to store the block size itself, as well as > additional derived values that have traditionally been compile-time > constants (example: MaxHeapTuplesPerPage). The traditional macro names are kept > so code that doesn't care about it should not need to change, however the > definition of these has changed (see the CalcXXX() routines in blocksize.h for > details). > > A new function, BlockSizeInit() populates the appropriate values based on the > target block size. This should be called as early as possible in any code that > utilizes block sizes. This patch adds this in the appropriate place on the > handful of src/bin/ programs that used BLCKSZ, so this caveat mainly impacts new > code. > > Code which had previously used BLCKZ should likely be able to get away with > changing these instances to CLUSTER_BLOCK_SIZE, unless you're using a structure > allocated on the stack. In these cases, the compiler will complain about > dynamic structure. The solution is to utilize an expression with MAX_BLOCK_SIZE > instead of BLCKSZ, ensuring enough stack space is allocated for the maximum > size. This also does require using CLUSTER_BLOCK_SIZE or an expression based on > it when actually using this structure, so in practice more stack space may be > allocated then used in principal; as long as there is plenty of stack this > should have no specific impacts on code. > > Initial (basic) performance testing shows only minor changes with the pgbench -S > benchmark, though this is obviously an area that will need considerable > testing/verification across multiple workloads. > I wonder how to best evaluate/benchmark this. AFAICS what we want to measure is the extra cost of making the values dynamic (which means the compiler can't just optimize them out). I'd say a "pgbench -S" seems like a good test. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: