Re: [Patch] Make block and file size for WAL and relations defined at cluster creation - Mailing list pgsql-hackers
From | Remi Colinet |
---|---|
Subject | Re: [Patch] Make block and file size for WAL and relations defined at cluster creation |
Date | |
Msg-id | CADdR5nxKG6VSMzjwEauzxEvf-28zKbnT-_xg-2q3aA4o9hjjtQ@mail.gmail.com |
In response to | Re: [Patch] Make block and file size for WAL and relations defined at cluster creation (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hello,
2018-01-03 21:51 GMT+01:00 Andres Freund <andres@anarazel.de>:
Hi,
On 2018-01-03 21:43:51 +0100, Remi Colinet wrote:
> - we may test different combinations of file and block sizes, for the
> relation and the WAL, in order to get the best server performance.
> Avoiding a compilation for each combination of values seems to make sense.
That's something you need to prove to be beneficial *before* we make this
change.
Performance is only one argument for choosing block and file sizes at run time.
A DBA may just want larger files for relations and WAL in order to reduce the number of files. Why would this be an unacceptable wish? Just because a developer decided to choose a value for the whole world?
What about the fact that storage is getting larger every year? OK, at some point a developer may change the default value in the source code and rebuild. But this is not very handy. For instance, we do not need to rebuild a kernel when we want to change just one parameter.
By the way, when someone installs PostgreSQL, he may not want to rebuild it, only to use it.
> - Selecting the correct values for file and block sizes is a DBA task, and
> not a developer task.
> For instance, when someone wants to create a Linux filesystem with a given
> block size, he is not forced to accept a given value chosen by the
> developer of the filesystem driver when the latter was compiled.
I'm unconvinced there's as much value syncing up fs in pg as some
conventional wisdom says.
The point is that user-visible parameters should be set by users or DBAs. This is an admin task. For instance, someone using storage with 4K sectors may need to set the block size to 4K for both WAL and relations, without having to rebuild the binaries. Building binaries is not an easy task for everybody.
> - The file and block sizes should depend mostly on the physical server and
> physical storage.
> Not on the database software itself.
Citation needed.
Someone using a large database will probably want to have larger files. This is a matter of personal perception. Some companies may also have defined policies regarding databases in order to avoid having too many files.
When using storage with 4K blocks, it may be better to use a 4K block size for PostgreSQL. But then, what about storage with 16K blocks? Rebuild again...? And then you need a build for each block and file size combination. You may end up with a lot of builds to manage.
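To put rough numbers on the file-count concern, here is a minimal sketch of the arithmetic. The helper name segment_file_count is hypothetical and not part of PostgreSQL or of the patch; it only assumes that a relation is split into fixed-size segment files (1 GB each in a default build).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of segment files
 * needed to store relation_bytes of data when one segment file holds at
 * most segment_bytes. This is a plain ceiling division, written so that
 * it cannot overflow for large inputs.
 */
uint64_t
segment_file_count(uint64_t relation_bytes, uint64_t segment_bytes)
{
    assert(segment_bytes > 0);

    return relation_bytes / segment_bytes
        + (relation_bytes % segment_bytes != 0 ? 1 : 0);
}
```

Under these assumptions, a 1 TB relation needs 1024 files with 1 GB segments and 64 files with 16 GB segments.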
> Regarding the cost of using run-time configurable values for file and block
> sizes of the WAL and relations, this cost is low both :
>
> - from a developer point of view: the source code changes are spread over
> many files but only a few of them have significant changes.
> Mainly tidbitmap.c is affected by the change. Other changes are minor
> changes.
>
> - from a run-time point of view. The overhead is only at the start of the
> database instance.
> And moreover, the overhead is still very low at the start of the server,
> with only a few more dynamic memory allocations.
That's to some degree because you rely on stack allocation of variable
sized amounts of data - we can't rely on that. E.g. you allocate stack
variables sized by rel_block_size, that's unfortunately not
ok. Additionally some of the size calculations will have some
performance impact.
Data structures depending on BLCKSZ and allocated on the stack are migrated to palloc/pfree management in the patch. A few files are affected by this change, the most noticeable one being tidbitmap.c. The latter is a bit more difficult to change because it directly includes the header file simplehash.h (not nice for gdb). Anyway, I could perform the conversion to run-time values with a minimal change, even for tidbitmap.c.
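As an illustration of that kind of conversion, here is a minimal standalone sketch and not code from the patch. The function sum_page_copy and its parameters are hypothetical, and malloc/free stand in for palloc/pfree so that the example compiles outside the PostgreSQL source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example. With a compile-time block size, the scratch
 * buffer could be declared on the stack as "unsigned char scratch[BLCKSZ];".
 * With a block size known only at run time, the buffer is allocated on
 * the heap instead. malloc/free are stand-ins here: inside the server,
 * palloc/pfree would be used, and palloc raises an error on allocation
 * failure rather than returning NULL.
 *
 * Copies one page into a scratch buffer and sums its bytes.
 * Returns 0 on success, -1 if the allocation fails.
 */
int
sum_page_copy(const unsigned char *page, size_t block_size,
              unsigned long *sum_out)
{
    unsigned char *scratch;
    unsigned long sum = 0;
    size_t      i;

    assert(page != NULL && sum_out != NULL && block_size > 0);

    scratch = malloc(block_size);   /* was: a stack array of BLCKSZ bytes */
    if (scratch == NULL)
        return -1;

    memcpy(scratch, page, block_size);
    for (i = 0; i < block_size; i++)
        sum += scratch[i];

    free(scratch);
    *sum_out = sum;
    return 0;
}
```

The extra cost compared with the stack version is one allocation and one release per call, which is why hot code paths would need a closer look than this sketch gives.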
Regards
Remi
- Andres