Thread: Preallocation changes in Postgresql 16

Preallocation changes in Postgresql 16

From
Riku Iki
Date:

Hi,

We have a PostgreSQL server that currently runs PostgreSQL 15 on compressed btrfs.

I tried migrating the database to PostgreSQL 16 and found that data is not being compressed on the PostgreSQL 16 server. One possible reason btrfs won't compress data is preallocation.

Running the "compsize" tool, I do see that PostgreSQL is preallocating data and that it is not compressed (there is a separate "prealloc" entry in the output).

I am wondering whether there were preallocation-related changes in PG16, and whether it is possible to disable preallocation in PostgreSQL 16.

I posted this on StackExchange, and someone pointed to this commit as a possible reason for this behavior.

There is also a long discussion on lore.kernel.org about exactly this issue.

Re: Preallocation changes in Postgresql 16

From
Thomas Munro
Date:
On Fri, Apr 26, 2024 at 4:37 AM Riku Iki <riku.iki.x@gmail.com> wrote:
> I am wondering if there were preallocation related changes in PG16, and if it is possible to disable preallocation in PostgreSQL 16?

I have no opinion on the btrfs details, but I was wondering if someone
might show up with a system that doesn't like that change.  Here is a
magic 8, tuned on "some filesystems":

        /*
         * If available and useful, use posix_fallocate() (via
         * FileFallocate()) to extend the relation. That's often more
         * efficient than using write(), as it commonly won't cause the kernel
         * to allocate page cache space for the extended pages.
         *
         * However, we don't use FileFallocate() for small extensions, as it
         * defeats delayed allocation on some filesystems. Not clear where
         * that decision should be made though? For now just use a cutoff of
         * 8, anything between 4 and 8 worked OK in some local testing.
         */
        if (numblocks > 8)

I wonder if it wants to be a GUC.
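For readers less familiar with this code, here is a minimal C sketch of the decision the quoted comment describes, assuming a POSIX system. It is not PostgreSQL's implementation: the names `extend_file()`, `write_zeros()`, `file_size()` and the `fallocate_cutoff` parameter are hypothetical, and `BLCKSZ` is set to 8192 only to mirror PostgreSQL's default block size. Extensions larger than the cutoff go through `posix_fallocate()`; smaller ones are written as zeros with `pwrite()`. Taking the cutoff as a parameter, rather than hard-coding 8, is one way to picture the GUC idea above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* mirrors PostgreSQL's default block size, in bytes */

/* Write `len` zero bytes at `offset` with pwrite(), retrying short writes. */
static int write_zeros(int fd, off_t offset, off_t len)
{
    static const char zeros[BLCKSZ]; /* static storage is zero-initialized */

    while (len > 0) {
        size_t chunk = len < BLCKSZ ? (size_t) len : (size_t) BLCKSZ;
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return errno;
        }
        if (n == 0)
            return EIO; /* no progress; avoid looping forever */
        offset += n;
        len -= n;
    }
    return 0;
}

/*
 * Hypothetical sketch: extend the file by `nblocks` blocks at `offset`.
 * Like the quoted PG16 snippet, preallocate with posix_fallocate() only when
 * the extension is larger than `fallocate_cutoff` blocks (8 in PG16);
 * otherwise fall back to ordinary zero-filled writes.
 * Returns 0 on success or an errno value on failure.
 */
int extend_file(int fd, off_t offset, int nblocks, int fallocate_cutoff)
{
    off_t len = (off_t) BLCKSZ * nblocks;

    if (nblocks > fallocate_cutoff)
        return posix_fallocate(fd, offset, len); /* returns an errno value itself */
    return write_zeros(fd, offset, len);
}

/* Current size of the file in bytes, or -1 on error. */
off_t file_size(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? st.st_size : -1;
}
```

For example, `extend_file(fd, 0, 16, 8)` preallocates 16 blocks, while `extend_file(fd, 0, 4, 8)` writes them out. Passing a cutoff larger than any real extension (say `1 << 30`) sends every extension down the write path, which is the effect Riku's later experiment aims for by removing the `numblocks > 8` branch.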



Re: Preallocation changes in Postgresql 16

From
Riku Iki
Date:
Thank you, I have such a system. I think my next step is to compile PG from source (I still need to learn how) and compare its behavior with and without that code block.

On Thu, Apr 25, 2024 at 2:25 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Re: Preallocation changes in Postgresql 16

From
Riku Iki
Date:
I did the testing and confirmed that this was the issue.

I ran the following query:

 create table t as select '1234567890' from generate_series(1, 1000000000);

I commented out the if (numblocks > 8) code block and got the following results from the "compsize /dbdir/" command:

Before my changes:

Processed 1381 files, 90007 regular extents (90010 refs), 15 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       97%       41G          42G          42G      
none       100%       41G          41G          41G      
zstd        14%      157M         1.0G         1.0G      
prealloc   100%       16M          16M          16M

After the changes:

Processed 1381 files, 347328 regular extents (347331 refs), 15 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL        3%      1.4G          42G          42G      
none       100%       80K          80K          80K      
zstd         3%      1.4G          42G          42G

This clearly shows that files created with fallocate are not compressed: total disk usage is 41G with the cutoff in place versus 1.4G without it.
Is there a way to file a feature request to make this parameter user-configurable?

On Fri, Apr 26, 2024 at 4:15 PM Riku Iki <riku.iki.x@gmail.com> wrote:

Re: Preallocation changes in Postgresql 16

From
"Pierre Barre"
Date:
Hello,

It seems that I am running into this issue as well. 
Is it likely that this will ever become a config option?

Best,
Pierre Barre

On Fri, May 3, 2024, at 05:11, Riku Iki wrote: