[Patch] Windows relation extension failure at 2GB and 4GB - Mailing list pgsql-hackers

From Bryan Green
Subject [Patch] Windows relation extension failure at 2GB and 4GB
Msg-id 0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
List pgsql-hackers
Hello,

I found two related bugs in PostgreSQL's Windows port that prevent files 
from exceeding 2GB. While unlikely to affect most installations (default 
1GB segments), the code is objectively wrong and worth fixing.

The first bug is a pervasive use of off_t where pgoff_t should be used. 
On Windows, off_t is only 32-bit, causing signed integer overflow at 
exactly 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 
for this purpose and some function declarations in headers already use 
it, but the implementations weren't updated to match.
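
As a rough illustration of the truncation (a sketch, not code from the patch): demo_off32 and demo_off64 below are hypothetical stand-ins for a 32-bit Windows off_t and for pgoff_t, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins, not PostgreSQL's definitions: a 32-bit signed
 * type playing the role of Windows off_t, and a 64-bit one playing pgoff_t.
 */
typedef int32_t demo_off32;
typedef int64_t demo_off64;

/* Returns 1 if a non-negative 64-bit offset fits in the 32-bit type. */
static int
demo_fits_off32(demo_off64 offset)
{
	assert(offset >= 0);
	return offset <= INT32_MAX;
}

/*
 * Value a 32-bit off_t parameter ends up holding: the low 32 bits of the
 * offset, reinterpreted as two's-complement signed.
 */
static demo_off32
demo_truncate_off32(demo_off64 offset)
{
	uint32_t	low = (uint32_t) ((uint64_t) offset & 0xFFFFFFFFu);

	if (low <= (uint32_t) INT32_MAX)
		return (demo_off32) low;
	return (demo_off32) ((int64_t) low - INT64_C(4294967296));
}
```

With these helpers, demo_truncate_off32(INT64_C(2147483648)) yields INT32_MIN: an offset of exactly 2GB becomes negative once it passes through a 32-bit parameter.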

The problem shows up in multiple layers:

In fd.c and fd.h, the VfdCache structure's fileSize field uses off_t, as 
do FileSize(), FileTruncate(), and all the File* functions. These are 
the core file I/O abstraction layer that everything else builds on.

In md.c, _mdnblocks() uses off_t for its calculations. The actual 
arithmetic for computing file offsets is fine - the casts to pgoff_t 
work correctly - but passing these values through functions with off_t 
parameters truncates them.

In pg_iovec.h, pg_preadv() and pg_pwritev() take off_t offset 
parameters, truncating any 64-bit offsets passed from above.

In file_utils.c, pg_pwrite_zeros() takes an off_t offset parameter. This
function is called by FileZero() to extend files with zeros, so it's hit
during relation extension. This was the actual culprit in my testing -
mdzeroextend() would compute a correct 64-bit offset, but it got 
truncated to 32-bit (and negative) when passed to pg_pwrite_zeros().

After fixing all those off_t issues, there's a second bug at 4GB in the 
Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and 
win32pread.c. The current implementation uses an OVERLAPPED structure 
for positioned I/O, but only sets the Offset field (low 32 bits), 
leaving OffsetHigh at zero. This works up to 4GB by accident, but beyond 
that, offsets wrap around.
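
The wraparound can be sketched portably. DemoOverlapped below is a hypothetical stand-in that mirrors only the Offset and OffsetHigh fields of the Win32 OVERLAPPED structure, and the function names are illustrative rather than taken from win32pwrite.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two offset fields of Win32 OVERLAPPED. */
typedef struct DemoOverlapped
{
	uint32_t	Offset;			/* low 32 bits of the file position */
	uint32_t	OffsetHigh;		/* high 32 bits of the file position */
} DemoOverlapped;

/* Behavior described above: only the low half is filled in. */
static void
demo_set_offset_low_only(DemoOverlapped *ov, int64_t offset)
{
	assert(offset >= 0);
	ov->Offset = (uint32_t) ((uint64_t) offset & 0xFFFFFFFFu);
	ov->OffsetHigh = 0;
}

/* Both halves filled in, so the full 64-bit position is preserved. */
static void
demo_set_offset(DemoOverlapped *ov, int64_t offset)
{
	assert(offset >= 0);
	ov->Offset = (uint32_t) ((uint64_t) offset & 0xFFFFFFFFu);
	ov->OffsetHigh = (uint32_t) ((uint64_t) offset >> 32);
}

/* File position the structure actually describes. */
static int64_t
demo_effective_offset(const DemoOverlapped *ov)
{
	return (int64_t) (((uint64_t) ov->OffsetHigh << 32) | ov->Offset);
}
```

With the low-only variant, a write intended for 4GB + 8192 lands at position 8192, which is why a wrapped offset can overwrite existing data rather than fail cleanly.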

I can reproduce both bugs reliably with --with-segsize=8. The file grows 
to exactly 2GB and fails with "could not extend file: Invalid argument" 
despite having 300GB free. After fixing the off_t issues, it grows to 
exactly 4GB and hits the OVERLAPPED bug. Both are independently verifiable.
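
For reference, the block numbers at which the two limits are reached can be worked out with a small sketch. It assumes the default 8kB block size (DEMO_BLCKSZ is a hypothetical name for this example) and a single segment large enough to hold the whole file.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BLCKSZ 8192		/* assumption: default 8kB block size */

/* Byte offset of a block within its segment, computed in 64 bits. */
static int64_t
demo_block_offset(uint32_t blocknum)
{
	return (int64_t) DEMO_BLCKSZ * blocknum;
}

/* First block whose starting offset is at or past limit_bytes. */
static uint32_t
demo_first_block_at_or_past(int64_t limit_bytes)
{
	assert(limit_bytes > 0);
	return (uint32_t) ((limit_bytes + DEMO_BLCKSZ - 1) / DEMO_BLCKSZ);
}
```

Under these assumptions the 2GB limit is reached at block 262144 and the 4GB limit at block 524288.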

The fix touches nine files:
   src/include/storage/fd.h - File* function declarations
   src/backend/storage/file/fd.c - File* implementations and VfdCache
   src/backend/storage/smgr/md.c - _mdnblocks and other functions
   src/include/port/pg_iovec.h - pg_preadv/pg_pwritev signatures
   src/include/common/file_utils.h - pg_pwrite_zeros declaration
   src/common/file_utils.c - pg_pwrite_zeros implementation
   src/include/port/win32_port.h - pg_pread/pg_pwrite declarations
   src/port/win32pwrite.c - Windows pwrite implementation
   src/port/win32pread.c - Windows pread implementation

It's safe for all platforms since pgoff_t equals off_t on Unix where 
off_t is already 64-bit. Only Windows behavior changes.

That said, I'm finding off_t used in many other places throughout the 
codebase - buffile.c, various other file utilities such as backup and 
archive, probably more. This is likely causing latent bugs elsewhere on 
Windows, though most are masked by the 1GB default segment size. I'm 
investigating the full scope, but I think this needs to be broken up 
into multiple patches. The core file I/O layer (fd.c, md.c,
pg_pwrite/pg_pread) should probably go first since that's what's 
actively breaking file extension.

Not urgent since few people hit this in practice, but it's clearly wrong 
code. Someone building with larger segments would see failures at 2GB 
and potential corruption at 4GB. Windows supports files up to 16 
exabytes - no good reason to limit PostgreSQL to 2GB.

I have attached the patch to fix the relation extension problems for 
Windows to this email.

I can provide the other patches that change off_t to pgoff_t in the rest 
of the code if there's interest in fixing this.

To reproduce the bugs on Windows:

1) Build with a large segment size:

     meson setup build --prefix=C:\pgsql-test -Dsegsize=8

2) Create a large table and insert data that will make it bigger than 2GB.

CREATE TABLE large_test (
     id bigserial PRIMARY KEY,
     data1 text,
     data2 text,
     data3 text
);

INSERT INTO large_test (data1, data2, data3)
SELECT
     repeat('A', 300),
     repeat('B', 300),
     repeat('C', 300)
FROM generate_series(1, 5000000);

SELECT pg_size_pretty(pg_relation_size('large_test'));

The first bug surfaces during the INSERT: the file stops growing at 2GB 
and the statement fails with "could not extend file: Invalid argument".

3) To reproduce the second bug, apply the patch and then comment out 
'overlapped.OffsetHigh = (DWORD) (offset >> 32);' in win32pwrite.c.
4) Assuming you did step 3, rebuild and repeat the test from step 2. If 
you watch the data/base/N/xxxxx file grow, you will see it get past 2GB 
but then fail at 4GB.


BG
