Thread: pread() and pwrite()

pread() and pwrite()

From
Thomas Munro
Date:
Hello hackers,

A couple of years ago, Oskari Saarenmaa proposed a patch[1] to adopt
$SUBJECT.  Last year, Tobias Oberstein argued again that we should do
that[2].  At the end of that thread there was a +1 from multiple
committers in support of getting it done for PostgreSQL 12.  Since
Oskari hasn't returned, I decided to dust off his patch and start a
new thread.

Here is a rebase and an initial review.  I plan to address the
problems myself, unless Oskari wants to do that in which case I'll
happily review and hopefully eventually commit it.  It's possible I
introduced new bugs while rebasing since basically everything moved
around, but it passes check-world here with and without HAVE_PREAD and
HAVE_PWRITE defined.

Let me summarise what's going on in the patch:

1.  FileRead() and FileWrite() are replaced with FileReadAt() and
FileWriteAt(), taking a new offset argument.  Now we can do one
syscall instead of two for random reads and writes.
2.  fd.c no longer tracks seek position, so we lose a bunch of cruft
from the vfd machinery.  FileSeek() now only supports SEEK_SET and
SEEK_END.

This is taking the position that we no longer want to be able to do
file IO with implicit positioning at all.  I think that seems
reasonable: it's nice to drop all that position tracking and caching
code, and the seek-based approach would be incompatible with any
multithreading plans.  I'm not even sure I'd bother adding "At" to the
function names.  If there are any extensions that want the old
interface they will fail to compile either way.  Note that the BufFile
interface remains the same, so hopefully that covers many use cases.

I guess the only remaining reason to use FileSeek() is to get the file
size?  So I wonder why SEEK_SET remains valid in the patch... if my
suspicion is correct that only SEEK_END still has a reason to exist,
perhaps we should just kill FileSeek() and add FileSize() or something
instead?

        pgstat_report_wait_start(wait_event_info);
+#ifdef HAVE_PREAD
+       returnCode = pread(vfdP->fd, buffer, amount, offset);
+#else
+       lseek(VfdCache[file].fd, offset, SEEK_SET);
        returnCode = read(vfdP->fd, buffer, amount);
+#endif
        pgstat_report_wait_end();

This obviously lacks error handling for lseek().

I wonder if anyone would want separate wait_event tracking for the
lseek() (though this codepath would be used by almost nobody,
especially if someone adds Windows support, so it's probably not worth
bothering with).

I suppose we could assume that if you have pread() you must also have
pwrite() and save on ./configure cycles.

I will add this to the next Festival of Commits, though clearly it
needs a bit more work before the festivities begin.

[1] https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi
[2] https://www.postgresql.org/message-id/flat/b8748d39-0b19-0514-a1b9-4e5a28e6a208%40gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Thomas Munro
Date:
On Thu, Jul 12, 2018 at 1:55 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I guess the only remaining reason to use FileSeek() is to get the file
> size?  So I wonder why SEEK_SET remains valid in the patch... if my
> suspicion is correct that only SEEK_END still has a reason to exist,
> perhaps we should just kill FileSeek() and add FileSize() or something
> instead?

Done.

>         pgstat_report_wait_start(wait_event_info);
> +#ifdef HAVE_PREAD
> +       returnCode = pread(vfdP->fd, buffer, amount, offset);
> +#else
> +       lseek(VfdCache[file].fd, offset, SEEK_SET);
>         returnCode = read(vfdP->fd, buffer, amount);
> +#endif
>         pgstat_report_wait_end();
>
> This obviously lacks error handling for lseek().

Fixed.

Updated the main WAL IO routines to use pread()/pwrite() too.

Not super heavily tested yet.

An idea for how to handle Windows, in a follow-up patch: add a file
src/backend/port/win32/file.c that defines pgwin32_pread() and
pgwin32_pwrite(), wrapping WriteFile()/ReadFile() and passing in an
"OVERLAPPED" struct with the offset and sets errno on error, then set
up the macros so that Windows can use them as pread(), pwrite().  It
might also be necessary to open all files with FILE_FLAG_OVERLAPPED.
Does any Windows hacker have a bettter idea, and/or want to try to
write that patch?  Otherwise I'll eventually try to do some long
distance hacking on AppVeyor.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Heikki Linnakangas
Date:
On 20/07/18 01:50, Thomas Munro wrote:
> An idea for how to handle Windows, in a follow-up patch: add a file
> src/backend/port/win32/file.c that defines pgwin32_pread() and
> pgwin32_pwrite(), wrapping WriteFile()/ReadFile() and passing in an
> "OVERLAPPED" struct with the offset and sets errno on error, then set
> up the macros so that Windows can use them as pread(), pwrite().  It
> might also be necessary to open all files with FILE_FLAG_OVERLAPPED.
> Does any Windows hacker have a bettter idea, and/or want to try to
> write that patch?  Otherwise I'll eventually try to do some long
> distance hacking on AppVeyor.

No objections, if you want to make the effort. But IMHO the lseek+read 
fallback is good enough on Windows. Unless you were thinking that we 
could then remove the !HAVE_PREAD fallback altogether. Are there any 
other platforms out there that don't have pread/pwrite that we care about?

- Heikki


Re: pread() and pwrite()

From
Oskari Saarenmaa
Date:
On Thu, Jul 12, 2018 at 01:55:31PM +1200, Thomas Munro wrote:
> A couple of years ago, Oskari Saarenmaa proposed a patch[1] to adopt
> $SUBJECT.  Last year, Tobias Oberstein argued again that we should do
> that[2].  At the end of that thread there was a +1 from multiple
> committers in support of getting it done for PostgreSQL 12.  Since
> Oskari hasn't returned, I decided to dust off his patch and start a
> new thread.

Thanks for picking this up - I was meaning to get back to this, but have
unfortunately been too busy with other projects.

> 1.  FileRead() and FileWrite() are replaced with FileReadAt() and
> FileWriteAt(), taking a new offset argument.  Now we can do one
> syscall instead of two for random reads and writes.

> [...]  I'm not even sure I'd bother adding "At" to the
> function names.  If there are any extensions that want the old
> interface they will fail to compile either way.  Note that the BufFile
> interface remains the same, so hopefully that covers many use cases.

IIRC I used the "At" suffixes in my first version of the patch before
completely removing the functions which didn't take an offset argument
Now that they're gone I agree that we could just drop the "At" suffix;
"at" suffix is also used by various POSIX functions to operate in a
specific directory which may just add to confusion.

> I guess the only remaining reason to use FileSeek() is to get the file
> size?  So I wonder why SEEK_SET remains valid in the patch... if my
> suspicion is correct that only SEEK_END still has a reason to exist,
> perhaps we should just kill FileSeek() and add FileSize() or something
> instead?

I see you did this in your updated patch :+1:

Happy to see this patch move forward.

/ Oskari


Re: pread() and pwrite()

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> No objections, if you want to make the effort. But IMHO the lseek+read 
> fallback is good enough on Windows. Unless you were thinking that we 
> could then remove the !HAVE_PREAD fallback altogether. Are there any 
> other platforms out there that don't have pread/pwrite that we care about?

AFAICT, macOS has them as far back as we care about (prairiedog does).
HPUX 10.20 (gaur/pademelon) does not, so personally I'd like to keep
the lseek+read workaround.  Don't know about the oldest Solaris critters
we have in the buildfarm.  FreeBSD has had 'em at least since 4.0 (1994);
didn't check the other BSDen.

SUS v2 (POSIX 1997) does specify both functions, so we could insist on
their presence without breaking any of our own portability guidelines.
However, if we have to have some workaround anyway for Windows, it
seems like including an lseek+read code path is reasonable so that we
needn't retire those oldest buildfarm critters.

            regards, tom lane


Re: pread() and pwrite()

From
Daniel Gustafsson
Date:
> On 20 Jul 2018, at 17:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> No objections, if you want to make the effort. But IMHO the lseek+read 
>> fallback is good enough on Windows. Unless you were thinking that we 
>> could then remove the !HAVE_PREAD fallback altogether. Are there any 
>> other platforms out there that don't have pread/pwrite that we care about?
> 
> AFAICT, macOS has them as far back as we care about (prairiedog does).
> HPUX 10.20 (gaur/pademelon) does not, so personally I'd like to keep
> the lseek+read workaround.  Don't know about the oldest Solaris critters
> we have in the buildfarm.  FreeBSD has had 'em at least since 4.0 (1994);
> didn't check the other BSDen.

The OpenBSD box I have access to has pwrite/pread, and have had for some time
if I read the manpage right.

cheers ./daniel


Re: pread() and pwrite()

From
Thomas Munro
Date:
On Fri, Jul 20, 2018 at 8:14 PM, Oskari Saarenmaa <os@ohmu.fi> wrote:
> On Thu, Jul 12, 2018 at 01:55:31PM +1200, Thomas Munro wrote:
>> [...]  I'm not even sure I'd bother adding "At" to the
>> function names.  If there are any extensions that want the old
>> interface they will fail to compile either way.  Note that the BufFile
>> interface remains the same, so hopefully that covers many use cases.
>
> IIRC I used the "At" suffixes in my first version of the patch before
> completely removing the functions which didn't take an offset argument
> Now that they're gone I agree that we could just drop the "At" suffix;
> "at" suffix is also used by various POSIX functions to operate in a
> specific directory which may just add to confusion.

Done.  Rebased.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Thomas Munro
Date:
On Sat, Jul 21, 2018 at 3:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > No objections, if you want to make the effort. But IMHO the lseek+read
> > fallback is good enough on Windows. Unless you were thinking that we
> > could then remove the !HAVE_PREAD fallback altogether. Are there any
> > other platforms out there that don't have pread/pwrite that we care about?
>
> AFAICT, macOS has them as far back as we care about (prairiedog does).
> HPUX 10.20 (gaur/pademelon) does not, so personally I'd like to keep
> the lseek+read workaround.  Don't know about the oldest Solaris critters
> we have in the buildfarm.  FreeBSD has had 'em at least since 4.0 (1994);
> didn't check the other BSDen.
>
> SUS v2 (POSIX 1997) does specify both functions, so we could insist on
> their presence without breaking any of our own portability guidelines.
> However, if we have to have some workaround anyway for Windows, it
> seems like including an lseek+read code path is reasonable so that we
> needn't retire those oldest buildfarm critters.

Yeah it seems useful and cheap to carry the lseek() fallback.  But
actually there is a good reason to implement proper pread/pwrite
(equivalent) on Windows: this patch removes the position tracking, so
that the fallback code generates *more* lseek() calls than current
master.  For example with sequential reads today we are smart enough
to skip redundant lseek() calls, but this patch removes those smarts.
I doubt anyone cares about that on HPUX 10.20 but I don't think we
should do that on Windows.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi,

On 07/26/2018 10:04 PM, Thomas Munro wrote:
> Done.  Rebased.
> 

This needs a rebase again.

Once resolved the patch passes make check-world, and a strace analysis 
shows the associated read()/write() have been turned into 
pread64()/pwrite64(). All lseek()'s are SEEK_END's.

Best regards,
  Jesper


Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi,

On 09/05/2018 02:42 PM, Jesper Pedersen wrote:
> On 07/26/2018 10:04 PM, Thomas Munro wrote:
>> Done.  Rebased.
>>
> 
> This needs a rebase again.
> 

Would it be of benefit to update these call sites

* slru.c
   - SlruPhysicalReadPage
   - SlruPhysicalWritePage
* xlogutils.c
   - XLogRead
* pg_receivewal.c
   - FindStreamingStart
* rewriteheap.c
   - heap_xlog_logical_rewrite
* walreceiver.c
   - XLogWalRcvWrite

too ?

Best regards,
  Jesper



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Fri, Sep 7, 2018 at 2:17 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> > This needs a rebase again.

Done.

> Would it be of benefit to update these call sites
>
> * slru.c
>    - SlruPhysicalReadPage
>    - SlruPhysicalWritePage
> * xlogutils.c
>    - XLogRead
> * pg_receivewal.c
>    - FindStreamingStart
> * rewriteheap.c
>    - heap_xlog_logical_rewrite
> * walreceiver.c
>    - XLogWalRcvWrite

It certainly wouldn't hurt... but more pressing to get this committed
would be Windows support IMHO.  I think the thing to do is to open
files with the FILE_FLAG_OVERLAPPED flag, and then use ReadFile() and
WriteFile() with an LPOVERLAPPED struct that holds an offset, but I'm
not sure if I can write that myself.  I tried doing some semi-serious
Windows development for the fsyncgate patch using only AppVeyor CI a
couple of weeks ago and it was like visiting the dentist.

On Thu, Sep 6, 2018 at 6:42 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> Once resolved the patch passes make check-world, and a strace analysis
> shows the associated read()/write() have been turned into
> pread64()/pwrite64(). All lseek()'s are SEEK_END's.

Yeah :-)  Just for fun, here is the truss output for a single pgbench
transaction here:

recvfrom(9,"B\0\0\0\^R\0P0_4\0\0\0\0\0\0\^A"...,8192,0,NULL,0x0) = 41 (0x29)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\nBEG"...,27,0,NULL,0) = 27 (0x1b)
recvfrom(9,"B\0\0\0&\0P0_5\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 61 (0x3d)
pread(22,"\0\0\0\0\^P\M^C\M-@P\0\0\0\0\M-X"...,8192,0x960a000) = 8192 (0x2000)
pread(20,"\0\0\0\0\M^X\^D\M^Iq\0\0\0\0\^T"...,8192,0x380fe000) = 8192 (0x2000)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e)
recvfrom(9,"B\0\0\0\^]\0P0_6\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 52 (0x34)
sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,75,0,NULL,0) = 75 (0x4b)
recvfrom(9,"B\0\0\0"\0P0_7\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 57 (0x39)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e)
recvfrom(9,"B\0\0\0!\0P0_8\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 56 (0x38)
lseek(29,0x0,SEEK_END)                           = 8192 (0x2000)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e)
recvfrom(9,"B\0\0\0003\0P0_9\0\0\0\0\^D\0\0"...,8192,0,NULL,0x0) = 74 (0x4a)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\^OIN"...,32,0,NULL,0) = 32 (0x20)
recvfrom(9,"B\0\0\0\^S\0P0_10\0\0\0\0\0\0\^A"...,8192,0,NULL,0x0) = 42 (0x2a)
pwrite(33,"\M^X\M-P\^E\0\^A\0\0\0\0\M-`\M^A"...,16384,0x81e000) = 16384 (0x4000)
fdatasync(0x21)                                  = 0 (0x0)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\vCOM"...,28,0,NULL,0) = 28 (0x1c)

There is only one lseek() left.  I actually have another patch that
gets rid of even that (by caching sizes in SMgrRelation using a shared
invalidation counter which I'm not yet sure about).  Then pgbench's
7-round-trip transaction makes only the strictly necessary 18 syscalls
(every one an explainable network message, disk page or sync).
Unpatched master has 5 extra lseek()s.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Thomas Munro
Date:
On Wed, Sep 19, 2018 at 1:48 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Sep 7, 2018 at 2:17 AM Jesper Pedersen
> <jesper.pedersen@redhat.com> wrote:
> > > This needs a rebase again.

And again, due to the conflict with ppoll in AC_CHECK_FUNCS.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi Thomas,

On 9/18/18 9:48 PM, Thomas Munro wrote:
> It certainly wouldn't hurt... but more pressing to get this committed
> would be Windows support IMHO.  I think the thing to do is to open
> files with the FILE_FLAG_OVERLAPPED flag, and then use ReadFile() and
> WriteFile() with an LPOVERLAPPED struct that holds an offset, but I'm
> not sure if I can write that myself.  I tried doing some semi-serious
> Windows development for the fsyncgate patch using only AppVeyor CI a
> couple of weeks ago and it was like visiting the dentist.
> 

Sorry, no idea about this. Maybe Magnus can provide some feedback ?

> On Thu, Sep 6, 2018 at 6:42 AM Jesper Pedersen
>> Once resolved the patch passes make check-world, and a strace analysis
>> shows the associated read()/write() have been turned into
>> pread64()/pwrite64(). All lseek()'s are SEEK_END's.
> 
> Yeah :-) 

Thanks for v5 too.

Best regards,
  Jesper



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Fri, Sep 28, 2018 at 2:03 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> Thanks for v5 too.

Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Thomas Munro
Date:
On Fri, Sep 28, 2018 at 2:03 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> Thanks for v5 too.

Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!

-- 
Thomas Munro
http://www.enterprisedb.com

Re: pread() and pwrite()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!

Yeah, I've been burnt by that too recently.  It occurs to me we could make
that at least a little less painful if we formatted the macro with one
line per function name:

AC_CHECK_FUNCS([
    cbrt
    clock_gettime
    fdatasync
    ...
    wcstombs_l
])

You'd still get conflicts in configure itself, of course, but that
doesn't require manual work to resolve -- just re-run autoconf.

            regards, tom lane


Re: pread() and pwrite()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!

Yeah, I've been burnt by that too recently.  It occurs to me we could make
that at least a little less painful if we formatted the macro with one
line per function name:

AC_CHECK_FUNCS([
    cbrt
    clock_gettime
    fdatasync
    ...
    wcstombs_l
])

You'd still get conflicts in configure itself, of course, but that
doesn't require manual work to resolve -- just re-run autoconf.

            regards, tom lane



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!
>
> Yeah, I've been burnt by that too recently.  It occurs to me we could make
> that at least a little less painful if we formatted the macro with one
> line per function name:
>
> AC_CHECK_FUNCS([
>         cbrt
>         clock_gettime
>         fdatasync
>         ...
>         wcstombs_l
> ])
>
> You'd still get conflicts in configure itself, of course, but that
> doesn't require manual work to resolve -- just re-run autoconf.

+1, was about to suggest the same!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!
>
> Yeah, I've been burnt by that too recently.  It occurs to me we could make
> that at least a little less painful if we formatted the macro with one
> line per function name:
>
> AC_CHECK_FUNCS([
>         cbrt
>         clock_gettime
>         fdatasync
>         ...
>         wcstombs_l
> ])
>
> You'd still get conflicts in configure itself, of course, but that
> doesn't require manual work to resolve -- just re-run autoconf.

+1, was about to suggest the same!

-- 
Thomas Munro
http://www.enterprisedb.com


Re: pread() and pwrite()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I've been burnt by that too recently.  It occurs to me we could make
>> that at least a little less painful if we formatted the macro with one
>> line per function name:
>> 
>> AC_CHECK_FUNCS([
>> cbrt
>> clock_gettime
>> fdatasync
>> ...
>> wcstombs_l
>> ])
>> 
>> You'd still get conflicts in configure itself, of course, but that
>> doesn't require manual work to resolve -- just re-run autoconf.

> +1, was about to suggest the same!

Sold, I'll go do it.

            regards, tom lane



Re: pread() and pwrite()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I've been burnt by that too recently.  It occurs to me we could make
>> that at least a little less painful if we formatted the macro with one
>> line per function name:
>> 
>> AC_CHECK_FUNCS([
>> cbrt
>> clock_gettime
>> fdatasync
>> ...
>> wcstombs_l
>> ])
>> 
>> You'd still get conflicts in configure itself, of course, but that
>> doesn't require manual work to resolve -- just re-run autoconf.

> +1, was about to suggest the same!

Sold, I'll go do it.

            regards, tom lane


Re: pread() and pwrite()

From
Tom Lane
Date:
I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, I've been burnt by that too recently.  It occurs to me we could make
>>> that at least a little less painful if we formatted the macro with one
>>> line per function name:

>> +1, was about to suggest the same!

> Sold, I'll go do it.

Learned a few new things about M4 along the way :-( ... but done.
You'll need to rebase the pread patch again of course.

            regards, tom lane



Re: pread() and pwrite()

From
Tom Lane
Date:
I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, I've been burnt by that too recently.  It occurs to me we could make
>>> that at least a little less painful if we formatted the macro with one
>>> line per function name:

>> +1, was about to suggest the same!

> Sold, I'll go do it.

Learned a few new things about M4 along the way :-( ... but done.
You'll need to rebase the pread patch again of course.

            regards, tom lane


Re: pread() and pwrite()

From
Andrew Dunstan
Date:

On 10/08/2018 09:55 PM, Tom Lane wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!
> Yeah, I've been burnt by that too recently.  It occurs to me we could make
> that at least a little less painful if we formatted the macro with one
> line per function name:
>
> AC_CHECK_FUNCS([
>     cbrt
>     clock_gettime
>     fdatasync
>     ...
>     wcstombs_l
> ])
>
> You'd still get conflicts in configure itself, of course, but that
> doesn't require manual work to resolve -- just re-run autoconf.
>
>             



By and large I think it's better not to submit patches with changes to 
configure, but to let the committer run autoconf.

You can avoid getting such changes in your patches by doing something 
like this:

     git config diff.nodiff.command /bin/true
     echo configure diff=nodiff >> .git/info/attributes

If you actually want to turn this off and see any diffs in configure, run

     git diff --no-ext-diff

It's also possible to supply a filter expression to 'git diff'.

OTOH, this will probably confuse the heck out of the cfbot patch checker.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pread() and pwrite()

From
Andres Freund
Date:
On 2018-10-09 14:32:29 -0400, Andrew Dunstan wrote:
> 
> 
> On 10/08/2018 09:55 PM, Tom Lane wrote:
> > Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > > Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!
> > Yeah, I've been burnt by that too recently.  It occurs to me we could make
> > that at least a little less painful if we formatted the macro with one
> > line per function name:
> > 
> > AC_CHECK_FUNCS([
> >     cbrt
> >     clock_gettime
> >     fdatasync
> >     ...
> >     wcstombs_l
> > ])
> > 
> > You'd still get conflicts in configure itself, of course, but that
> > doesn't require manual work to resolve -- just re-run autoconf.
> > 
> >             
> 
> 
> 
> By and large I think it's better not to submit patches with changes to
> configure, but to let the committer run autoconf.

> OTOH, this will probably confuse the heck out of the cfbot patch checker.

And make life harder for reviewers.

-1 on this one.

Greetings,

Andres Freund


Re: pread() and pwrite()

From
Andrew Dunstan
Date:

On 10/09/2018 02:37 PM, Andres Freund wrote:
> On 2018-10-09 14:32:29 -0400, Andrew Dunstan wrote:
>>
>> On 10/08/2018 09:55 PM, Tom Lane wrote:
>>> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>>>> Rebased again.  Patches that touch AC_CHECK_FUNCS are fun like that!
>>> Yeah, I've been burnt by that too recently.  It occurs to me we could make
>>> that at least a little less painful if we formatted the macro with one
>>> line per function name:
>>>
>>> AC_CHECK_FUNCS([
>>>     cbrt
>>>     clock_gettime
>>>     fdatasync
>>>     ...
>>>     wcstombs_l
>>> ])
>>>
>>> You'd still get conflicts in configure itself, of course, but that
>>> doesn't require manual work to resolve -- just re-run autoconf.
>>>
>>>             
>>
>>
>> By and large I think it's better not to submit patches with changes to
>> configure, but to let the committer run autoconf.
>> OTOH, this will probably confuse the heck out of the cfbot patch checker.
> And make life harder for reviewers.
>
> -1 on this one.
>


Maybe I'm thinking back to the time when we used to use a bunch of old 
versions of autoconf ...

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Tue, Oct 9, 2018 at 5:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Thomas Munro <thomas.munro@enterprisedb.com> writes:
> >> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Yeah, I've been burnt by that too recently.  It occurs to me we could make
> >>> that at least a little less painful if we formatted the macro with one
> >>> line per function name:
>
> >> +1, was about to suggest the same!
>
> > Sold, I'll go do it.
>
> Learned a few new things about M4 along the way :-( ... but done.
> You'll need to rebase the pread patch again of course.

Thanks, much nicer.  Rebased.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi Thomas,

On 10/9/18 4:56 PM, Thomas Munro wrote:
> Thanks, much nicer.  Rebased.
> 

This still applies, and passes make check-world.

I wonder what the commit policy is on this, if the Windows part isn't 
included. I read Heikki's comment [1] as it would be ok to commit 
benefiting all platforms that has pread/pwrite.

The functions in [2] could be a follow-up patch as well.

[1] 
https://www.postgresql.org/message-id/6cc7c8dd-29f9-7d75-d18a-99f19c076d10%40iki.fi
[2] 
https://www.postgresql.org/message-id/c2f56d0a-cadd-3df1-ae48-b84dc8128c37%40redhat.com

Best regards,
  Jesper



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Sat, Nov 3, 2018 at 2:07 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> This still applies, and passes make check-world.
>
> I wonder what the commit policy is on this, if the Windows part isn't
> included. I read Heikki's comment [1] as it would be ok to commit
> benefiting all platforms that has pread/pwrite.

Here's a patch to add Windows support by supplying
src/backend/port/win32/pread.c.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Thomas Munro
Date:
On Sun, Nov 4, 2018 at 12:03 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 3, 2018 at 2:07 AM Jesper Pedersen
> <jesper.pedersen@redhat.com> wrote:
> > This still applies, and passes make check-world.
> >
> > I wonder what the commit policy is on this, if the Windows part isn't
> > included. I read Heikki's comment [1] as it would be ok to commit
> > benefiting all platforms that has pread/pwrite.
>
> Here's a patch to add Windows support by supplying
> src/backend/port/win32/pread.c.  Thoughts?

If we do that, I suppose we might as well supply implementations for
HP-UX 10.20 as well, and then we can get rid of the conditional macro
stuff at various call sites and use pread() and pwrite() freely.
Here's a version that does it that way.  One question is whether the
caveat mentioned in patch 0001 is acceptable.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi Thomas,

On 11/5/18 7:08 AM, Thomas Munro wrote:
> On Sun, Nov 4, 2018 at 12:03 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Sat, Nov 3, 2018 at 2:07 AM Jesper Pedersen
>> <jesper.pedersen@redhat.com> wrote:
>>> This still applies, and passes make check-world.
>>>
>>> I wonder what the commit policy is on this, if the Windows part isn't
>>> included. I read Heikki's comment [1] as it would be ok to commit
>>> benefiting all platforms that has pread/pwrite.
>>
>> Here's a patch to add Windows support by supplying
>> src/backend/port/win32/pread.c.  Thoughts?
> 
> If we do that, I suppose we might as well supply implementations for
> HP-UX 10.20 as well, and then we can get rid of the conditional macro
> stuff at various call sites and use pread() and pwrite() freely.
> Here's a version that does it that way.  One question is whether the
> caveat mentioned in patch 0001 is acceptable.
> 

Passed check-world, but I can't verify the 0001 patch. Reading the the 
API it looks ok to me.

I guess the caveat in 0001 is ok, as it is a side-effect of the 
underlying API.

Best regards,
  Jesper


Re: pread() and pwrite()

From
Alvaro Herrera
Date:
On 2018-Nov-04, Thomas Munro wrote:

> Here's a patch to add Windows support by supplying
> src/backend/port/win32/pread.c.  Thoughts?

Hmm, so how easy is to detect that somebody runs read/write on fds where
pread/pwrite have occurred?  I guess for data files it's easy to detect
since you'd quickly end up with corrupted files, but what about other
kinds of files?  I wonder if we should be worrying about using this
interface somewhere other than fd.c and forgetting about the limitation.
Say, what happens if we patch some place in xlog.c after this patch gets
in, using write() instead of pwrite()?

I suppose the safest approach is to use lseek (or whatever) to fix up
the position after the pread/pwrite -- but we don't want to pay the
price on an additional syscall.  Are there any other options?  Is there
a way to prevent read/write from being used on a file handle?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pread() and pwrite()

From
Alvaro Herrera
Date:
Please remove Tell from line 18 in fd.h.  To Küssnacht with him!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pread() and pwrite()

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Nov-04, Thomas Munro wrote:
>> Here's a patch to add Windows support by supplying
>> src/backend/port/win32/pread.c.  Thoughts?

> Hmm, so how easy is to detect that somebody runs read/write on fds where
> pread/pwrite have occurred?  I guess for data files it's easy to detect
> since you'd quickly end up with corrupted files, but what about other
> kinds of files?  I wonder if we should be worrying about using this
> interface somewhere other than fd.c and forgetting about the limitation.

Yeah.  I think the patch as presented is OK; it uses pread/pwrite only
inside fd.c, which is a reasonably non-leaky abstraction.  But there's
definitely a hazard of somebody submitting a patch that depends on
using pread/pwrite elsewhere, and then that maybe not working.

What I suggest is that we *not* try to make this a completely transparent
substitute.  Instead, make the functions exported by src/port/ be
"pg_pread" and "pg_pwrite", and inside fd.c we'd write something like

#ifdef HAVE_PREAD
#define pg_pread pread
#endif

and then refer to pg_pread/pg_pwrite in the body of that file.  That
way, if someone refers to pread and expects standard functionality
from it, they'll get a failure on platforms not supporting it.

FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly
and pass the core regression tests.

            regards, tom lane


Re: pread() and pwrite()

From
Thomas Munro
Date:
On Tue, Nov 6, 2018 at 5:07 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Please remove Tell from line 18 in fd.h.  To Küssnacht with him!

Thanks, done.  But what is this arrow sticking through my Mac laptop's
screen...?

On Tue, Nov 6, 2018 at 6:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > On 2018-Nov-04, Thomas Munro wrote:
> >> Here's a patch to add Windows support by supplying
> >> src/backend/port/win32/pread.c.  Thoughts?
>
> > Hmm, so how easy is to detect that somebody runs read/write on fds where
> > pread/pwrite have occurred?  I guess for data files it's easy to detect
> > since you'd quickly end up with corrupted files, but what about other
> > kinds of files?  I wonder if we should be worrying about using this
> > interface somewhere other than fd.c and forgetting about the limitation.
>
> Yeah.  I think the patch as presented is OK; it uses pread/pwrite only
> inside fd.c, which is a reasonably non-leaky abstraction.  But there's
> definitely a hazard of somebody submitting a patch that depends on
> using pread/pwrite elsewhere, and then that maybe not working.
>
> What I suggest is that we *not* try to make this a completely transparent
> substitute.  Instead, make the functions exported by src/port/ be
> "pg_pread" and "pg_pwrite", and inside fd.c we'd write something like
>
> #ifdef HAVE_PREAD
> #define pg_pread pread
> #endif
>
> and then refer to pg_pread/pg_pwrite in the body of that file.  That
> way, if someone refers to pread and expects standard functionality
> from it, they'll get a failure on platforms not supporting it.

OK.  But since we're using this from both fd.c and xlog.c, I put that
into src/include/port.h.

> FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly
> and pass the core regression tests.

Thanks.  I also tested the replacements by temporarily hacking my
configure script to look for the wrong function name:

-ac_fn_c_check_func "$LINENO" "pread" "ac_cv_func_pread"
+ac_fn_c_check_func "$LINENO" "preadx" "ac_cv_func_pread"

-ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite"
+ac_fn_c_check_func "$LINENO" "pwritex" "ac_cv_func_pwrite"

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pread() and pwrite()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Nov 6, 2018 at 6:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I suggest is that we *not* try to make this a completely transparent
>> substitute.  Instead, make the functions exported by src/port/ be
>> "pg_pread" and "pg_pwrite", ...

> OK.  But since we're using this from both fd.c and xlog.c, I put that
> into src/include/port.h.

LGTM.  I didn't bother to run an actual test cycle, since it's not
materially different from the previous version as far as portability
is concerned.

            regards, tom lane


Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi,

On 11/5/18 9:10 PM, Thomas Munro wrote:
> On Tue, Nov 6, 2018 at 5:07 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Please remove Tell from line 18 in fd.h.  To Küssnacht with him!
> 
> Thanks, done.  But what is this arrow sticking through my Mac laptop's
> screen...?
> 
> On Tue, Nov 6, 2018 at 6:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> On 2018-Nov-04, Thomas Munro wrote:
>>>> Here's a patch to add Windows support by supplying
>>>> src/backend/port/win32/pread.c.  Thoughts?
>>
>>> Hmm, so how easy is to detect that somebody runs read/write on fds where
>>> pread/pwrite have occurred?  I guess for data files it's easy to detect
>>> since you'd quickly end up with corrupted files, but what about other
>>> kinds of files?  I wonder if we should be worrying about using this
>>> interface somewhere other than fd.c and forgetting about the limitation.
>>
>> Yeah.  I think the patch as presented is OK; it uses pread/pwrite only
>> inside fd.c, which is a reasonably non-leaky abstraction.  But there's
>> definitely a hazard of somebody submitting a patch that depends on
>> using pread/pwrite elsewhere, and then that maybe not working.
>>
>> What I suggest is that we *not* try to make this a completely transparent
>> substitute.  Instead, make the functions exported by src/port/ be
>> "pg_pread" and "pg_pwrite", and inside fd.c we'd write something like
>>
>> #ifdef HAVE_PREAD
>> #define pg_pread pread
>> #endif
>>
>> and then refer to pg_pread/pg_pwrite in the body of that file.  That
>> way, if someone refers to pread and expects standard functionality
>> from it, they'll get a failure on platforms not supporting it.
> 
> OK.  But since we're using this from both fd.c and xlog.c, I put that
> into src/include/port.h.
> 
>> FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly
>> and pass the core regression tests.
> 

Passes check-world, and includes the feedback on this thread.

New status: Ready for Committer

Best regards,
  Jesper



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> Passes check-world, and includes the feedback on this thread.
>
> New status: Ready for Committer

Thanks!  Pushed.  I'll keep an eye on the build farm to see if
anything breaks on Cygwin or some other frankenOS.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi Thomas,

On 11/6/18 4:04 PM, Thomas Munro wrote:
> On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen
> Thanks!  Pushed.  I'll keep an eye on the build farm to see if
> anything breaks on Cygwin or some other frankenOS.
> 

There is [1] on Andres' skink setup. Looking.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01

Best regards,
  Jesper


Re: pread() and pwrite()

From
Andrew Dunstan
Date:
On 11/7/18 7:26 AM, Jesper Pedersen wrote:
> Hi Thomas,
>
> On 11/6/18 4:04 PM, Thomas Munro wrote:
>> On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen
>> Thanks!  Pushed.  I'll keep an eye on the build farm to see if
>> anything breaks on Cygwin or some other frankenOS.
>>
>
> There is [1] on Andres' skink setup. Looking.
>
> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01
>
>


And lousyjack, which uses a slightly different way of calling valgrind, 
and thus got past initdb, found a bunch more:


see 
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-11-07%2001%3A33%3A01>


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi,

On 11/7/18 7:26 AM, Jesper Pedersen wrote:
> On 11/6/18 4:04 PM, Thomas Munro wrote:
>> On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen
>> Thanks!  Pushed.  I'll keep an eye on the build farm to see if
>> anything breaks on Cygwin or some other frankenOS.
>>
> 
> There is [1] on Andres' skink setup. Looking.
> 

Attached is a reproducer.

Adding the memset() command for the page makes valgrind happy.

Thoughts on how to proceed with this ? The report in [1] shows that 
there are a number of call sites where the page(s) aren't fully initialized.

[1] 
https://www.postgresql.org/message-id/3fe1e38a-fb70-6260-9300-ce67ede21c32%40redhat.com

Best regards,
  Jesper

Attachment

Re: pread() and pwrite()

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 11/7/18 7:26 AM, Jesper Pedersen wrote:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01

> And lousyjack, which uses a slightly different way of calling valgrind,
> and thus got past initdb, found a bunch more:
> <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-11-07%2001%3A33%3A01>

I'm confused by this.  Surely the pwrite-based code is writing exactly the
same data as before.  Do we have to conclude that valgrind is complaining
about passing uninitialized data to pwrite() when it did not complain
about exactly the same thing for write()?

[ looks ... ]  No, what we have to conclude is that the write-related
suppressions in src/tools/valgrind.supp need to be replaced or augmented
with pwrite-related ones.

            regards, tom lane


Re: pread() and pwrite()

From
Andrew Dunstan
Date:
On 11/7/18 9:30 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 11/7/18 7:26 AM, Jesper Pedersen wrote:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01
>> And lousyjack, which uses a slightly different way of calling valgrind,
>> and thus got past initdb, found a bunch more:
>> <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-11-07%2001%3A33%3A01>
> I'm confused by this.  Surely the pwrite-based code is writing exactly the
> same data as before.  Do we have to conclude that valgrind is complaining
> about passing uninitialized data to pwrite() when it did not complain
> about exactly the same thing for write()?
>
> [ looks ... ]  No, what we have to conclude is that the write-related
> suppressions in src/tools/valgrind.supp need to be replaced or augmented
> with pwrite-related ones.


Yeah. I just trawled through the lousyjack logs and it looks like all 
the cases it reported could be handled by:

{
     padding_XLogRecData_pwrite
     Memcheck:Param
     pwrite64(buf)

     ...
     fun:XLogWrite
}

cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pread() and pwrite()

From
Jesper Pedersen
Date:
Hi Tom,

On 11/7/18 9:30 AM, Tom Lane wrote:
> I'm confused by this.  Surely the pwrite-based code is writing exactly the
> same data as before.  Do we have to conclude that valgrind is complaining
> about passing uninitialized data to pwrite() when it did not complain
> about exactly the same thing for write()?
> 
> [ looks ... ]  No, what we have to conclude is that the write-related
> suppressions in src/tools/valgrind.supp need to be replaced or augmented
> with pwrite-related ones.
> 

The attached patch fixes this for me.

Unfortunately pwrite* doesn't work for the pwrite64(buf) line.

Best regards,
  Jesper

Attachment

Re: pread() and pwrite()

From
Andrew Dunstan
Date:
On 11/7/18 10:05 AM, Jesper Pedersen wrote:
> Hi Tom,
>
> On 11/7/18 9:30 AM, Tom Lane wrote:
>> I'm confused by this.  Surely the pwrite-based code is writing 
>> exactly the
>> same data as before.  Do we have to conclude that valgrind is 
>> complaining
>> about passing uninitialized data to pwrite() when it did not complain
>> about exactly the same thing for write()?
>>
>> [ looks ... ]  No, what we have to conclude is that the write-related
>> suppressions in src/tools/valgrind.supp need to be replaced or augmented
>> with pwrite-related ones.
>>
>
> The attached patch fixes this for me.
>
> Unfortunately pwrite* doesn't work for the pwrite64(buf) line.
>
>


Works for me. If there's no objection I will commit this.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pread() and pwrite()

From
Thomas Munro
Date:
On Thu, Nov 8, 2018 at 4:27 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On 11/7/18 10:05 AM, Jesper Pedersen wrote:
> > On 11/7/18 9:30 AM, Tom Lane wrote:
> >> I'm confused by this.  Surely the pwrite-based code is writing
> >> exactly the
> >> same data as before.  Do we have to conclude that valgrind is
> >> complaining
> >> about passing uninitialized data to pwrite() when it did not complain
> >> about exactly the same thing for write()?
> >>
> >> [ looks ... ]  No, what we have to conclude is that the write-related
> >> suppressions in src/tools/valgrind.supp need to be replaced or augmented
> >> with pwrite-related ones.
> >
> > The attached patch fixes this for me.
> >
> > Unfortunately pwrite* doesn't work for the pwrite64(buf) line.
>
> Works for me. If there's no objection I will commit this.

Thanks for adjusting that.  I suppose I would have known about this if
cfbot checked every patch with valgrind, which I might look into.

I'm a little confused about how an uninitialised value originating in
an OID list finishes up in an xlog buffer, considering that OIDs don't
have padding.

-- 
Thomas Munro
http://www.enterprisedb.com