Thread: pread() and pwrite()
Hello hackers, A couple of years ago, Oskari Saarenmaa proposed a patch[1] to adopt $SUBJECT. Last year, Tobias Oberstein argued again that we should do that[2]. At the end of that thread there was a +1 from multiple committers in support of getting it done for PostgreSQL 12. Since Oskari hasn't returned, I decided to dust off his patch and start a new thread. Here is a rebase and an initial review. I plan to address the problems myself, unless Oskari wants to do that in which case I'll happily review and hopefully eventually commit it. It's possible I introduced new bugs while rebasing since basically everything moved around, but it passes check-world here with and without HAVE_PREAD and HAVE_PWRITE defined. Let me summarise what's going on in the patch: 1. FileRead() and FileWrite() are replaced with FileReadAt() and FileWriteAt(), taking a new offset argument. Now we can do one syscall instead of two for random reads and writes. 2. fd.c no longer tracks seek position, so we lose a bunch of cruft from the vfd machinery. FileSeek() now only supports SEEK_SET and SEEK_END. This is taking the position that we no longer want to be able to do file IO with implicit positioning at all. I think that seems reasonable: it's nice to drop all that position tracking and caching code, and the seek-based approach would be incompatible with any multithreading plans. I'm not even sure I'd bother adding "At" to the function names. If there are any extensions that want the old interface they will fail to compile either way. Note that the BufFile interface remains the same, so hopefully that covers many use cases. I guess the only remaining reason to use FileSeek() is to get the file size? So I wonder why SEEK_SET remains valid in the patch... if my suspicion is correct that only SEEK_END still has a reason to exist, perhaps we should just kill FileSeek() and add FileSize() or something instead? pgstat_report_wait_start(wait_event_info); +#ifdef HAVE_PREAD + returnCode = pread(vfdP->fd, buffer, amount, offset); +#else + lseek(VfdCache[file].fd, offset, SEEK_SET); returnCode = read(vfdP->fd, buffer, amount); +#endif pgstat_report_wait_end(); This obviously lacks error handling for lseek(). I wonder if anyone would want separate wait_event tracking for the lseek() (though this codepath would be used by almost nobody, especially if someone adds Windows support, so it's probably not worth bothering with). I suppose we could assume that if you have pread() you must also have pwrite() and save on ./configure cycles. I will add this to the next Festival of Commits, though clearly it needs a bit more work before the festivities begin. [1] https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi [2] https://www.postgresql.org/message-id/flat/b8748d39-0b19-0514-a1b9-4e5a28e6a208%40gmail.com -- Thomas Munro http://www.enterprisedb.com
Attachment
On Thu, Jul 12, 2018 at 1:55 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I guess the only remaining reason to use FileSeek() is to get the file > size? So I wonder why SEEK_SET remains valid in the patch... if my > suspicion is correct that only SEEK_END still has a reason to exist, > perhaps we should just kill FileSeek() and add FileSize() or something > instead? Done. > pgstat_report_wait_start(wait_event_info); > +#ifdef HAVE_PREAD > + returnCode = pread(vfdP->fd, buffer, amount, offset); > +#else > + lseek(VfdCache[file].fd, offset, SEEK_SET); > returnCode = read(vfdP->fd, buffer, amount); > +#endif > pgstat_report_wait_end(); > > This obviously lacks error handling for lseek(). Fixed. Updated the main WAL IO routines to use pread()/pwrite() too. Not super heavily tested yet. An idea for how to handle Windows, in a follow-up patch: add a file src/backend/port/win32/file.c that defines pgwin32_pread() and pgwin32_pwrite(), wrapping WriteFile()/ReadFile() and passing in an "OVERLAPPED" struct with the offset and sets errno on error, then set up the macros so that Windows can use them as pread(), pwrite(). It might also be necessary to open all files with FILE_FLAG_OVERLAPPED. Does any Windows hacker have a bettter idea, and/or want to try to write that patch? Otherwise I'll eventually try to do some long distance hacking on AppVeyor. -- Thomas Munro http://www.enterprisedb.com
Attachment
On 20/07/18 01:50, Thomas Munro wrote: > An idea for how to handle Windows, in a follow-up patch: add a file > src/backend/port/win32/file.c that defines pgwin32_pread() and > pgwin32_pwrite(), wrapping WriteFile()/ReadFile() and passing in an > "OVERLAPPED" struct with the offset and sets errno on error, then set > up the macros so that Windows can use them as pread(), pwrite(). It > might also be necessary to open all files with FILE_FLAG_OVERLAPPED. > Does any Windows hacker have a bettter idea, and/or want to try to > write that patch? Otherwise I'll eventually try to do some long > distance hacking on AppVeyor. No objections, if you want to make the effort. But IMHO the lseek+read fallback is good enough on Windows. Unless you were thinking that we could then remove the !HAVE_PREAD fallback altogether. Are there any other platforms out there that don't have pread/pwrite that we care about? - Heikki
On Thu, Jul 12, 2018 at 01:55:31PM +1200, Thomas Munro wrote: > A couple of years ago, Oskari Saarenmaa proposed a patch[1] to adopt > $SUBJECT. Last year, Tobias Oberstein argued again that we should do > that[2]. At the end of that thread there was a +1 from multiple > committers in support of getting it done for PostgreSQL 12. Since > Oskari hasn't returned, I decided to dust off his patch and start a > new thread. Thanks for picking this up - I was meaning to get back to this, but have unfortunately been too busy with other projects. > 1. FileRead() and FileWrite() are replaced with FileReadAt() and > FileWriteAt(), taking a new offset argument. Now we can do one > syscall instead of two for random reads and writes. > [...] I'm not even sure I'd bother adding "At" to the > function names. If there are any extensions that want the old > interface they will fail to compile either way. Note that the BufFile > interface remains the same, so hopefully that covers many use cases. IIRC I used the "At" suffixes in my first version of the patch before completely removing the functions which didn't take an offset argument Now that they're gone I agree that we could just drop the "At" suffix; "at" suffix is also used by various POSIX functions to operate in a specific directory which may just add to confusion. > I guess the only remaining reason to use FileSeek() is to get the file > size? So I wonder why SEEK_SET remains valid in the patch... if my > suspicion is correct that only SEEK_END still has a reason to exist, > perhaps we should just kill FileSeek() and add FileSize() or something > instead? I see you did this in your updated patch :+1: Happy to see this patch move forward. / Oskari
Heikki Linnakangas <hlinnaka@iki.fi> writes: > No objections, if you want to make the effort. But IMHO the lseek+read > fallback is good enough on Windows. Unless you were thinking that we > could then remove the !HAVE_PREAD fallback altogether. Are there any > other platforms out there that don't have pread/pwrite that we care about? AFAICT, macOS has them as far back as we care about (prairiedog does). HPUX 10.20 (gaur/pademelon) does not, so personally I'd like to keep the lseek+read workaround. Don't know about the oldest Solaris critters we have in the buildfarm. FreeBSD has had 'em at least since 4.0 (1994); didn't check the other BSDen. SUS v2 (POSIX 1997) does specify both functions, so we could insist on their presence without breaking any of our own portability guidelines. However, if we have to have some workaround anyway for Windows, it seems like including an lseek+read code path is reasonable so that we needn't retire those oldest buildfarm critters. regards, tom lane
> On 20 Jul 2018, at 17:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> No objections, if you want to make the effort. But IMHO the lseek+read >> fallback is good enough on Windows. Unless you were thinking that we >> could then remove the !HAVE_PREAD fallback altogether. Are there any >> other platforms out there that don't have pread/pwrite that we care about? > > AFAICT, macOS has them as far back as we care about (prairiedog does). > HPUX 10.20 (gaur/pademelon) does not, so personally I'd like to keep > the lseek+read workaround. Don't know about the oldest Solaris critters > we have in the buildfarm. FreeBSD has had 'em at least since 4.0 (1994); > didn't check the other BSDen. The OpenBSD box I have access to has pwrite/pread, and have had for some time if I read the manpage right. cheers ./daniel
On Fri, Jul 20, 2018 at 8:14 PM, Oskari Saarenmaa <os@ohmu.fi> wrote: > On Thu, Jul 12, 2018 at 01:55:31PM +1200, Thomas Munro wrote: >> [...] I'm not even sure I'd bother adding "At" to the >> function names. If there are any extensions that want the old >> interface they will fail to compile either way. Note that the BufFile >> interface remains the same, so hopefully that covers many use cases. > > IIRC I used the "At" suffixes in my first version of the patch before > completely removing the functions which didn't take an offset argument > Now that they're gone I agree that we could just drop the "At" suffix; > "at" suffix is also used by various POSIX functions to operate in a > specific directory which may just add to confusion. Done. Rebased. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Sat, Jul 21, 2018 at 3:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > No objections, if you want to make the effort. But IMHO the lseek+read > > fallback is good enough on Windows. Unless you were thinking that we > > could then remove the !HAVE_PREAD fallback altogether. Are there any > > other platforms out there that don't have pread/pwrite that we care about? > > AFAICT, macOS has them as far back as we care about (prairiedog does). > HPUX 10.20 (gaur/pademelon) does not, so personally I'd like to keep > the lseek+read workaround. Don't know about the oldest Solaris critters > we have in the buildfarm. FreeBSD has had 'em at least since 4.0 (1994); > didn't check the other BSDen. > > SUS v2 (POSIX 1997) does specify both functions, so we could insist on > their presence without breaking any of our own portability guidelines. > However, if we have to have some workaround anyway for Windows, it > seems like including an lseek+read code path is reasonable so that we > needn't retire those oldest buildfarm critters. Yeah it seems useful and cheap to carry the lseek() fallback. But actually there is a good reason to implement proper pread/pwrite (equivalent) on Windows: this patch removes the position tracking, so that the fallback code generates *more* lseek() calls than current master. For example with sequential reads today we are smart enough to skip redundant lseek() calls, but this patch removes those smarts. I doubt anyone cares about that on HPUX 10.20 but I don't think we should do that on Windows. -- Thomas Munro http://www.enterprisedb.com
Hi, On 07/26/2018 10:04 PM, Thomas Munro wrote: > Done. Rebased. > This needs a rebase again. Once resolved the patch passes make check-world, and a strace analysis shows the associated read()/write() have been turned into pread64()/pwrite64(). All lseek()'s are SEEK_END's. Best regards, Jesper
Hi, On 09/05/2018 02:42 PM, Jesper Pedersen wrote: > On 07/26/2018 10:04 PM, Thomas Munro wrote: >> Done. Rebased. >> > > This needs a rebase again. > Would it be of benefit to update these call sites * slru.c - SlruPhysicalReadPage - SlruPhysicalWritePage * xlogutils.c - XLogRead * pg_receivewal.c - FindStreamingStart * rewriteheap.c - heap_xlog_logical_rewrite * walreceiver.c - XLogWalRcvWrite too ? Best regards, Jesper
On Fri, Sep 7, 2018 at 2:17 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > > This needs a rebase again. Done. > Would it be of benefit to update these call sites > > * slru.c > - SlruPhysicalReadPage > - SlruPhysicalWritePage > * xlogutils.c > - XLogRead > * pg_receivewal.c > - FindStreamingStart > * rewriteheap.c > - heap_xlog_logical_rewrite > * walreceiver.c > - XLogWalRcvWrite It certainly wouldn't hurt... but more pressing to get this committed would be Windows support IMHO. I think the thing to do is to open files with the FILE_FLAG_OVERLAPPED flag, and then use ReadFile() and WriteFile() with an LPOVERLAPPED struct that holds an offset, but I'm not sure if I can write that myself. I tried doing some semi-serious Windows development for the fsyncgate patch using only AppVeyor CI a couple of weeks ago and it was like visiting the dentist. On Thu, Sep 6, 2018 at 6:42 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Once resolved the patch passes make check-world, and a strace analysis > shows the associated read()/write() have been turned into > pread64()/pwrite64(). All lseek()'s are SEEK_END's. Yeah :-) Just for fun, here is the truss output for a single pgbench transaction here: recvfrom(9,"B\0\0\0\^R\0P0_4\0\0\0\0\0\0\^A"...,8192,0,NULL,0x0) = 41 (0x29) sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\nBEG"...,27,0,NULL,0) = 27 (0x1b) recvfrom(9,"B\0\0\0&\0P0_5\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 61 (0x3d) pread(22,"\0\0\0\0\^P\M^C\M-@P\0\0\0\0\M-X"...,8192,0x960a000) = 8192 (0x2000) pread(20,"\0\0\0\0\M^X\^D\M^Iq\0\0\0\0\^T"...,8192,0x380fe000) = 8192 (0x2000) sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e) recvfrom(9,"B\0\0\0\^]\0P0_6\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 52 (0x34) sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,75,0,NULL,0) = 75 (0x4b) recvfrom(9,"B\0\0\0"\0P0_7\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 57 (0x39) sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e) recvfrom(9,"B\0\0\0!\0P0_8\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 56 (0x38) lseek(29,0x0,SEEK_END) = 8192 (0x2000) sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e) recvfrom(9,"B\0\0\0003\0P0_9\0\0\0\0\^D\0\0"...,8192,0,NULL,0x0) = 74 (0x4a) sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\^OIN"...,32,0,NULL,0) = 32 (0x20) recvfrom(9,"B\0\0\0\^S\0P0_10\0\0\0\0\0\0\^A"...,8192,0,NULL,0x0) = 42 (0x2a) pwrite(33,"\M^X\M-P\^E\0\^A\0\0\0\0\M-`\M^A"...,16384,0x81e000) = 16384 (0x4000) fdatasync(0x21) = 0 (0x0) sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\vCOM"...,28,0,NULL,0) = 28 (0x1c) There is only one lseek() left. I actually have another patch that gets rid of even that (by caching sizes in SMgrRelation using a shared invalidation counter which I'm not yet sure about). Then pgbench's 7-round-trip transaction makes only the strictly necessary 18 syscalls (every one an explainable network message, disk page or sync). Unpatched master has 5 extra lseek()s. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Wed, Sep 19, 2018 at 1:48 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Fri, Sep 7, 2018 at 2:17 AM Jesper Pedersen > <jesper.pedersen@redhat.com> wrote: > > > This needs a rebase again. And again, due to the conflict with ppoll in AC_CHECK_FUNCS. -- Thomas Munro http://www.enterprisedb.com
Attachment
Hi Thomas, On 9/18/18 9:48 PM, Thomas Munro wrote: > It certainly wouldn't hurt... but more pressing to get this committed > would be Windows support IMHO. I think the thing to do is to open > files with the FILE_FLAG_OVERLAPPED flag, and then use ReadFile() and > WriteFile() with an LPOVERLAPPED struct that holds an offset, but I'm > not sure if I can write that myself. I tried doing some semi-serious > Windows development for the fsyncgate patch using only AppVeyor CI a > couple of weeks ago and it was like visiting the dentist. > Sorry, no idea about this. Maybe Magnus can provide some feedback ? > On Thu, Sep 6, 2018 at 6:42 AM Jesper Pedersen >> Once resolved the patch passes make check-world, and a strace analysis >> shows the associated read()/write() have been turned into >> pread64()/pwrite64(). All lseek()'s are SEEK_END's. > > Yeah :-) Thanks for v5 too. Best regards, Jesper
On Fri, Sep 28, 2018 at 2:03 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Thanks for v5 too. Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! -- Thomas Munro http://www.enterprisedb.com
Attachment
On Fri, Sep 28, 2018 at 2:03 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Thanks for v5 too. Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! -- Thomas Munro http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes: > Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! Yeah, I've been burnt by that too recently. It occurs to me we could make that at least a little less painful if we formatted the macro with one line per function name: AC_CHECK_FUNCS([ cbrt clock_gettime fdatasync ... wcstombs_l ]) You'd still get conflicts in configure itself, of course, but that doesn't require manual work to resolve -- just re-run autoconf. regards, tom lane
Thomas Munro <thomas.munro@enterprisedb.com> writes: > Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! Yeah, I've been burnt by that too recently. It occurs to me we could make that at least a little less painful if we formatted the macro with one line per function name: AC_CHECK_FUNCS([ cbrt clock_gettime fdatasync ... wcstombs_l ]) You'd still get conflicts in configure itself, of course, but that doesn't require manual work to resolve -- just re-run autoconf. regards, tom lane
On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! > > Yeah, I've been burnt by that too recently. It occurs to me we could make > that at least a little less painful if we formatted the macro with one > line per function name: > > AC_CHECK_FUNCS([ > cbrt > clock_gettime > fdatasync > ... > wcstombs_l > ]) > > You'd still get conflicts in configure itself, of course, but that > doesn't require manual work to resolve -- just re-run autoconf. +1, was about to suggest the same! -- Thomas Munro http://www.enterprisedb.com
On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! > > Yeah, I've been burnt by that too recently. It occurs to me we could make > that at least a little less painful if we formatted the macro with one > line per function name: > > AC_CHECK_FUNCS([ > cbrt > clock_gettime > fdatasync > ... > wcstombs_l > ]) > > You'd still get conflicts in configure itself, of course, but that > doesn't require manual work to resolve -- just re-run autoconf. +1, was about to suggest the same! -- Thomas Munro http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes: > On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I've been burnt by that too recently. It occurs to me we could make >> that at least a little less painful if we formatted the macro with one >> line per function name: >> >> AC_CHECK_FUNCS([ >> cbrt >> clock_gettime >> fdatasync >> ... >> wcstombs_l >> ]) >> >> You'd still get conflicts in configure itself, of course, but that >> doesn't require manual work to resolve -- just re-run autoconf. > +1, was about to suggest the same! Sold, I'll go do it. regards, tom lane
Thomas Munro <thomas.munro@enterprisedb.com> writes: > On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I've been burnt by that too recently. It occurs to me we could make >> that at least a little less painful if we formatted the macro with one >> line per function name: >> >> AC_CHECK_FUNCS([ >> cbrt >> clock_gettime >> fdatasync >> ... >> wcstombs_l >> ]) >> >> You'd still get conflicts in configure itself, of course, but that >> doesn't require manual work to resolve -- just re-run autoconf. > +1, was about to suggest the same! Sold, I'll go do it. regards, tom lane
I wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, I've been burnt by that too recently. It occurs to me we could make >>> that at least a little less painful if we formatted the macro with one >>> line per function name: >> +1, was about to suggest the same! > Sold, I'll go do it. Learned a few new things about M4 along the way :-( ... but done. You'll need to rebase the pread patch again of course. regards, tom lane
I wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, I've been burnt by that too recently. It occurs to me we could make >>> that at least a little less painful if we formatted the macro with one >>> line per function name: >> +1, was about to suggest the same! > Sold, I'll go do it. Learned a few new things about M4 along the way :-( ... but done. You'll need to rebase the pread patch again of course. regards, tom lane
On 10/08/2018 09:55 PM, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! > Yeah, I've been burnt by that too recently. It occurs to me we could make > that at least a little less painful if we formatted the macro with one > line per function name: > > AC_CHECK_FUNCS([ > cbrt > clock_gettime > fdatasync > ... > wcstombs_l > ]) > > You'd still get conflicts in configure itself, of course, but that > doesn't require manual work to resolve -- just re-run autoconf. > > By and large I think it's better not to submit patches with changes to configure, but to let the committer run autoconf. You can avoid getting such changes in your patches by doing something like this: git config diff.nodiff.command /bin/true echo configure diff=nodiff >> .git/info/attributes If you actually want to turn this off and see any diffs in configure, run git diff --no-ext-diff It's also possible to supply a filter expression to 'git diff'. OTOH, this will probably confuse the heck out of the cfbot patch checker. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-10-09 14:32:29 -0400, Andrew Dunstan wrote: > > > On 10/08/2018 09:55 PM, Tom Lane wrote: > > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > > Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! > > Yeah, I've been burnt by that too recently. It occurs to me we could make > > that at least a little less painful if we formatted the macro with one > > line per function name: > > > > AC_CHECK_FUNCS([ > > cbrt > > clock_gettime > > fdatasync > > ... > > wcstombs_l > > ]) > > > > You'd still get conflicts in configure itself, of course, but that > > doesn't require manual work to resolve -- just re-run autoconf. > > > > > > > > By and large I think it's better not to submit patches with changes to > configure, but to let the committer run autoconf. > OTOH, this will probably confuse the heck out of the cfbot patch checker. And make life harder for reviewers. -1 on this one. Greetings, Andres Freund
On 10/09/2018 02:37 PM, Andres Freund wrote: > On 2018-10-09 14:32:29 -0400, Andrew Dunstan wrote: >> >> On 10/08/2018 09:55 PM, Tom Lane wrote: >>> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>>> Rebased again. Patches that touch AC_CHECK_FUNCS are fun like that! >>> Yeah, I've been burnt by that too recently. It occurs to me we could make >>> that at least a little less painful if we formatted the macro with one >>> line per function name: >>> >>> AC_CHECK_FUNCS([ >>> cbrt >>> clock_gettime >>> fdatasync >>> ... >>> wcstombs_l >>> ]) >>> >>> You'd still get conflicts in configure itself, of course, but that >>> doesn't require manual work to resolve -- just re-run autoconf. >>> >>> >> >> >> By and large I think it's better not to submit patches with changes to >> configure, but to let the committer run autoconf. >> OTOH, this will probably confuse the heck out of the cfbot patch checker. > And make life harder for reviewers. > > -1 on this one. > Maybe I'm thinking back to the time when we used to use a bunch of old versions of autoconf ... cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 9, 2018 at 5:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > Thomas Munro <thomas.munro@enterprisedb.com> writes: > >> On Tue, Oct 9, 2018 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Yeah, I've been burnt by that too recently. It occurs to me we could make > >>> that at least a little less painful if we formatted the macro with one > >>> line per function name: > > >> +1, was about to suggest the same! > > > Sold, I'll go do it. > > Learned a few new things about M4 along the way :-( ... but done. > You'll need to rebase the pread patch again of course. Thanks, much nicer. Rebased. -- Thomas Munro http://www.enterprisedb.com
Attachment
Hi Thomas, On 10/9/18 4:56 PM, Thomas Munro wrote: > Thanks, much nicer. Rebased. > This still applies, and passes make check-world. I wonder what the commit policy is on this, if the Windows part isn't included. I read Heikki's comment [1] as it would be ok to commit benefiting all platforms that has pread/pwrite. The functions in [2] could be a follow-up patch as well. [1] https://www.postgresql.org/message-id/6cc7c8dd-29f9-7d75-d18a-99f19c076d10%40iki.fi [2] https://www.postgresql.org/message-id/c2f56d0a-cadd-3df1-ae48-b84dc8128c37%40redhat.com Best regards, Jesper
On Sat, Nov 3, 2018 at 2:07 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > This still applies, and passes make check-world. > > I wonder what the commit policy is on this, if the Windows part isn't > included. I read Heikki's comment [1] as it would be ok to commit > benefiting all platforms that has pread/pwrite. Here's a patch to add Windows support by supplying src/backend/port/win32/pread.c. Thoughts? -- Thomas Munro http://www.enterprisedb.com
Attachment
On Sun, Nov 4, 2018 at 12:03 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Sat, Nov 3, 2018 at 2:07 AM Jesper Pedersen > <jesper.pedersen@redhat.com> wrote: > > This still applies, and passes make check-world. > > > > I wonder what the commit policy is on this, if the Windows part isn't > > included. I read Heikki's comment [1] as it would be ok to commit > > benefiting all platforms that has pread/pwrite. > > Here's a patch to add Windows support by supplying > src/backend/port/win32/pread.c. Thoughts? If we do that, I suppose we might as well supply implementations for HP-UX 10.20 as well, and then we can get rid of the conditional macro stuff at various call sites and use pread() and pwrite() freely. Here's a version that does it that way. One question is whether the caveat mentioned in patch 0001 is acceptable. -- Thomas Munro http://www.enterprisedb.com
Attachment
Hi Thomas, On 11/5/18 7:08 AM, Thomas Munro wrote: > On Sun, Nov 4, 2018 at 12:03 AM Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Sat, Nov 3, 2018 at 2:07 AM Jesper Pedersen >> <jesper.pedersen@redhat.com> wrote: >>> This still applies, and passes make check-world. >>> >>> I wonder what the commit policy is on this, if the Windows part isn't >>> included. I read Heikki's comment [1] as it would be ok to commit >>> benefiting all platforms that has pread/pwrite. >> >> Here's a patch to add Windows support by supplying >> src/backend/port/win32/pread.c. Thoughts? > > If we do that, I suppose we might as well supply implementations for > HP-UX 10.20 as well, and then we can get rid of the conditional macro > stuff at various call sites and use pread() and pwrite() freely. > Here's a version that does it that way. One question is whether the > caveat mentioned in patch 0001 is acceptable. > Passed check-world, but I can't verify the 0001 patch. Reading the the API it looks ok to me. I guess the caveat in 0001 is ok, as it is a side-effect of the underlying API. Best regards, Jesper
On 2018-Nov-04, Thomas Munro wrote: > Here's a patch to add Windows support by supplying > src/backend/port/win32/pread.c. Thoughts? Hmm, so how easy is to detect that somebody runs read/write on fds where pread/pwrite have occurred? I guess for data files it's easy to detect since you'd quickly end up with corrupted files, but what about other kinds of files? I wonder if we should be worrying about using this interface somewhere other than fd.c and forgetting about the limitation. Say, what happens if we patch some place in xlog.c after this patch gets in, using write() instead of pwrite()? I suppose the safest approach is to use lseek (or whatever) to fix up the position after the pread/pwrite -- but we don't want to pay the price on an additional syscall. Are there any other options? Is there a way to prevent read/write from being used on a file handle? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Please remove Tell from line 18 in fd.h. To Küssnacht with him! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2018-Nov-04, Thomas Munro wrote: >> Here's a patch to add Windows support by supplying >> src/backend/port/win32/pread.c. Thoughts? > Hmm, so how easy is to detect that somebody runs read/write on fds where > pread/pwrite have occurred? I guess for data files it's easy to detect > since you'd quickly end up with corrupted files, but what about other > kinds of files? I wonder if we should be worrying about using this > interface somewhere other than fd.c and forgetting about the limitation. Yeah. I think the patch as presented is OK; it uses pread/pwrite only inside fd.c, which is a reasonably non-leaky abstraction. But there's definitely a hazard of somebody submitting a patch that depends on using pread/pwrite elsewhere, and then that maybe not working. What I suggest is that we *not* try to make this a completely transparent substitute. Instead, make the functions exported by src/port/ be "pg_pread" and "pg_pwrite", and inside fd.c we'd write something like #ifdef HAVE_PREAD #define pg_pread pread #endif and then refer to pg_pread/pg_pwrite in the body of that file. That way, if someone refers to pread and expects standard functionality from it, they'll get a failure on platforms not supporting it. FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly and pass the core regression tests. regards, tom lane
On Tue, Nov 6, 2018 at 5:07 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Please remove Tell from line 18 in fd.h. To Küssnacht with him! Thanks, done. But what is this arrow sticking through my Mac laptop's screen...? On Tue, Nov 6, 2018 at 6:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2018-Nov-04, Thomas Munro wrote: > >> Here's a patch to add Windows support by supplying > >> src/backend/port/win32/pread.c. Thoughts? > > > Hmm, so how easy is to detect that somebody runs read/write on fds where > > pread/pwrite have occurred? I guess for data files it's easy to detect > > since you'd quickly end up with corrupted files, but what about other > > kinds of files? I wonder if we should be worrying about using this > > interface somewhere other than fd.c and forgetting about the limitation. > > Yeah. I think the patch as presented is OK; it uses pread/pwrite only > inside fd.c, which is a reasonably non-leaky abstraction. But there's > definitely a hazard of somebody submitting a patch that depends on > using pread/pwrite elsewhere, and then that maybe not working. > > What I suggest is that we *not* try to make this a completely transparent > substitute. Instead, make the functions exported by src/port/ be > "pg_pread" and "pg_pwrite", and inside fd.c we'd write something like > > #ifdef HAVE_PREAD > #define pg_pread pread > #endif > > and then refer to pg_pread/pg_pwrite in the body of that file. That > way, if someone refers to pread and expects standard functionality > from it, they'll get a failure on platforms not supporting it. OK. But since we're using this from both fd.c and xlog.c, I put that into src/include/port.h. > FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly > and pass the core regression tests. Thanks. I also tested the replacements by temporarily hacking my configure script to look for the wrong function name: -ac_fn_c_check_func "$LINENO" "pread" "ac_cv_func_pread" +ac_fn_c_check_func "$LINENO" "preadx" "ac_cv_func_pread" -ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite" +ac_fn_c_check_func "$LINENO" "pwritex" "ac_cv_func_pwrite" -- Thomas Munro http://www.enterprisedb.com
Attachment
Thomas Munro <thomas.munro@enterprisedb.com> writes: > On Tue, Nov 6, 2018 at 6:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What I suggest is that we *not* try to make this a completely transparent >> substitute. Instead, make the functions exported by src/port/ be >> "pg_pread" and "pg_pwrite", ... > OK. But since we're using this from both fd.c and xlog.c, I put that > into src/include/port.h. LGTM. I didn't bother to run an actual test cycle, since it's not materially different from the previous version as far as portability is concerned. regards, tom lane
Hi, On 11/5/18 9:10 PM, Thomas Munro wrote: > On Tue, Nov 6, 2018 at 5:07 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Please remove Tell from line 18 in fd.h. To Küssnacht with him! > > Thanks, done. But what is this arrow sticking through my Mac laptop's > screen...? > > On Tue, Nov 6, 2018 at 6:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> On 2018-Nov-04, Thomas Munro wrote: >>>> Here's a patch to add Windows support by supplying >>>> src/backend/port/win32/pread.c. Thoughts? >> >>> Hmm, so how easy is to detect that somebody runs read/write on fds where >>> pread/pwrite have occurred? I guess for data files it's easy to detect >>> since you'd quickly end up with corrupted files, but what about other >>> kinds of files? I wonder if we should be worrying about using this >>> interface somewhere other than fd.c and forgetting about the limitation. >> >> Yeah. I think the patch as presented is OK; it uses pread/pwrite only >> inside fd.c, which is a reasonably non-leaky abstraction. But there's >> definitely a hazard of somebody submitting a patch that depends on >> using pread/pwrite elsewhere, and then that maybe not working. >> >> What I suggest is that we *not* try to make this a completely transparent >> substitute. Instead, make the functions exported by src/port/ be >> "pg_pread" and "pg_pwrite", and inside fd.c we'd write something like >> >> #ifdef HAVE_PREAD >> #define pg_pread pread >> #endif >> >> and then refer to pg_pread/pg_pwrite in the body of that file. That >> way, if someone refers to pread and expects standard functionality >> from it, they'll get a failure on platforms not supporting it. > > OK. But since we're using this from both fd.c and xlog.c, I put that > into src/include/port.h. > >> FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly >> and pass the core regression tests. > Passes check-world, and includes the feedback on this thread. New status: Ready for Committer Best regards, Jesper
On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Passes check-world, and includes the feedback on this thread. > > New status: Ready for Committer Thanks! Pushed. I'll keep an eye on the build farm to see if anything breaks on Cygwin or some other frankenOS. -- Thomas Munro http://www.enterprisedb.com
Hi Thomas, On 11/6/18 4:04 PM, Thomas Munro wrote: > On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen > Thanks! Pushed. I'll keep an eye on the build farm to see if > anything breaks on Cygwin or some other frankenOS. > There is [1] on Andres' skink setup. Looking. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01 Best regards, Jesper
On 11/7/18 7:26 AM, Jesper Pedersen wrote: > Hi Thomas, > > On 11/6/18 4:04 PM, Thomas Munro wrote: >> On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen >> Thanks! Pushed. I'll keep an eye on the build farm to see if >> anything breaks on Cygwin or some other frankenOS. >> > > There is [1] on Andres' skink setup. Looking. > > [1] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01 > > And lousyjack, which uses a slightly different way of calling valgrind, and thus got past initdb, found a bunch more: see <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-11-07%2001%3A33%3A01> cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 11/7/18 7:26 AM, Jesper Pedersen wrote: > On 11/6/18 4:04 PM, Thomas Munro wrote: >> On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen >> Thanks! Pushed. I'll keep an eye on the build farm to see if >> anything breaks on Cygwin or some other frankenOS. >> > > There is [1] on Andres' skink setup. Looking. > Attached is a reproducer. Adding the memset() command for the page makes valgrind happy. Thoughts on how to proceed with this ? The report in [1] shows that there are a number of call sites where the page(s) aren't fully initialized. [1] https://www.postgresql.org/message-id/3fe1e38a-fb70-6260-9300-ce67ede21c32%40redhat.com Best regards, Jesper
Attachment
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 11/7/18 7:26 AM, Jesper Pedersen wrote: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01 > And lousyjack, which uses a slightly different way of calling valgrind, > and thus got past initdb, found a bunch more: > <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-11-07%2001%3A33%3A01> I'm confused by this. Surely the pwrite-based code is writing exactly the same data as before. Do we have to conclude that valgrind is complaining about passing uninitialized data to pwrite() when it did not complain about exactly the same thing for write()? [ looks ... ] No, what we have to conclude is that the write-related suppressions in src/tools/valgrind.supp need to be replaced or augmented with pwrite-related ones. regards, tom lane
On 11/7/18 9:30 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 11/7/18 7:26 AM, Jesper Pedersen wrote: >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-07%2001%3A01%3A01 >> And lousyjack, which uses a slightly different way of calling valgrind, >> and thus got past initdb, found a bunch more: >> <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-11-07%2001%3A33%3A01> > I'm confused by this. Surely the pwrite-based code is writing exactly the > same data as before. Do we have to conclude that valgrind is complaining > about passing uninitialized data to pwrite() when it did not complain > about exactly the same thing for write()? > > [ looks ... ] No, what we have to conclude is that the write-related > suppressions in src/tools/valgrind.supp need to be replaced or augmented > with pwrite-related ones. Yeah. I just trawled through the lousyjack logs and it looks like all the cases it reported could be handled by: { padding_XLogRecData_pwrite Memcheck:Param pwrite64(buf) ... fun:XLogWrite } cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Tom, On 11/7/18 9:30 AM, Tom Lane wrote: > I'm confused by this. Surely the pwrite-based code is writing exactly the > same data as before. Do we have to conclude that valgrind is complaining > about passing uninitialized data to pwrite() when it did not complain > about exactly the same thing for write()? > > [ looks ... ] No, what we have to conclude is that the write-related > suppressions in src/tools/valgrind.supp need to be replaced or augmented > with pwrite-related ones. > The attached patch fixes this for me. Unfortunately pwrite* doesn't work for the pwrite64(buf) line. Best regards, Jesper
Attachment
On 11/7/18 10:05 AM, Jesper Pedersen wrote: > Hi Tom, > > On 11/7/18 9:30 AM, Tom Lane wrote: >> I'm confused by this. Surely the pwrite-based code is writing >> exactly the >> same data as before. Do we have to conclude that valgrind is >> complaining >> about passing uninitialized data to pwrite() when it did not complain >> about exactly the same thing for write()? >> >> [ looks ... ] No, what we have to conclude is that the write-related >> suppressions in src/tools/valgrind.supp need to be replaced or augmented >> with pwrite-related ones. >> > > The attached patch fixes this for me. > > Unfortunately pwrite* doesn't work for the pwrite64(buf) line. > > Works for me. If there's no objection I will commit this. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 8, 2018 at 4:27 AM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On 11/7/18 10:05 AM, Jesper Pedersen wrote: > > On 11/7/18 9:30 AM, Tom Lane wrote: > >> I'm confused by this. Surely the pwrite-based code is writing > >> exactly the > >> same data as before. Do we have to conclude that valgrind is > >> complaining > >> about passing uninitialized data to pwrite() when it did not complain > >> about exactly the same thing for write()? > >> > >> [ looks ... ] No, what we have to conclude is that the write-related > >> suppressions in src/tools/valgrind.supp need to be replaced or augmented > >> with pwrite-related ones. > > > > The attached patch fixes this for me. > > > > Unfortunately pwrite* doesn't work for the pwrite64(buf) line. > > Works for me. If there's no objection I will commit this. Thanks for adjusting that. I suppose I would have known about this if cfbot checked every patch with valgrind, which I might look into. I'm a little confused about how an uninitialised value originating in an OID list finishes up in an xlog buffer, considering that OIDs don't have padding. -- Thomas Munro http://www.enterprisedb.com