Thread: 2pc leaks fds
Hi, Using 2PC with master very quickly leads to: 2020-04-05 19:42:18.368 PDT [2298126][5/2009:0] LOG: out of file descriptors: Too many open files; release and retry 2020-04-05 19:42:18.368 PDT [2298126][5/2009:0] STATEMENT: COMMIT PREPARED 'ptx_2'; This started with: commit 0dc8ead46363fec6f621a12c7e1f889ba73b55a9 (HEAD -> master) Author: Alvaro Herrera <alvherre@alvh.no-ip.org> Date: 2019-11-25 15:04:54 -0300 Refactor WAL file-reading code into WALRead() I found this while trying to benchmark the effect of my snapshot changes on 2pc. I just used the attached pgbench file. andres@awork3:~/build/postgres/dev-assert/vpath$ pgbench -n -s 500 -c 4 -j 4 -T 100000 -P1 -f ~/tmp/pgbench-write-2pc.sql progress: 1.0 s, 3723.8 tps, lat 1.068 ms stddev 0.305 client 2 script 0 aborted in command 8 query 0: ERROR: could not seek to end of file "base/14036/16396": Too many open files client 1 script 0 aborted in command 8 query 0: ERROR: could not seek to end of file "base/14036/16396": Too many open files client 3 script 0 aborted in command 8 query 0: ERROR: could not seek to end of file "base/14036/16396": Too many open files client 0 script 0 aborted in command 8 query 0: ERROR: could not seek to end of file "base/14036/16396": Too many open files transaction type: /home/andres/tmp/pgbench-write-2pc.sql I've not yet reviewed the change sufficiently to pinpoint the issue. It's a bit sad that nobody has hit this in the last few months :(. Greetings, Andres Freund
Attachment
On Sun, Apr 05, 2020 at 07:56:51PM -0700, Andres Freund wrote: > I found this while trying to benchmark the effect of my snapshot changes > on 2pc. I just used the attached pgbench file. > > I've not yet reviewed the change sufficiently to pinpoint the issue. Indeed. It takes seconds to show up. > It's a bit sad that nobody has hit this in the last few months :(. 2PC shines with the code of xlogreader.c in this case because it keeps opening and closing XLogReaderState for a short amount of time. So it is not surprising to me to see this error only months after the fact because recovery or pg_waldump just use one XLogReaderState. From what I can see, the error is that the code only bothers closing WALOpenSegment->seg when switching to a new segment, but we need also to close it when finishing the business in XLogReaderFree(). I am adding an open item, and attached is a patch to take care of the problem. Thoughts? -- Michael
Attachment
Hi, On 2020-04-06 14:26:48 +0900, Michael Paquier wrote: > 2PC shines with the code of xlogreader.c in this case because it keeps > opening and closing XLogReaderState for a short amount of time. So it > is not surprising to me to see this error only months after the fact > because recovery or pg_waldump just use one XLogReaderState. Well, it doesn't exactly signal that people (including me, up to just now) are testing their changes all that carefully... > From what I can see, the error is that the code only bothers closing > WALOpenSegment->seg when switching to a new segment, but we need also > to close it when finishing the business in XLogReaderFree(). Yea, I came to the same conclusion and locally fixed it the same way (except having the close a bit earlier in XLogReaderFree()). > diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c > index f3fea5132f..7e25e2050a 100644 > --- a/src/backend/access/transam/xlogreader.c > +++ b/src/backend/access/transam/xlogreader.c > @@ -144,6 +144,9 @@ XLogReaderFree(XLogReaderState *state) > if (state->main_data) > pfree(state->main_data); > > + if (state->seg.ws_file >= 0) > + close(state->seg.ws_file); > + > pfree(state->errormsg_buf); > if (state->readRecordBuf) > pfree(state->readRecordBuf); But I'm not sure it's quite the right idea. I'm not sure I fully understand the design of 0dc8ead46, but it looks to me like it's intended to allow users of the interface to have different ways of opening files. If we just close() the fd that'd be a bit more limited. OTOH, I think all but one (XLogPageRead()) of the current users of XLogReader use WALRead(), which also close()s the fd (before calling the WALSegmentOpen callback). The XLogReader code flow has gotten quite complicated :(. XLogReaderReadRecord()-> state->read_page() -> logical_read_xlog_page etc -> WALRead() -> wal_segment_open callback etc. There's been a fair bit of change, making the interface more generic / powerful / reducing duplication, but not a lot of added / adapted comments in the header... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> wrote: > > From what I can see, the error is that the code only bothers closing > > WALOpenSegment->seg when switching to a new segment, but we need also > > to close it when finishing the business in XLogReaderFree(). > > Yea, I came to the same conclusion and locally fixed it the same way > (except having the close a bit earlier in XLogReaderFree()). It's still not quite clear to me why the problem starts to appear after 0dc8ead46. This patch does not remove any close() call from XLogReaderFree(). > But I'm not sure it's quite the right idea. I'm not sure I fully > understand the design of 0dc8ead46, but it looks to me like it's > intended to allow users of the interface to have different ways of > opening files. If we just close() the fd that'd be a bit more limited. It should have allowed users to have different ways to *locate the segment* file. The WALSegmentOpen callback could actually return file path instead of the file descriptor and let WALRead() perform the opening/closing, but then the WALRead function would need to be aware whether it is executing in backend or in frontend (so it can use the correct function to open/close the file). I was aware of the problem that the correct function should be used to open the file and that's why this comment was added (although "mandatory" would be more suitable than "preferred"): * BasicOpenFile() is the preferred way to open the segment file in backend * code, whereas open(2) should be used in frontend. */ typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt, TimeLineID *tli_p); -- Antonin Houska Web: https://www.cybertec-postgresql.com
Hi, On 2020-04-06 09:12:32 +0200, Antonin Houska wrote: > Andres Freund <andres@anarazel.de> wrote: > > > > From what I can see, the error is that the code only bothers closing > > > WALOpenSegment->seg when switching to a new segment, but we need also > > > to close it when finishing the business in XLogReaderFree(). > > > > Yea, I came to the same conclusion and locally fixed it the same way > > (except having the close a bit earlier in XLogReaderFree()). > > It's still not quite clear to me why the problem starts to appear after > 0dc8ead46. This patch does not remove any close() call from XLogReaderFree(). Before that change the file was also kind of leaked, but would use the same static variable to store the fd and thus close it. Greetings, Andres Freund
Hi, I pushed a fix. While it might not be the best medium/long term fix, it unbreaks 2PC. Perhaps we should add an open item to track whether we want to fix this differently? On 2020-04-06 09:12:32 +0200, Antonin Houska wrote: > Andres Freund <andres@anarazel.de> wrote: > It should have allowed users to have different ways to *locate the segment* > file. The WALSegmentOpen callback could actually return file path instead of > the file descriptor and let WALRead() perform the opening/closing, but then > the WALRead function would need to be aware whether it is executing in backend > or in frontend (so it can use the correct function to open/close the file). > > I was aware of the problem that the correct function should be used to open > the file and that's why this comment was added (although "mandatory" would be > more suitable than "preferred"): > > * BasicOpenFile() is the preferred way to open the segment file in backend > * code, whereas open(2) should be used in frontend. > */ > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt, > TimeLineID *tli_p); I don't think that BasicOpenFile() really solves anything here? If anything it *exascerbates* the problem, because it will trigger all of the "virtual file descriptors" for already opened Files to close() the underlying OS FDs. So not even a fully cached table can be seqscanned, because that tries to check the file size... Greetings, Andres Freund
On Tue, Apr 07, 2020 at 05:12:49PM -0700, Andres Freund wrote: > I pushed a fix. While it might not be the best medium/long term fix, it > unbreaks 2PC. Perhaps we should add an open item to track whether we > want to fix this differently? Sounds fine to me. I have updated the open item that we have now by adding a comment that the leak has been fixed by 91c4054, but that we should revisit the refactoring. -- Michael
Attachment
Antonin Houska <ah@cybertec.at> wrote: > Andres Freund <andres@anarazel.de> wrote: > > But I'm not sure it's quite the right idea. I'm not sure I fully > > understand the design of 0dc8ead46, but it looks to me like it's > > intended to allow users of the interface to have different ways of > > opening files. If we just close() the fd that'd be a bit more limited. > > It should have allowed users to have different ways to *locate the segment* > file. The WALSegmentOpen callback could actually return file path instead of > the file descriptor and let WALRead() perform the opening/closing, but then > the WALRead function would need to be aware whether it is executing in backend > or in frontend (so it can use the correct function to open/close the file). Well, #ifdef FRONTEND can be used to distinguish the caller of WALRead(). However now that I tried to adjust the API, I see that pg_waldump.c:WALDumpOpenSegment uses specific logic to open the file. So if the callback only returned the file name, there would be no suitable place for the things that WALDumpOpenSegment does. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Andres Freund <andres@anarazel.de> wrote: > On 2020-04-06 09:12:32 +0200, Antonin Houska wrote: > > Andres Freund <andres@anarazel.de> wrote: > > It should have allowed users to have different ways to *locate the segment* > > file. The WALSegmentOpen callback could actually return file path instead of > > the file descriptor and let WALRead() perform the opening/closing, but then > > the WALRead function would need to be aware whether it is executing in backend > > or in frontend (so it can use the correct function to open/close the file). > > > > I was aware of the problem that the correct function should be used to open > > the file and that's why this comment was added (although "mandatory" would be > > more suitable than "preferred"): > > > > * BasicOpenFile() is the preferred way to open the segment file in backend > > * code, whereas open(2) should be used in frontend. > > */ > > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt, > > TimeLineID *tli_p); > > I don't think that BasicOpenFile() really solves anything here? If > anything it *exascerbates* the problem, because it will trigger all of > the "virtual file descriptors" for already opened Files to close() the > underlying OS FDs. So not even a fully cached table can be seqscanned, > because that tries to check the file size... Specifically for 2PC, isn't it better to close some OS-level FD of an unrelated table scan and then succeed than to ERROR immediately? Anyway, 0dc8ead46 hasn't changed this. I at least admit that the comment should not recommend particular function, and that WALRead() should call the appropriate function to close the file, rather than always calling close(). Can we just pass the existing FD to the callback as an additional argument, and let the callback close it? -- Antonin Houska Web: https://www.cybertec-postgresql.com
I have tested with and without the commit from Andres using the pgbench script (below) provided in the initial email.
pgbench -n -s 500 -c 4 -j 4 -T 100000 -P1 -f pgbench-write-2pc.sql
I am not getting the leak anymore, it seems to be holding up pretty well.
On Wed, Apr 8, 2020 at 12:59 PM Antonin Houska <ah@cybertec.at> wrote:
Andres Freund <andres@anarazel.de> wrote:
> On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > Andres Freund <andres@anarazel.de> wrote:
> > It should have allowed users to have different ways to *locate the segment*
> > file. The WALSegmentOpen callback could actually return file path instead of
> > the file descriptor and let WALRead() perform the opening/closing, but then
> > the WALRead function would need to be aware whether it is executing in backend
> > or in frontend (so it can use the correct function to open/close the file).
> >
> > I was aware of the problem that the correct function should be used to open
> > the file and that's why this comment was added (although "mandatory" would be
> > more suitable than "preferred"):
> >
> > * BasicOpenFile() is the preferred way to open the segment file in backend
> > * code, whereas open(2) should be used in frontend.
> > */
> > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
> > TimeLineID *tli_p);
>
> I don't think that BasicOpenFile() really solves anything here? If
> anything it *exascerbates* the problem, because it will trigger all of
> the "virtual file descriptors" for already opened Files to close() the
> underlying OS FDs. So not even a fully cached table can be seqscanned,
> because that tries to check the file size...
Specifically for 2PC, isn't it better to close some OS-level FD of an
unrelated table scan and then succeed than to ERROR immediately? Anyway,
0dc8ead46 hasn't changed this.
I at least admit that the comment should not recommend particular function,
and that WALRead() should call the appropriate function to close the file,
rather than always calling close().
Can we just pass the existing FD to the callback as an additional argument,
and let the callback close it?
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
On 2020-Apr-08, Antonin Houska wrote: > Specifically for 2PC, isn't it better to close some OS-level FD of an > unrelated table scan and then succeed than to ERROR immediately? Anyway, > 0dc8ead46 hasn't changed this. I think for full generality of the interface, we pass a "close" callback in addition to the "open" callback. But if we were to pass it only for WALRead, then there would be no way to call it during XLogReaderFree. I think the fix Andres applied is okay as far as it goes, but for the long term we may want to change the interface even further -- maybe by having these functions be part of the XLogReader state struct. I have this code paged out of my head ATM, but maybe tomorrow I can give it a little think. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Concretely, I propose to have a new struct like typedef struct xlogReaderFuncs { XLogPageReadCB read_page; XLogSegmentOpenCB open_segment; XLogSegmentCloseCB open_segment; } xlogReaderFuncs; #define XLOGREADER_FUNCS(...) &(xlogReaderFuncs){__VA_ARGS__} and then invoke it something like xlogreader = XLogReaderAllocate(wal_segment_size, NULL, XLOGREADER_FUNCS(.readpage = &read_local_xlog_page, .opensegment = &wal_segment_open), .closesegment = &wal_segment_close), NULL); (with suitable definitions for XLogSegmentOpenCB etc) so that the support functions are all available at the xlogreader level, instead of "open" being buried at the read-page level. Any additional support functions can be added easily. This would give xlogreader a simpler interface. If people like this, I could make this change for pg13 and avoid changing the API again in pg14. Thougths? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-04-22 13:57:54 -0400, Alvaro Herrera wrote: > Concretely, I propose to have a new struct like > > typedef struct xlogReaderFuncs > { > XLogPageReadCB read_page; > XLogSegmentOpenCB open_segment; > XLogSegmentCloseCB open_segment; > } xlogReaderFuncs; > > #define XLOGREADER_FUNCS(...) &(xlogReaderFuncs){__VA_ARGS__} Not sure I quite see the point of that helper macro... > and then invoke it something like > > xlogreader = XLogReaderAllocate(wal_segment_size, NULL, > XLOGREADER_FUNCS(.readpage = &read_local_xlog_page, > .opensegment = &wal_segment_open), > .closesegment = &wal_segment_close), > NULL); > > (with suitable definitions for XLogSegmentOpenCB etc) so that the > support functions are all available at the xlogreader level, instead of > "open" being buried at the read-page level. Any additional support > functions can be added easily. > > This would give xlogreader a simpler interface. My first reaction was that this looks like it'd make it harder to read WAL from memory. But that's not really a problem, since opensegment/closesegment don't have to do anything. I think reducing the levels of indirection around xlogreader would be a good idea. The control flow currently is *really* complicated: With the page read callback at the xlogreader level, as well as separate callbacks set from within the page read callback and passed to WALRead(). And even though the WALOpenSegment, WALSegmentContext are really private to WALRead, not XLogReader as a whole, they are members of XLogReaderState. I think the PG13 changes made it considerably harder to understand xlogreader / xlogreader using code. Note that the WALOpenSegment callback currently does not have access to XLogReaderState->private_data, which I think is a pretty significant new restriction. Afaict it's not nicely possible anymore to have two xlogreaders inside the the same process that read from different data directories or other cases where opening the segment requires context information. > If people like this, I could make this change for pg13 and avoid > changing the API again in pg14. I'm in favor of doing so. Not necessarily primarily to avoid repeated API changes, but because I don't think the v13 changes went in the quite right direction. ISTM that we should: - have the three callbacks you mention above - change WALSegmentOpen to also get the XLogReaderState - add private state to WALOpenSegment, so it can be used even when not accessing data in files / when one needs more information to close the file. - disambiguate between WALOpenSegment (struct describing an open segment) and WALSegmentOpen (callback to open a segment) (note that the read page callback uses a *CB naming, why not follow?) Greetings, Andres Freund
On 2020-Apr-22, Andres Freund wrote: > On 2020-04-22 13:57:54 -0400, Alvaro Herrera wrote: > > Concretely, I propose to have a new struct like > > > > typedef struct xlogReaderFuncs > > { > > XLogPageReadCB read_page; > > XLogSegmentOpenCB open_segment; > > XLogSegmentCloseCB open_segment; > > } xlogReaderFuncs; > > > > #define XLOGREADER_FUNCS(...) &(xlogReaderFuncs){__VA_ARGS__} > > Not sure I quite see the point of that helper macro... Avoid the ugly cast -- same discussion we had for ARCHIVE_OPTS in pg_dump code in commit f831d4accda0. > ISTM that we should: > - have the three callbacks you mention above > - change WALSegmentOpen to also get the XLogReaderState > - add private state to WALOpenSegment, so it can be used even when not > accessing data in files / when one needs more information to close the > file. > - disambiguate between WALOpenSegment (struct describing an open > segment) and WALSegmentOpen (callback to open a segment) (note that > the read page callback uses a *CB naming, why not follow?) Sounds good. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Apr-22, Andres Freund wrote: > I'm in favor of doing so. Not necessarily primarily to avoid repeated > API changes, but because I don't think the v13 changes went in the quite > right direction. > > ISTM that we should: > - have the three callbacks you mention above > - change WALSegmentOpen to also get the XLogReaderState > - add private state to WALOpenSegment, so it can be used even when not > accessing data in files / when one needs more information to close the > file. > - disambiguate between WALOpenSegment (struct describing an open > segment) and WALSegmentOpen (callback to open a segment) (note that > the read page callback uses a *CB naming, why not follow?) Here's a first attempt at that. The segment_open/close callbacks are now given at XLogReaderAllocate time, and are passed the XLogReaderState pointer. I wrote a comment to explain that the page_read callback can use WALRead() if it wishes to do so; but if it does, then segment_open has to be provided. segment_close is mandatory (since we call it at XLogReaderFree). Of the half a dozen cases that exist, three are slightly weird: * Physical walsender does not use a xlogreader at all. I think we could beat that code up so that it does. But for the moment I just cons up a fake xlogreader, which only has the segment_open pointer set up, so that it can call WALRead. * main xlog.c uses an xlogreader with XLogPageRead(), which does not use WALRead. Therefore it does not pass open_segment. It does not use xlogreader->seg.ws_file either. Eventually we may want to beat this one up also. * pg_rewind has its own page read callback, SimpleXLogPageRead, which does all the required opening and closing. I don't think it'd be an improvement to force this to use segment_open. Oddly enough, it calls itself "simple" but is unique in having the ability to read files from the wal archive. All tests are passing for me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
At Thu, 23 Apr 2020 19:16:03 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Apr-22, Andres Freund wrote: > > > I'm in favor of doing so. Not necessarily primarily to avoid repeated > > API changes, but because I don't think the v13 changes went in the quite > > right direction. > > > > ISTM that we should: > > - have the three callbacks you mention above > > - change WALSegmentOpen to also get the XLogReaderState > > - add private state to WALOpenSegment, so it can be used even when not > > accessing data in files / when one needs more information to close the > > file. > > - disambiguate between WALOpenSegment (struct describing an open > > segment) and WALSegmentOpen (callback to open a segment) (note that > > the read page callback uses a *CB naming, why not follow?) > > Here's a first attempt at that. The segment_open/close callbacks are > now given at XLogReaderAllocate time, and are passed the XLogReaderState > pointer. I wrote a comment to explain that the page_read callback can > use WALRead() if it wishes to do so; but if it does, then segment_open > has to be provided. segment_close is mandatory (since we call it at > XLogReaderFree). > > Of the half a dozen cases that exist, three are slightly weird: > > * Physical walsender does not use a xlogreader at all. I think we could > beat that code up so that it does. But for the moment I just cons up > a fake xlogreader, which only has the segment_open pointer set up, so > that it can call WALRead. > > * main xlog.c uses an xlogreader with XLogPageRead(), which does not use > WALRead. Therefore it does not pass open_segment. It does not use > xlogreader->seg.ws_file either. Eventually we may want to beat this > one up also. > > * pg_rewind has its own page read callback, SimpleXLogPageRead, which > does all the required opening and closing. I don't think it'd be an > improvement to force this to use segment_open. Oddly enough, it calls > itself "simple" but is unique in having the ability to read files from > the wal archive. > > All tests are passing for me. I modestly object to such many call-back functions. FWIW I'm writing this with [1] in my mind. An open-callback is bound to a read-callback. A close-callback is bound to the way the read-callback opens a segment (or the open-callback). I'm afraid that only adding "cleanup" callback might be sufficient. [1] https://www.postgresql.org/message-id/20200422.101246.331162888498679491.horikyota.ntt%40gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-Apr-24, Kyotaro Horiguchi wrote: > At Thu, 23 Apr 2020 19:16:03 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > > Here's a first attempt at that. The segment_open/close callbacks are > > now given at XLogReaderAllocate time, and are passed the XLogReaderState > > pointer. I wrote a comment to explain that the page_read callback can > > use WALRead() if it wishes to do so; but if it does, then segment_open > > has to be provided. segment_close is mandatory (since we call it at > > XLogReaderFree). > I modestly object to such many call-back functions. FWIW I'm writing > this with [1] in my mind. > [1] https://www.postgresql.org/message-id/20200422.101246.331162888498679491.horikyota.ntt%40gmail.com Hmm. Looking at your 0001, I think there's nothing in that patch that's not compatible with my proposed API change. 0002 is a completely different story of course; but that patch is a radical change of spirit for xlogreader, in the sense that it's no longer a "reader", but rather just an interpreter of bytes from a WAL byte sequence into WAL records; and shifts the responsibility of the actual reading to the caller. That's why xlogreader no longer receives the page_read callback (and why it doesn't need the segment_open, segment_close callbacks). I have to admit that until today I hadn't realized that that's what your patch series was doing. I'm not familiar with how you intend to implement WAL encryption on top of this, but on first blush I'm not liking this proposed design too much. > An open-callback is bound to a read-callback. A close-callback is > bound to the way the read-callback opens a segment (or the > open-callback). I'm afraid that only adding "cleanup" callback might > be sufficient. Well, the complaint is that the current layering is weird, in that there are two levels at which we pass callbacks: one is XLogReaderAllocate, where you specify the page_read callback; and the other layer is inside the page_read callback, if that layer uses the WALRead auxiliary function. The thing that my patch is doing is pass all three callbacks at the XLogReaderAllocate level. So when xlogreader drills down to read_page, xlogreader already has the segment_open callback handy if it needs it. Conceptually, this seems more sensible. I think a "cleanup" callback might also be sensible in general terms, but we have a problem with the specifics -- namely that the "state" that we need to clean up (the file descriptor of the open segment) is part of xlogreader's state. And we obviously cannot remove the FD from XLogReaderState, because when we need the FD to do things with it to obtain data from the file. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Apr-24, Kyotaro Horiguchi wrote: > > > At Thu, 23 Apr 2020 19:16:03 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > > > > Here's a first attempt at that. The segment_open/close callbacks are > > > now given at XLogReaderAllocate time, and are passed the XLogReaderState > > > pointer. I wrote a comment to explain that the page_read callback can > > > use WALRead() if it wishes to do so; but if it does, then segment_open > > > has to be provided. segment_close is mandatory (since we call it at > > > XLogReaderFree). > > > I modestly object to such many call-back functions. FWIW I'm writing > > this with [1] in my mind. > > [1] https://www.postgresql.org/message-id/20200422.101246.331162888498679491.horikyota.ntt%40gmail.com > > Hmm. Looking at your 0001, I think there's nothing in that patch that's > not compatible with my proposed API change. > > 0002 is a completely different story of course; but that patch is a > radical change of spirit for xlogreader, in the sense that it's no > longer a "reader", but rather just an interpreter of bytes from a WAL > byte sequence into WAL records; and shifts the responsibility of the > actual reading to the caller. That's why xlogreader no longer receives > the page_read callback (and why it doesn't need the segment_open, > segment_close callbacks). Sorry for the ambiguity, I didn't meant I minded that this conflicts with my patch or I don't want this to be committed. It is easily rebased on this patch. What I was anxious about is that the new callback struct might be too flexible than required. So I "mildly" objected, and I won't be dissapointed if this patch is committed. > I have to admit that until today I hadn't realized that that's what your > patch series was doing. I'm not familiar with how you intend to > implement WAL encryption on top of this, but on first blush I'm not > liking this proposed design too much. Right. I might be too much in detail, but it simplifies the call tree. Anyway that is another discussion, though:) > > An open-callback is bound to a read-callback. A close-callback is > > bound to the way the read-callback opens a segment (or the > > open-callback). I'm afraid that only adding "cleanup" callback might > > be sufficient. > > Well, the complaint is that the current layering is weird, in that there > are two levels at which we pass callbacks: one is XLogReaderAllocate, > where you specify the page_read callback; and the other layer is inside > the page_read callback, if that layer uses the WALRead auxiliary > function. The thing that my patch is doing is pass all three callbacks > at the XLogReaderAllocate level. So when xlogreader drills down to > read_page, xlogreader already has the segment_open callback handy if it > needs it. Conceptually, this seems more sensible. It looks like as if the open/read/close-callbacks are generic and on the same interface layer, but actually open-callback is dedicate to WALRead and it is useless when the read-callback doesn't use WALRead. What I was anxious about is that the open-callback is uselessly exposing the secret of the read-callback. > I think a "cleanup" callback might also be sensible in general terms, > but we have a problem with the specifics -- namely that the "state" that > we need to clean up (the file descriptor of the open segment) is part of > xlogreader's state. And we obviously cannot remove the FD from > XLogReaderState, because when we need the FD to do things with it to > obtain data from the file. I meant concretely that we only have read- and cleanup- callbacks in xlogreader state. The caller of XLogReaderAllocate specifies the cleanup-callback that is to be used to clean up what the reader-callback left behind, in the same manner with this patch does. The only reason it is not named close-callback is that it is used only when the xlogreader-state is destroyed. So I'm fine with read/close-callbacks. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-Apr-27, Kyotaro Horiguchi wrote: > At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > Sorry for the ambiguity, I didn't meant I minded that this conflicts > with my patch or I don't want this to be committed. It is easily > rebased on this patch. What I was anxious about is that the new > callback struct might be too flexible than required. So I "mildly" > objected, and I won't be dissapointed if this patch is committed. ... well, yeah, maybe it is too flexible. And perhaps we could further tweak this interface so that the file descriptor is not part of XLogReader at all -- with such a change, it would make more sense to worry about the "close" callback not being "close" but something like "cleanup", as you suggest. But right now, and thinking from the point of view of going into postgres 13 beta shortly, it seems to me that XLogReader is just a very leaky abstraction since both itself and its users are aware of the fact that there is a file descriptor. Maybe with your rework for encryption you'll want to remove the FD from XLogReader at all, and move it elsewhere. Or maybe not. But it seems to me that my suggested approach is sensible, and better than the current situation. (Let's keep in mind that the primary concern here is that the callstack is way too complicated -- you ask XlogReader for data, it calls your Read callback, that one calls WALRead passing your openSegment callback and stuffs the FD in XLogReaderState ... a sieve it is, the way it leaks, not an abstraction.) > > I have to admit that until today I hadn't realized that that's what your > > patch series was doing. I'm not familiar with how you intend to > > implement WAL encryption on top of this, but on first blush I'm not > > liking this proposed design too much. > > Right. I might be too much in detail, but it simplifies the call > tree. Anyway that is another discussion, though:) Okay. We can discuss further changes later, of course. > It looks like as if the open/read/close-callbacks are generic and on > the same interface layer, but actually open-callback is dedicate to > WALRead and it is useless when the read-callback doesn't use > WALRead. What I was anxious about is that the open-callback is > uselessly exposing the secret of the read-callback. Well, I don't think we care about that. WALRead can be thought of as just a helper function that you may use to write your read callback. The comments I added explain this. > I meant concretely that we only have read- and cleanup- callbacks in > xlogreader state. The caller of XLogReaderAllocate specifies the > cleanup-callback that is to be used to clean up what the > reader-callback left behind, in the same manner with this patch does. > The only reason it is not named close-callback is that it is used only > when the xlogreader-state is destroyed. So I'm fine with > read/close-callbacks. We can revisit the current design in the future. For example for encryption we might decide to remove the current-open-segment FD from XLogReaderState and then things might be different. (I think the current design is based a lot on historical code, rather than being optimal.) Since your objection isn't strong, I propose to commit same patch as before, only rebased as conflicted with cd123234404e and this comment prologuing WALRead: /* * Helper function to ease writing of XLogRoutine->page_read callbacks. * If this function is used, caller must supply an open_segment callback in * 'state', as that is used here. [... rest is same as before ...] -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
At Thu, 7 May 2020 19:28:55 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Apr-27, Kyotaro Horiguchi wrote: > > > At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > > > Sorry for the ambiguity, I didn't meant I minded that this conflicts > > with my patch or I don't want this to be committed. It is easily > > rebased on this patch. What I was anxious about is that the new > > callback struct might be too flexible than required. So I "mildly" > > objected, and I won't be dissapointed if this patch is committed. > > ... well, yeah, maybe it is too flexible. And perhaps we could further > tweak this interface so that the file descriptor is not part of > XLogReader at all -- with such a change, it would make more sense to > worry about the "close" callback not being "close" but something like > "cleanup", as you suggest. But right now, and thinking from the point > of view of going into postgres 13 beta shortly, it seems to me that > XLogReader is just a very leaky abstraction since both itself and its > users are aware of the fact that there is a file descriptor. Agreed. > Maybe with your rework for encryption you'll want to remove the FD from > XLogReader at all, and move it elsewhere. Or maybe not. But it seems > to me that my suggested approach is sensible, and better than the > current situation. (Let's keep in mind that the primary concern here is > that the callstack is way too complicated -- you ask XlogReader for > data, it calls your Read callback, that one calls WALRead passing your > openSegment callback and stuffs the FD in XLogReaderState ... a sieve it > is, the way it leaks, not an abstraction.) I agree that new callback functions is most sensible for getting into 13, of course. > > > I have to admit that until today I hadn't realized that that's what your > > > patch series was doing. I'm not familiar with how you intend to > > > implement WAL encryption on top of this, but on first blush I'm not > > > liking this proposed design too much. > > > > Right. I might be too much in detail, but it simplifies the call > > tree. Anyway that is another discussion, though:) > > Okay. We can discuss further changes later, of course. > > > It looks like as if the open/read/close-callbacks are generic and on > > the same interface layer, but actually open-callback is dedicate to > > WALRead and it is useless when the read-callback doesn't use > > WALRead. What I was anxious about is that the open-callback is > > uselessly exposing the secret of the read-callback. > > Well, I don't think we care about that. WALRead can be thought of as > just a helper function that you may use to write your read callback. > The comments I added explain this. Thanks. > > I meant concretely that we only have read- and cleanup- callbacks in > > xlogreader state. The caller of XLogReaderAllocate specifies the > > cleanup-callback that is to be used to clean up what the > > reader-callback left behind, in the same manner with this patch does. > > The only reason it is not named close-callback is that it is used only > > when the xlogreader-state is destroyed. So I'm fine with > > read/close-callbacks. > > We can revisit the current design in the future. For example for > encryption we might decide to remove the current-open-segment FD from > XLogReaderState and then things might be different. (I think the > current design is based a lot on historical code, rather than being > optimal.) > > Since your objection isn't strong, I propose to commit same patch as > before, only rebased as conflicted with cd123234404e and this comment > prologuing WALRead: > > /* > * Helper function to ease writing of XLogRoutine->page_read callbacks. > * If this function is used, caller must supply an open_segment callback in > * 'state', as that is used here. > [... rest is same as before ...] I agree to the direction of this patch. Thanks for the explanation. The patch looks good to me except the two points below. + /* XXX for xlogreader use, we'd call XLogBeginRead+XLogReadRecord here */ + if (!WALRead(&fake_xlogreader, + &output_message.data[output_message.len], I'm not sure the point of the XXX comment, but I think WALRead here is the right thing and we aren't going to use XLogBeginRead+XLogReadRecord here. So it seems to me the comment is misleading and instead we need such a comment for fake_xlogreader like this. + static XLogReaderState fake_xlogreader = + { + /* fake reader state only to let WALRead to use the callbacks */ wal_segment_close(XlogReaderState *state) is setting state->seg.ws_file to -1. On the other hand wal_segment_close(state,..) doesn't update ws_file and the caller sets the returned value to (eventually) the same field. + seg->ws_file = state->routine.segment_open(state, nextSegNo, + segcxt, &tli); If you are willing to do so, I think it is better to make the callback functions are responsible to update the seg.ws_file and the callers don't care. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-May-08, Kyotaro Horiguchi wrote: > I agree to the direction of this patch. Thanks for the explanation. > The patch looks good to me except the two points below. Thanks! I pushed the patch. I fixed the walsender commentary as you suggested, but I'm still of the opinion that we might want to use the XLogReader abstraction in physical walsender than work without it; if nothing else, that would simplify WALRead's API. I didn't change this one though: > wal_segment_close(XlogReaderState *state) is setting > state->seg.ws_file to -1. On the other hand wal_segment_close(state,..) > doesn't update ws_file and the caller sets the returned value to > (eventually) the same field. > > + seg->ws_file = state->routine.segment_open(state, nextSegNo, > + segcxt, &tli); > > If you are willing to do so, I think it is better to make the callback > functions are responsible to update the seg.ws_file and the callers > don't care. I agree that this would be a good idea, but it's more than just a handful of lines of changes so I think we should consider it separately. Attached as 0002. I also realized while doing this that we can further simplify WALRead()'s API if we're willing to bend walsender a little bit more into the fake xlogreader thing; that's 0001. I marked the open item closed nonetheless. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2020-May-08, Kyotaro Horiguchi wrote: > > > I agree to the direction of this patch. Thanks for the explanation. > > The patch looks good to me except the two points below. > > Thanks! I pushed the patch. I tried to follow the discussion but haven't had better idea. This looks better than the previous version. Thanks! While looking at the changes, I've noticed a small comment issue in the XLogReaderRoutine structure definition: * "tli_p" is an input/output argument. XLogRead() uses it to pass the The XLogRead() function has been renamed to WALRead() in 0dc8ead4. (This incorrect comment was actually introduced by that commit.) -- Antonin Houska Web: https://www.cybertec-postgresql.com
Hi Antonin, thanks for the review. On 2020-May-11, Antonin Houska wrote: > While looking at the changes, I've noticed a small comment issue in the > XLogReaderRoutine structure definition: > > * "tli_p" is an input/output argument. XLogRead() uses it to pass the > > The XLogRead() function has been renamed to WALRead() in 0dc8ead4. (This > incorrect comment was actually introduced by that commit.) Ah. I'll fix this, thanks for pointing it out. (It might be that the TLI situation can be improved with some callback, too.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services