Thread: log_newpage header comment
It seems that in implementing ginbuildempty(), I falsified the first "note" in the header comment for log_newpage(): * Note: all current callers build pages in private memory and write them* directly to smgr, rather than using bufmgr. Thereforethere is no need* to pass a buffer ID to XLogInsert, nor to perform MarkBufferDirty within* the critical section. Mea culpa, mea culpa, mea maxima culpa. So, this leads to a couple of questions: 1. Considering that we're logging the entire page, is it necessary (or even desirable) to include the buffer ID in the rdata structure? If so, why? To put that another way, does my abuse of log_newpage constitute a bug in gistbuildempty()? 2. Should we add a new function that does the same thing as log_newpage for a shared buffer? I'm imagining that the signature would be: XLogRecPtr log_newpage_buffer(RelFileNode *rnode, Buffer buffer); Admittedly, it may seem that we don't quite need such a function to cater to just one caller, but I think we might have more in the future. Among other things, it occurs to me to wonder whether I ought to rewrite all the ambuildempty routines in the style of ginbuildempty(), to avoid having to fsync in the foreground. Even if not, I think there might be other applications for this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > It seems that in implementing ginbuildempty(), I falsified the first > "note" in the header comment for log_newpage(): > * Note: all current callers build pages in private memory and write them > * directly to smgr, rather than using bufmgr. Therefore there is no need > * to pass a buffer ID to XLogInsert, nor to perform MarkBufferDirty within > * the critical section. > 1. Considering that we're logging the entire page, is it necessary (or > even desirable) to include the buffer ID in the rdata structure? If > so, why? To put that another way, does my abuse of log_newpage > constitute a bug in gistbuildempty()? AFAICS, not passing the buffer ID to XLogInsert is not an issue, since we are logging the whole page in any case. However, failing to perform MarkBufferDirty within the critical section definitely is an issue. > 2. Should we add a new function that does the same thing as > log_newpage for a shared buffer? I'm imagining that the signature > would be: Either that or rethink building this data in shared buffers. What's the point of that, exactly, for a page that we are most certainly not going to use in normal operation? regards, tom lane
On Thu, Jun 7, 2012 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> It seems that in implementing ginbuildempty(), I falsified the first >> "note" in the header comment for log_newpage(): > >> * Note: all current callers build pages in private memory and write them >> * directly to smgr, rather than using bufmgr. Therefore there is no need >> * to pass a buffer ID to XLogInsert, nor to perform MarkBufferDirty within >> * the critical section. > >> 1. Considering that we're logging the entire page, is it necessary (or >> even desirable) to include the buffer ID in the rdata structure? If >> so, why? To put that another way, does my abuse of log_newpage >> constitute a bug in gistbuildempty()? > > AFAICS, not passing the buffer ID to XLogInsert is not an issue, since > we are logging the whole page in any case. However, failing to perform > MarkBufferDirty within the critical section definitely is an issue. However, I'm not failing to do that: there's an enclosing critical section. >> 2. Should we add a new function that does the same thing as >> log_newpage for a shared buffer? I'm imagining that the signature >> would be: > > Either that or rethink building this data in shared buffers. What's the > point of that, exactly, for a page that we are most certainly not going > to use in normal operation? Well, right. I mean, I think the current implementation is mostly design-by-accident, and my first thought was the same as yours: don't clutter shared_buffers with pages we don't actually need for anything.But there is a down side to doing it the other way,too. Look at what btbuildempty does: /* Write the page. If archiving/streaming, XLOG it. */ smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE, (char *) metapage, true); if (XLogIsNeeded()) log_newpage(&index->rd_smgr->smgr_rnode.node,INIT_FORKNUM, BTREE_METAPAGE, metapage); /* * An immediate sync is require even if we xlog'd the page, because the * write did not go throughshared_buffers and therefore a concurrent * checkpoint may have move the redo pointer past our xlog record. */ smgrimmedsync(index->rd_smgr, INIT_FORKNUM); So we have to write the page out immediately, then we have to XLOG it, and then even though we've XLOG'd it, we still have to fsync it immediately. It might be better to go through shared_buffers, which would allow the write and fsync to happen in the background. The cache-poisoning effect is probably trivial - even on a system with 32MB of shared_buffers there are thousands of pages, and init forks are never going to consume more than a handful unless you're creating an awful lot of unlogged relations very quickly - in which case maybe avoiding the immediate fsyncs is a more pressing concern. On the other hand, we haven't had any complaints about the way it works now, either. What's your thought? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 7, 2012 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> AFAICS, not passing the buffer ID to XLogInsert is not an issue, since >> we are logging the whole page in any case. �However, failing to perform >> MarkBufferDirty within the critical section definitely is an issue. > However, I'm not failing to do that: there's an enclosing critical section. Mph. But is it being done in the right order relative to the other XLOG related steps? See the code sketch in transam/README. > So we have to write the page out immediately, then we have to XLOG it, > and then even though we've XLOG'd it, we still have to fsync it > immediately. It might be better to go through shared_buffers, which > would allow the write and fsync to happen in the background. Well, that's a fair point, but on the other hand we've not had any complaints traceable to the cost of making init forks. On the whole, I don't care for the idea that the modify-and-xlog-a-buffer sequence is being split between log_newpage and its caller. That sounds like bugs waiting to happen anytime somebody refactors XLOG operations. It would probably be best to refactor this as you're suggesting, so that that becomes less nonstandard. regards, tom lane
On Fri, Jun 8, 2012 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 7, 2012 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> AFAICS, not passing the buffer ID to XLogInsert is not an issue, since >>> we are logging the whole page in any case. However, failing to perform >>> MarkBufferDirty within the critical section definitely is an issue. > >> However, I'm not failing to do that: there's an enclosing critical section. > > Mph. But is it being done in the right order relative to the other XLOG > related steps? See the code sketch in transam/README. It appears to me that it is being done in the right order. >> So we have to write the page out immediately, then we have to XLOG it, >> and then even though we've XLOG'd it, we still have to fsync it >> immediately. It might be better to go through shared_buffers, which >> would allow the write and fsync to happen in the background. > > Well, that's a fair point, but on the other hand we've not had any > complaints traceable to the cost of making init forks. > > On the whole, I don't care for the idea that the > modify-and-xlog-a-buffer sequence is being split between log_newpage and > its caller. That sounds like bugs waiting to happen anytime somebody > refactors XLOG operations. It would probably be best to refactor this > as you're suggesting, so that that becomes less nonstandard. OK. So what I'm thinking is that we should add a new function that takes a relfilenode and a buffer and steps 4-6 of what's described in transam/README: mark the buffer dirty, xlog it, and set the LSN and TLI. We might want to have this function assert that it is in a critical section, for the avoidance of error. Then anyone who wants to use it can do steps 1-3, call the function, and then finish up with steps 6-7. I don't think we can cleanly encapsulate any more than that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 8, 2012 at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote: > OK. So what I'm thinking is that we should add a new function that > takes a relfilenode and a buffer and steps 4-6 of what's described in > transam/README: mark the buffer dirty, xlog it, and set the LSN and > TLI. We might want to have this function assert that it is in a > critical section, for the avoidance of error. Then anyone who wants > to use it can do steps 1-3, call the function, and then finish up with > steps 6-7. I don't think we can cleanly encapsulate any more than > that. On further review, I think that we ought to make MarkBufferDirty() the caller's job, because sometimes we may need to xlog only if XLogIsNeeded(), but the buffer's got to get marked dirty either way. So I think the new function should just do step 5 - emit XLOG and set LSN/TLI. Proposed patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, Jun 8, 2012 at 1:20 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 8, 2012 at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> OK. So what I'm thinking is that we should add a new function that >> takes a relfilenode and a buffer and steps 4-6 of what's described in >> transam/README: mark the buffer dirty, xlog it, and set the LSN and >> TLI. We might want to have this function assert that it is in a >> critical section, for the avoidance of error. Then anyone who wants >> to use it can do steps 1-3, call the function, and then finish up with >> steps 6-7. I don't think we can cleanly encapsulate any more than >> that. > > On further review, I think that we ought to make MarkBufferDirty() the > caller's job, because sometimes we may need to xlog only if > XLogIsNeeded(), but the buffer's got to get marked dirty either way. > So I think the new function should just do step 5 - emit XLOG and set > LSN/TLI. > > Proposed patch attached. Whee, testing is fun. Second try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
>On further review, I think that we ought to make MarkBufferDirty() the >caller's job, because sometimes we may need to xlog only if >XLogIsNeeded(), but the buffer's got to get marked dirty either way. Incase the place where Xlog is not required, woudn't it fsync the data; So in that case even MarkBufferDirty() will also be not required. >So I think the new function should just do step 5 - emit XLOG and set >LSN/TLI. In the API log_newpage_buffer(), as buffer already contains the page to be logged, so can't it be assumed that the page willbe initialized and no need to check if PageIsNew before setting LSN/TLI. ________________________________________ From: pgsql-hackers-owner@postgresql.org [pgsql-hackers-owner@postgresql.org] on behalf of Robert Haas [robertmhaas@gmail.com] Sent: Friday, June 08, 2012 10:50 PM To: Tom Lane Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] log_newpage header comment On Fri, Jun 8, 2012 at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote: > OK. So what I'm thinking is that we should add a new function that > takes a relfilenode and a buffer and steps 4-6 of what's described in > transam/README: mark the buffer dirty, xlog it, and set the LSN and > TLI. We might want to have this function assert that it is in a > critical section, for the avoidance of error. Then anyone who wants > to use it can do steps 1-3, call the function, and then finish up with > steps 6-7. I don't think we can cleanly encapsulate any more than > that. On further review, I think that we ought to make MarkBufferDirty() the caller's job, because sometimes we may need to xlog only if XLogIsNeeded(), but the buffer's got to get marked dirty either way. So I think the new function should just do step 5 - emit XLOG and set LSN/TLI. Proposed patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Whee, testing is fun. Second try. I'm concerned by the fact that neither the original nor the new code bother to test whether the relation is WAL-loggable. It may be that ginbuildempty cannot be invoked for temp tables, but it still seems like an oversight waiting to bite you. I'd be happier if there were a RelationNeedsWAL test here. Come to think of it, the other foobuildempty functions aren't checking this either. A related point is that most of the other existing calls to log_newpage are covered by "if (XLogIsNeeded())" tests. It's not immediately obvious why these two shouldn't be. After some reflection I think that's correct, but probably the comments for log_newpage and log_newpage_buffer need to explain the different WAL-is-needed tests that apply to the two usages. (I'm also thinking that the XLogIsNeeded macro is very poorly named, as the situations it should be used in are far more narrow than the macro name suggests. Haven't consumed enough caffeine to think of a better name, though.) regards, tom lane
On Sat, Jun 9, 2012 at 1:43 AM, Amit kapila <amit.kapila@huawei.com> wrote: >>On further review, I think that we ought to make MarkBufferDirty() the >>caller's job, because sometimes we may need to xlog only if >>XLogIsNeeded(), but the buffer's got to get marked dirty either way. > > Incase the place where Xlog is not required, woudn't it fsync the data; > So in that case even MarkBufferDirty() will also be not required. Uh... no. The whole point of doing things in shared buffers is that you don't have to write and fsync the buffers immediately. Instead, buffer evicting handles that stuff for you. >>So I think the new function should just do step 5 - emit XLOG and set >>LSN/TLI. > > In the API log_newpage_buffer(), as buffer already contains the page to be logged, so can't it be assumed that the pagewill be initialized and no need to check > if PageIsNew before setting LSN/TLI. I don't see why it's any different from log_newpage() in that regard. That data is initialized before being written, as well, but someone contemplated the possible need to write a page of all zeros. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jun 9, 2012 at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Whee, testing is fun. Second try. > > I'm concerned by the fact that neither the original nor the new code > bother to test whether the relation is WAL-loggable. It may be that > ginbuildempty cannot be invoked for temp tables, but it still seems > like an oversight waiting to bite you. I'd be happier if there were > a RelationNeedsWAL test here. Come to think of it, the other > foobuildempty functions aren't checking this either. That's a good thing, because if they were, it would be wrong. RelationNeedsWAL() tells you whether the main fork is WAL-logged, but the buildempty routines exist for the purpose of constructing the init fork, which should always be WAL-logged. > A related point is that most of the other existing calls to log_newpage > are covered by "if (XLogIsNeeded())" tests. It's not immediately > obvious why these two shouldn't be. After some reflection I think > that's correct, but probably the comments for log_newpage and > log_newpage_buffer need to explain the different WAL-is-needed tests > that apply to the two usages. > > (I'm also thinking that the XLogIsNeeded macro is very poorly named, > as the situations it should be used in are far more narrow than the > macro name suggests. Haven't consumed enough caffeine to think of > a better name, though.) XLogIsNeeded() is, indeed, not a very good name: it's telling us whether wal_level > minimal, and thus whether there might be a standby. When log_newpage() is used to rebuild a relation, we use WAL bypass whenever possible, since the whole relation will be removed if the transaction rolls back, but we must still log it if a standby exists. In contrast, the init forks need to make it to the standby regardless. I'm not sure that I agree that it's the job of log_newpage() to explain to people calling it on what things they must conditionalize WAL generation: after all, none of this is unique to new-page records. If we emit some other record while creating an index on the primary, we can skip that when !XLogIsNeeded(), too. Any operation we perform on any relation fork other than the init fork can be conditionalized on RelationNeedsWAL(), not just new-page records. The charter of the function is just to avoid duplicating the mumbo-jumbo that's required to emit the record. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>Uh... no. The whole point of doing things in shared buffers is that >>you don't have to write and fsync the buffers immediately. Instead, >>buffer evicting handles that stuff for you. So you mean to say that there exists operations where Xlog is not required even though it marks the buffer as dirty for later eviction. >>I don't see why it's any different from log_newpage() in that regard. >>That data is initialized before being written, as well, but someone >>contemplated the possible need to write a page of all zeros. The comment above the code indicates that "the page is uninitialized". Does the code consider page with all zero's as uninitialized or the comment is not appropriate. -----Original Message----- From: Robert Haas [mailto:robertmhaas@gmail.com] Sent: Sunday, June 10, 2012 7:14 PM To: Amit kapila Cc: Tom Lane; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] log_newpage header comment On Sat, Jun 9, 2012 at 1:43 AM, Amit kapila <amit.kapila@huawei.com> wrote: >>On further review, I think that we ought to make MarkBufferDirty() the >>caller's job, because sometimes we may need to xlog only if >>XLogIsNeeded(), but the buffer's got to get marked dirty either way. > > Incase the place where Xlog is not required, woudn't it fsync the data; > So in that case even MarkBufferDirty() will also be not required. Uh... no. The whole point of doing things in shared buffers is that you don't have to write and fsync the buffers immediately. Instead, buffer evicting handles that stuff for you. >>So I think the new function should just do step 5 - emit XLOG and set >>LSN/TLI. > > In the API log_newpage_buffer(), as buffer already contains the page to be logged, so can't it be assumed that the page will be initialized and no need to check > if PageIsNew before setting LSN/TLI. I don't see why it's any different from log_newpage() in that regard. That data is initialized before being written, as well, but someone contemplated the possible need to write a page of all zeros. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jun 10, 2012 at 11:41 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >>>Uh... no. The whole point of doing things in shared buffers is that >>>you don't have to write and fsync the buffers immediately. Instead, >>>buffer evicting handles that stuff for you. > > So you mean to say that there exists operations where Xlog is not required > even though it marks the buffer as dirty for later eviction. Correct. >>>I don't see why it's any different from log_newpage() in that regard. >>>That data is initialized before being written, as well, but someone >>>contemplated the possible need to write a page of all zeros. > > The comment above the code indicates that "the page is uninitialized". > Does the code consider page with all zero's as uninitialized or the comment > is not > appropriate. Yes, all zeroes = uninitialized. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company