Thread: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions
On Sat, Mar 26, 2016 at 8:39 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-25 12:02:05 -0400, Robert Haas wrote:
>> Gosh, that's surprising.  I wonder if that just revealed an underlying
>> issue rather than creating it.
>
> I think that's the case; it's just somewhat unlikely to hit in other
> cases.
>
> If SMgrRelation->md_fd[n] is an empty relation, and mdread() or another
> routine is asking for a block in the second segment - that will error
> out.  But even if the first segment is zero bytes, _mdfd_getseg() will
> dutifully try to open the second segment.  Which will succeed in the
> case of a truncated relation, because we leave the truncated segment in
> place.
>
> ISTM that _mdfd_getseg better check the length of the last segment
> before opening the next one?

Well, I agree that it's pretty strange that _mdfd_getseg() makes no
such check, but I still don't think I understand what's going on here.
Backends shouldn't be requesting nonexistent blocks from a relation -
higher-level safeguards, like holding AccessExclusiveLock before
trying to complete a DROP or TRUNCATE - are supposed to prevent that.

If this patch is causing us to hold onto smgr references to a relation
on which we no longer hold locks, I think that's irretrievably broken
and should be reverted.  I really doubt this will be the only thing
that goes wrong if you do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 04, 2016 at 12:45:25PM -0400, Robert Haas wrote:
> Backends shouldn't be requesting nonexistent blocks from a relation -
> higher-level safeguards, like holding AccessExclusiveLock before
> trying to complete a DROP or TRUNCATE - are supposed to prevent that.
> If this patch is causing us to hold onto smgr references to a relation
> on which we no longer hold locks, I think that's irretrievably broken
> and should be reverted.  I really doubt this will be the only thing
> that goes wrong if you do that.

+1
On 2016-04-04 12:45:25 -0400, Robert Haas wrote:
> Well, I agree that it's pretty strange that _mdfd_getseg() makes no
> such check, but I still don't think I understand what's going on here.
> Backends shouldn't be requesting nonexistent blocks from a relation -
> higher-level safeguards, like holding AccessExclusiveLock before
> trying to complete a DROP or TRUNCATE - are supposed to prevent that.

I don't think that's really true: we write blocks back without holding
any sort of relation-level locks, and thus do _mdfd_getseg() type
accesses as well.

And we're not really "requesting nonexistent blocks" - the segments are
just opened to get the associated file descriptor, and they're opened
with EXTENSION_RETURN_NULL.  It turns out to be important
performance-wise to reuse fds when triggering kernel writeback.

> If this patch is causing us to hold onto smgr references to a relation
> on which we no longer hold locks, I think that's irretrievably broken
> and should be reverted.  I really doubt this will be the only thing
> that goes wrong if you do that.

As we have already done that for writes for a long time, I'm a bit
surprised by that statement.

Greetings,

Andres Freund
On Mon, Apr 11, 2016 at 12:50 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-04 12:45:25 -0400, Robert Haas wrote:
>> Well, I agree that it's pretty strange that _mdfd_getseg() makes no
>> such check, but I still don't think I understand what's going on here.
>> Backends shouldn't be requesting nonexistent blocks from a relation -
>> higher-level safeguards, like holding AccessExclusiveLock before
>> trying to complete a DROP or TRUNCATE - are supposed to prevent that.
>
> I don't think that's really true: we write blocks back without holding
> any sort of relation-level locks, and thus do _mdfd_getseg() type
> accesses as well.

You're right, but I think that's more because I didn't say it correctly
than because you haven't done something novel.  DROP and relation
truncation know about shared buffers, and they clear any blocks that
might be affected out of it as part of the truncate operation, which
means that no other backend will see them after they are gone.  The
lock makes sure that no other references can be added while we're busy
removing any that are already there.  So I think that there is
currently an invariant that any block we are attempting to access
should actually still exist.

It sounds like these references are sticking around in backend-private
memory, which means they are neither protected by locks nor able to be
cleared out on drop or truncate.  I think that's a new thing, and a bit
scary.

> And we're not really "requesting nonexistent blocks" - the segments are
> just opened to get the associated file descriptor, and they're opened
> with EXTENSION_RETURN_NULL.  It turns out to be important
> performance-wise to reuse fds when triggering kernel writeback.

The possibly-saving grace here, I suppose, is that the references we're
worried about are just being used to issue hints to the operating
system.  So I guess if we sent a hint on a wrong block, or skipped
sending a hint altogether because of some failure, no harm done, as
long as we don't error out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-11 13:04:48 -0400, Robert Haas wrote:
> You're right, but I think that's more because I didn't say it
> correctly than because you haven't done something novel.

Could be.

> DROP and relation truncation know about shared buffers, and they clear
> any blocks that might be affected out of it as part of the truncate
> operation, which means that no other backend will see them after they
> are gone.  The lock makes sure that no other references can be added
> while we're busy removing any that are already there.  So I think that
> there is currently an invariant that any block we are attempting to
> access should actually still exist.

Note that we're not actually accessing any blocks, we're just opening a
segment to get the associated file descriptor.

> It sounds like these references are sticking around in backend-private
> memory, which means they are neither protected by locks nor able to be
> cleared out on drop or truncate.  I think that's a new thing, and a
> bit scary.

True.  But how would you batch flush requests in a sorted manner
otherwise, without re-opening file descriptors?  And that's pretty
essential for performance.

I can think of a number of relatively easy ways to address this:
1) Just zap (or issue?) all pending flush requests when getting an
   smgrinval/smgrclosenode
2) Do 1), but filter for the closed relnode
3) Actually handle the case of the last open segment not being
   RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.

I'm kind of inclined to do both 3) and 1).

> The possibly-saving grace here, I suppose, is that the references
> we're worried about are just being used to issue hints to the
> operating system.

Indeed.

> So I guess if we sent a hint on a wrong block or skip sending a hint
> altogether because of some failure, no harm done, as long as we don't
> error out.

Which the writeback code is careful not to do; afaics it's just the
"already open segment" issue making problems here.

- Andres
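Option 2) above - dropping only the queued flush requests that target
the closed relation - can be sketched stand-alone.  This is an
illustrative example only; the struct and function names here are
invented for the sketch and are not PostgreSQL's actual writeback-queue
code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending kernel-writeback hint request;
 * the real pending-writeback bookkeeping in PostgreSQL differs. */
typedef struct FlushRequest
{
    unsigned    relnode;        /* relation the request targets */
    unsigned    blockno;        /* block to hint to the kernel */
} FlushRequest;

/*
 * On an invalidation for "relnode", drop only the queued requests that
 * target that relation, compacting the queue in place.  Returns the
 * new queue length.
 */
static size_t
drop_requests_for(FlushRequest *queue, size_t n, unsigned relnode)
{
    size_t      keep = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (queue[i].relnode != relnode)
            queue[keep++] = queue[i];   /* survivor: shift down */
    }
    return keep;
}
```

The point of the compaction is that requests for other, still-live
relations keep their sorted order, so batched flushing stays effective.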
Hello Andres,

> I can think of a number of relatively easy ways to address this:
> 1) Just zap (or issue?) all pending flush requests when getting an
>    smgrinval/smgrclosenode
> 2) Do 1), but filter for the closed relnode
> 3) Actually handle the case of the last open segment not being
>    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
>
> I'm kind of inclined to do both 3) and 1).

Alas, I'm a little bit outside my area of expertise.  Just as
suggestions, possibly totally wrong (please accept my apologies if
so), here are the ideas I had while thinking about these issues; some
may be quite close to your suggestions above:

- Keep track of file/segment closings/reopenings (say with a counter),
  so that if a flush request refers to a file/segment which has been
  closed or reopened, it is simply skipped.  I'm not sure this is
  enough, because one process may do the file cleaning while another
  wants to flush, although I guess there is some kind of locking/shared
  data structure to manage such interactions between pg processes.

- Because of the "empty the bucket when filled" behavior of the current
  implementation, a buffer to be flushed may be kept for a very long
  time in the bucket.  I think that flushing advice becomes stale after
  a while, so it should not be issued (the buffer may have been written
  again, ...), or the bucket should be flushed after a while even if
  not full.

Also, a detail in pg_flush_data: there is a series of three #if/#endif
blocks, but it is not obvious to me from the source whether they are
mutually exclusive, so I was wondering whether the code could try
several flushing approaches one after the other...  This is probably
not the case, but I would be more at ease with an
#if/#elif/#elif/#endif structure there, so that it is clearer.

-- Fabien.
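The first idea above - a reopen counter that invalidates stale flush
requests - can be illustrated with a minimal sketch.  The names here
are invented for the example and do not correspond to PostgreSQL
source:

```c
#include <assert.h>

/* Each close/reopen of a segment bumps its generation; a queued flush
 * request remembers the generation it was queued under and is skipped
 * if the segment has been closed or reopened since. */
typedef struct SegmentState
{
    int         fd;             /* current file descriptor */
    unsigned    generation;     /* bumped on every close/reopen */
} SegmentState;

typedef struct QueuedFlush
{
    unsigned    generation;     /* generation when request was queued */
    unsigned    blockno;        /* block to hint to the kernel */
} QueuedFlush;

static void
segment_reopen(SegmentState *seg, int newfd)
{
    seg->fd = newfd;
    seg->generation++;          /* invalidates older queued requests */
}

/* Returns 1 if the request is still safe to issue, 0 if stale. */
static int
flush_still_valid(const SegmentState *seg, const QueuedFlush *req)
{
    return req->generation == seg->generation;
}
```

Since flush hints are advisory, skipping a stale request is always
safe; the worst case is a missed writeback hint.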
On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> I can think of a number of relatively easy ways to address this:
> 1) Just zap (or issue?) all pending flush requests when getting an
>    smgrinval/smgrclosenode
> 2) Do 1), but filter for the closed relnode
> 3) Actually handle the case of the last open segment not being
>    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
>
> I'm kind of inclined to do both 3) and 1).

#3 seems like it's probably about 15 years overdue, so let's do that
anyway.

I don't quite understand why #1 fixes the problem - not because I doubt
you, but just because I haven't studied it much.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
At 2016-04-12 09:00:57 -0400, robertmhaas@gmail.com wrote:
>
> On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > 3) Actually handle the case of the last open segment not being
> > RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
>
> #3 seems like it's probably about 15 years overdue, so let's do that
> anyway.

I don't have anything useful to contribute, I'm just trying to
understand this fix.

_mdfd_getseg() looks like this:

    targetseg = blkno / ((BlockNumber) RELSEG_SIZE);
    for (nextsegno = 1; nextsegno <= targetseg; nextsegno++)
    {
        Assert(nextsegno == v->mdfd_segno + 1);

        if (v->mdfd_chain == NULL)
        {
            /*
             * Normally we will create new segments only if authorized by the
             * caller (i.e., we are doing mdextend()). […]
             */
            if (behavior == EXTENSION_CREATE || InRecovery)
            {
                if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
                    /* mdextend */
                v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, O_CREAT);
            }
            else
            {
                /* We won't create segment if not existent */
                v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, 0);
            }
            if (v->mdfd_chain == NULL)
                /* return NULL or ERROR */
        }
        v = v->mdfd_chain;
    }

Do I understand correctly that the case of the "last open segment"
(i.e., the one for which mdfd_chain == NULL) not being RELSEG_SIZE
(i.e., _mdnblocks(reln, forknum, v) < RELSEG_SIZE) should not call
_mdfd_openseg on nextsegno if behaviour is not EXTENSION_CREATE or
InRecovery?

And that "We won't create segment if not existent" should happen, but
doesn't, only because the next segment file wasn't removed earlier, so
we have to add an extra check for that case?

In other words, is something like the following what's needed here, or
is there more to it?
-- Abhijit

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 578276d..58ea5a0 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1783,6 +1783,8 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,

 		if (v->mdfd_chain == NULL)
 		{
+			bool		thisSegNeedsExtension = _mdnblocks(reln, forknum, v) < RELSEG_SIZE;
+
 			/*
 			 * Normally we will create new segments only if authorized by the
 			 * caller (i.e., we are doing mdextend()).  But when doing WAL
@@ -1799,7 +1801,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 			 */
 			if (behavior == EXTENSION_CREATE || InRecovery)
 			{
-				if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
+				if (thisSegNeedsExtension)
 				{
 					char	   *zerobuf = palloc0(BLCKSZ);
@@ -1810,7 +1812,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 				}
 				v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, O_CREAT);
 			}
-			else
+			else if (!thisSegNeedsExtension)
 			{
 				/* We won't create segment if not existent */
 				v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, 0);
On Thu, Apr 14, 2016 at 1:39 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2016-04-12 09:00:57 -0400, robertmhaas@gmail.com wrote:
>>
>> On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
>> >
>> > 3) Actually handle the case of the last open segment not being
>> > RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
>>
>> #3 seems like it's probably about 15 years overdue, so let's do that
>> anyway.
>
> Do I understand correctly that the case of the "last open segment"
> (i.e., the one for which mdfd_chain == NULL) not being RELSEG_SIZE
> (i.e., _mdnblocks(reln, forknum, v) < RELSEG_SIZE) should not call
> _mdfd_openseg on nextsegno if behaviour is not EXTENSION_CREATE or
> InRecovery?
>
> And that "We won't create segment if not existent" should happen, but
> doesn't, only because the next segment file wasn't removed earlier, so
> we have to add an extra check for that case?
>
> In other words, is something like the following what's needed here, or
> is there more to it?

Something like that is what I was thinking about, but I notice it has
the disadvantage of adding lseeks to cater to a shouldn't-happen
condition.  Not sure if we should care.

My attempts to test things were also singularly unrewarding.  Even
after messing with the filesystem in various ways, I couldn't figure
out exactly how this makes a difference.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 12, 2016 at 09:00:57AM -0400, Robert Haas wrote:
> On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> > I can think of a number of relatively easy ways to address this:
> > 1) Just zap (or issue?) all pending flush requests when getting an
> >    smgrinval/smgrclosenode
> > 2) Do 1), but filter for the closed relnode
> > 3) Actually handle the case of the last open segment not being
> >    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> >
> > I'm kind of inclined to do both 3) and 1).

Andres, you own the underlying open item, right?  Do you plan to
implement this fix you have outlined?  If so, when?

> #3 seems like it's probably about 15 years overdue, so let's do that
> anyway.
>
> I don't quite understand why #1 fixes the problem - not because I
> doubt you, but just because I haven't studied it much.
On 2016-04-16 18:23:01 -0400, Noah Misch wrote:
> On Tue, Apr 12, 2016 at 09:00:57AM -0400, Robert Haas wrote:
> > On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> > > I can think of a number of relatively easy ways to address this:
> > > 1) Just zap (or issue?) all pending flush requests when getting an
> > >    smgrinval/smgrclosenode
> > > 2) Do 1), but filter for the closed relnode
> > > 3) Actually handle the case of the last open segment not being
> > >    RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> > >
> > > I'm kind of inclined to do both 3) and 1).
>
> Andres, you own the underlying open item, right?

Right.

> Do you plan to implement this fix you have outlined?  If so, when?

Yes, I plan to next week.

- Andres
On Sat, Apr 16, 2016 at 03:28:23PM -0700, Andres Freund wrote:
> On 2016-04-16 18:23:01 -0400, Noah Misch wrote:
> > Andres, you own the underlying open item, right?
>
> Right.
>
> > Do you plan to implement this fix you have outlined?  If so, when?
>
> Yes, I plan to next week.

Completion by the end of next week will work.  Thanks.  At that time
(2016-04-24T00:00 UTC), this would otherwise have spent twenty-three
days as an open item.  Would you inform this thread if you discover
reasons to think your plan will no longer succeed?
On 2016-04-15 13:42:34 -0400, Robert Haas wrote:
> On Thu, Apr 14, 2016 at 1:39 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> > Do I understand correctly that the case of the "last open segment"
> > (i.e., the one for which mdfd_chain == NULL) not being RELSEG_SIZE
> > (i.e., _mdnblocks(reln, forknum, v) < RELSEG_SIZE) should not call
> > _mdfd_openseg on nextsegno if behaviour is not EXTENSION_CREATE or
> > InRecovery?
> >
> > And that "We won't create segment if not existent" should happen, but
> > doesn't, only because the next segment file wasn't removed earlier, so
> > we have to add an extra check for that case?
> >
> > In other words, is something like the following what's needed here, or
> > is there more to it?

Yes, I think that should solve the bug reported here.  Did you check?

What makes me rather worried about solely using this as a solution
right now is the InRecovery check, which triggers segment
extensions/creations even with EXTENSION_RETURN_NULL.  Afaics it's
more or less a dormant bug that _mdfd_getseg() ignores RETURN_NULL in
the InRecovery case, because mdopen() (and other places) do *not*
behave that way!

> Something like that is what I was thinking about, but I notice it has
> the disadvantage of adding lseeks to cater to a shouldn't-happen
> condition.  Not sure if we should care.

I'm not particularly concerned about that; my guess is it'd barely
impact the number of lseeks, because most callers will have called
mdnblocks() before anyway.

> My attempts to test things were also singularly unrewarding.  Even
> after messing with the filesystem in various ways, I couldn't figure
> out exactly how this makes a difference.

What do you mean by this?  You couldn't reproduce the bug?  Or you
didn't know how to trigger such a test?

Andres
On Thu, Apr 21, 2016 at 9:45 AM, Andres Freund <andres@anarazel.de> wrote:
>> My attempts to test things were also singularly unrewarding.  Even
>> after messing with the filesystem in various ways, I couldn't figure
>> out exactly how this makes a difference.
>
> What do you mean by this?  You couldn't reproduce the bug?  Or you
> didn't know how to trigger such a test?

Let's see.  I think what I did is made _mdfd_getseg() error out if it
got to a segment that was not the right size and preceded the one it
was trying to reach.  But I could not hit that error.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-14 11:09:29 +0530, Abhijit Menon-Sen wrote:
> At 2016-04-12 09:00:57 -0400, robertmhaas@gmail.com wrote:
> >
> > On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > 3) Actually handle the case of the last open segment not being
> > > RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> >
> > #3 seems like it's probably about 15 years overdue, so let's do that
> > anyway.
>
> I don't have anything useful to contribute, I'm just trying to
> understand this fix.
>
> _mdfd_getseg() looks like this:
>
>     targetseg = blkno / ((BlockNumber) RELSEG_SIZE);
>     for (nextsegno = 1; nextsegno <= targetseg; nextsegno++)
>     {
>         Assert(nextsegno == v->mdfd_segno + 1);
>
>         if (v->mdfd_chain == NULL)
>         {
>             /*
>              * Normally we will create new segments only if authorized by the
>              * caller (i.e., we are doing mdextend()). […]
>              */
>             if (behavior == EXTENSION_CREATE || InRecovery)
>             {
>                 if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
>                     /* mdextend */
>                 v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, O_CREAT);
>             }
>             else
>             {
>                 /* We won't create segment if not existent */
>                 v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, 0);
>             }
>             if (v->mdfd_chain == NULL)
>                 /* return NULL or ERROR */
>         }
>         v = v->mdfd_chain;
>     }
>
> Do I understand correctly that the case of the "last open segment"
> (i.e., the one for which mdfd_chain == NULL) not being RELSEG_SIZE
> (i.e., _mdnblocks(reln, forknum, v) < RELSEG_SIZE) should not call
> _mdfd_openseg on nextsegno if behaviour is not EXTENSION_CREATE or
> InRecovery?
>
> And that "We won't create segment if not existent" should happen, but
> doesn't, only because the next segment file wasn't removed earlier, so
> we have to add an extra check for that case?
>
> In other words, is something like the following what's needed here, or
> is there more to it?

It's a bit more complicated than that, but that's the principle.
Thanks for the patch, and sorry for my late response.  See my attached
version for a more fleshed out version of this.

Looking at the code around this I found:
* _mdfd_getseg(), in contrast to mdnblocks(), doesn't check segment
  size - neither to continue with a too-small segment, nor to error
  out on a too-big one
* for, imo, pretty bad reasons in mdsync() we currently rely on
  EXTENSION_RETURN_NULL to extend the file in recovery, and to set
  errno to ENOENT.  Brrr.
* we currently FATAL if a segment is too big - I've copied that
  behaviour, but why not just ERROR out?

The attached patch basically adds the segment size checks to
_mdfd_getseg(), and doesn't perform extension, even in recovery, if
EXTENSION_REALLY_RETURN_NULL is passed.

This fixes the issue for me, both in the originally reported variant,
and in some more rudely set up versions (i.e. rm'ing files).

I'm seeing some weird behaviour (even with writeback flushing disabled)
with smgr invals and recovery/standbys, but I don't want to dive into
it before
http://www.postgresql.org/message-id/CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
is addressed; it's too likely to be related.

Greetings,

Andres Freund
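The essence of the segment size check being added can be modeled with a
small stand-alone predicate.  This is an illustrative sketch, not the
actual patch: the function name is invented, and only the decision
logic - "don't chain to the next segment when the last open one is
short, unless we are allowed to extend" - is shown:

```c
#include <assert.h>

#define RELSEG_SIZE 131072      /* blocks per 1GB segment in a default build */

/*
 * Before chaining from the current (last open) segment to the next
 * one, verify the current segment is full.  A short segment means the
 * next one logically should not exist; if we are not allowed to extend,
 * the caller should get NULL rather than a descriptor for a leftover
 * file of a truncated relation.
 */
static int
may_open_next_segment(unsigned nblocks_in_last, int allow_extend)
{
    if (nblocks_in_last >= RELSEG_SIZE)
        return 1;               /* last segment full: next may exist */
    return allow_extend;        /* short segment: only when extending */
}
```

Under this rule, a truncated relation whose stale next-segment file was
left on disk no longer yields a bogus descriptor, which is the failure
mode reported at the top of the thread.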
On Fri, Apr 22, 2016 at 1:07 PM, Andres Freund <andres@anarazel.de> wrote:
> The attached patch basically adds the segment size checks to
> _mdfd_getseg(), and doesn't perform extension, even in recovery, if
> EXTENSION_REALLY_RETURN_NULL is passed.
>
> This fixes the issue for me, both in the originally reported variant,
> and in some more rudely set up versions (i.e. rm'ing files).

I think this looks broadly reasonable.  Some specific comments:

+            /*
+             * Normally we will create new segments only if authorized by
+             * the caller (i.e., we are doing mdextend()).  But when doing
+             * WAL recovery, create segments anyway; this allows cases
+             * such as replaying WAL data that has a write into a
+             * high-numbered segment of a relation that was later deleted.
+             * We want to go ahead and create the segments so we can
+             * finish out the replay.

Add something like: "However, if the caller has specified
EXTENSION_REALLY_RETURN_NULL, then extension is not desired even in
recovery; we won't reach this point in that case."

+                errno = ENOENT;    /* some callers check errno, yuck */

I think a considerably more detailed comment would be warranted.

+            else
+            {
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not open file \"%s\" (target block %u): previous segment is only %u blocks",
+                                _mdfd_segpath(reln, forknum, nextsegno),
+                                blkno, nblocks)));
+            }

The else and its ensuing level of indentation are unnecessary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Apr 22, 2016 at 5:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 22, 2016 at 1:07 PM, Andres Freund <andres@anarazel.de> wrote:
>> The attached patch basically adds the segment size checks to
>> _mdfd_getseg(), and doesn't perform extension, even in recovery, if
>> EXTENSION_REALLY_RETURN_NULL is passed.
>>
>> This fixes the issue for me, both in the originally reported variant,
>> and in some more rudely set up versions (i.e. rm'ing files).
>
> I think this looks broadly reasonable.  Some specific comments:
>
> [...]

Andres, this issue has now been open for more than a month, which is
frankly kind of ridiculous given the schedule we're trying to hit for
beta.  Do you think it's possible to commit something RSN without
compromising the quality of PostgreSQL 9.6?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 22 April 2016 at 18:07, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-14 11:09:29 +0530, Abhijit Menon-Sen wrote:
>> [...]
>>
>> In other words, is something like the following what's needed here, or
>> is there more to it?
>
> It's a bit more complicated than that, but that's the principle.
> Thanks for the patch, and sorry for my late response.  See my attached
> version for a more fleshed out version of this.
>
> Looking at the code around this I found:
> * _mdfd_getseg(), in contrast to mdnblocks(), doesn't check segment
>   size - neither to continue with a too-small segment, nor to error
>   out on a too-big one
> * for, imo, pretty bad reasons in mdsync() we currently rely on
>   EXTENSION_RETURN_NULL to extend the file in recovery, and to set
>   errno to ENOENT.  Brrr.
> * we currently FATAL if a segment is too big - I've copied that
>   behaviour, but why not just ERROR out?
>
> The attached patch basically adds the segment size checks to
> _mdfd_getseg(), and doesn't perform extension, even in recovery, if
> EXTENSION_REALLY_RETURN_NULL is passed.
>
> This fixes the issue for me, both in the originally reported variant,
> and in some more rudely set up versions (i.e. rm'ing files).

Fixes it for me too.

Thom
On 2016-04-25 08:55:54 -0400, Robert Haas wrote:
> Andres, this issue has now been open for more than a month, which is
> frankly kind of ridiculous given the schedule we're trying to hit for
> beta.  Do you think it's possible to commit something RSN without
> compromising the quality of PostgreSQL 9.6?

Frankly, I'm getting a bit annoyed here too.  I posted a patch Friday,
directly after getting back from pgconf.us.  Saturday I posted a patch
for an issue which I didn't cause, but which caused problems when
testing the patch in this thread.  Sunday I worked on another small
patch - one you did want to see resolved too.  It's now 8:30 Monday
morning.  What's the point of your message?

Andres
On Mon, Apr 25, 2016 at 11:56 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-25 08:55:54 -0400, Robert Haas wrote:
>> Andres, this issue has now been open for more than a month, which is
>> frankly kind of ridiculous given the schedule we're trying to hit for
>> beta.  Do you think it's possible to commit something RSN without
>> compromising the quality of PostgreSQL 9.6?
>
> Frankly, I'm getting a bit annoyed here too.  I posted a patch Friday,
> directly after getting back from pgconf.us.  Saturday I posted a patch
> for an issue which I didn't cause, but which caused problems when
> testing the patch in this thread.  Sunday I worked on another small
> patch - one you did want to see resolved too.  It's now 8:30 Monday
> morning.  What's the point of your message?

I think that the point of my message is exactly what I said in my
message.  This isn't really about the last couple of days.  The issue
was reported on March 20th.  On March 31st, Noah asked you for a plan
to get it fixed by April 7th.  You never replied.  On April 16th, the
issue not having been fixed, he followed up.  You said that you would
fix it the next week.  That week is now over, and we're into the
following week.

We have a patch, and that's good, and I have reviewed it and Thom has
tested it, and that's good, too.  But it is not clear whether you feel
confident enough to commit it, or when you might be planning to do
that, so I asked.  Given that this is the open item of longest tenure
and that we're hoping to ship beta soon, why is that out of line?

Fundamentally, the choices for each open item are (1) get it fixed
before beta, (2) revert the commit that caused it, (3) decide it's OK
to ship beta with that issue, or (4) slip beta.

We initially had a theory that the commit that caused this issue merely
revealed an underlying problem that had existed before, but I no longer
really think that's the case.  That commit introduced a new way to
write to blocks that might have in the meantime been removed, and it
failed to make that safe.  That's not to say that md.c doesn't do some
wonky things, but the pre-existing code in the upper layers coped with
the wonkiness and your new code doesn't, so in effect it's a
regression.  And in fact I think it's a regression that can be
expected to be a significant operational problem for people if not
fixed, because the circumstances in which it can happen are not very
obscure.  You just need to hold some pending flush requests in your
backend-local queue while some other process truncates the relation,
and boom.

So I am disinclined to option #3.  I also do not think that we should
slip beta for an issue that was reported this far in advance of the
planned beta date, so I am disinclined to option #4.  That leaves #1
and #2.  I assume you will be pretty darned unhappy if we end up at
#2, so I am trying to figure out if we can achieve #1.  OK?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2016-04-25 12:38:36 -0400, Robert Haas wrote:
> I think that the point of my message is exactly what I said in my
> message. This isn't really about the last couple of days. The issue
> was reported on March 20th. On March 31st, Noah asked you for a plan
> to get it fixed by April 7th. You never replied. On April 16th, the
> issue not having been fixed, he followed up. You said that you would
> fix it next week. That week is now over, and we're into the following
> week.

Well, I posted a patch. I'd have applied it too (after addressing your comments obviously), except that there are some interdependencies with the nsmg > 0 thread (some of my tests fail spuriously without that fixed). Early last week I waited for a patch on that thread, but when that didn't materialize by Friday I switched to work on that [1]. With both fixes applied I can't reproduce any problems anymore.

About the delay: sure, it'd be nicer if I'd addressed this immediately. But priority-wise it's an issue that's damned hard to hit, and where the worst known consequence is having to reconnect; that's not earth shattering. So spending time to work on the, imo, more severe performance regressions seemed to be more important; maybe I was wrong in prioritizing things that way.

> We have a patch, and that's good, and I have reviewed it and
> Thom has tested it, and that's good, too. But it is not clear whether
> you feel confident to commit it or when you might be planning to do
> that, so I asked. Given that this is the open item of longest tenure
> and that we're hoping to ship beta soon, why is that out of line?

Well, if you'd asked it that way, then I'd not have responded the way I have.

> We initially had a theory that the commit that caused this issue
> merely revealed an underlying problem that had existed before, but I
> no longer really think that's the case.
I do think there's some lingering problems (avoiding a FATAL by choosing an index scan instead of a seqscan, the problem afaics can transiently occur in HS, any corrupted index can trigger it ...) - but I agree it's not really been a big problem so far. > That commit introduced a new way to write to blocks that might have in > the meantime been removed, and it failed to make that safe. There's no writing of any blocks involved - the problem is just about opening segments which might or might not exist. > And in fact I think it's a regression that can be > expected to be a significant operational problem for people if not > fixed, because the circumstances in which it can happen are not very > obscure. You just need to hold some pending flush requests in your > backend-local queue while some other process truncates the relation, > and boom. I found it quite hard to come up with scenarios to reproduce it. Entire segments have to be dropped, the writes to the to-be-dropped segment have to result in fully dead rows, only few further writes outside the dropped can happen, invalidations may not fix the problem up. But obviously it should be fixed. > I assume you will be pretty darned unhappy if we end up at #2, so I am > trying to figure out if we can achieve #1. OK? Ok. [1] http://archives.postgresql.org/message-id/20160424025117.2cmf6ku4ldfcoo44%40alap3.anarazel.de Andres
Hello Andres,

> Fixes it for me too.

Yep, for me too. I'm not at ease with the part of the API the code is dealing with, so I do not feel like a competent reviewer. I agree with the "more comments" suggested by Robert.

I have just a small naming point:

    /* ereport if segment not present, create in recovery */
    EXTENSION_FAIL,
    /* return NULL if not present, create in recovery */
    EXTENSION_RETURN_NULL,
    /* return NULL if not present */
    EXTENSION_REALLY_RETURN_NULL,
    /* create new segments as needed */
    EXTENSION_CREATE

The comments seem pretty clear, but the naming of these options is more behavioral than functional somehow (or the reverse?); the RETURN_NULL and REALLY_RETURN_NULL names in particular seemed pretty contrived to me.

There is one context: whether we are in recovery. There are three possible behaviors if the segment does not exist: error, ignore, or create. In recovery it is always "create" if asked for: the segment must be made available, since maybe it does not exist because of the crash...

If I understand correctly, with flushes kept a long time before being processed, there is a new state which is "ignore whatever": we do not want to create a segment for flushing non-existing data. So I would suggest naming the options closer to the comments above, something like:

normal / in recovery:
EXTENSION_ERROR_OR_IR_CREATE
EXTENSION_IGNORE_OR_IR_CREATE
EXTENSION_IGNORE
EXTENSION_CREATE

Now, maybe it is too late to try to find a better name for these options :-)

-- Fabien.
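For illustration, Fabien's proposal could look like the following hypothetical C sketch; the enum names are his suggestion (not the actual md.c enum), and the dispatch helper is invented here purely to make the intended semantics explicit:

```c
/* Hypothetical renaming of md.c's segment-extension behaviors, following
 * Fabien's proposal.  This is an illustration only, not PostgreSQL code. */
typedef enum
{
    EXTENSION_ERROR_OR_IR_CREATE,  /* error if absent; create in recovery */
    EXTENSION_IGNORE_OR_IR_CREATE, /* NULL if absent; create in recovery  */
    EXTENSION_IGNORE,              /* NULL if absent, even in recovery    */
    EXTENSION_CREATE               /* always create missing segments      */
} ExtensionBehaviorSketch;

typedef enum { ACT_ERROR, ACT_RETURN_NULL, ACT_CREATE } MissingSegAction;

/* Invented helper: resolve what to do for a missing segment given the
 * recovery context, mirroring the comments on the original enum. */
static MissingSegAction
action_for(ExtensionBehaviorSketch b, int in_recovery)
{
    switch (b)
    {
        case EXTENSION_ERROR_OR_IR_CREATE:
            return in_recovery ? ACT_CREATE : ACT_ERROR;
        case EXTENSION_IGNORE_OR_IR_CREATE:
            return in_recovery ? ACT_CREATE : ACT_RETURN_NULL;
        case EXTENSION_IGNORE:
            return ACT_RETURN_NULL;
        default:                    /* EXTENSION_CREATE */
            return ACT_CREATE;
    }
}
```

The point of the sketch is that the context (in recovery or not) and the behavior (error / ignore / create) become visible in the name, instead of being buried in the comments.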
On Mon, Apr 25, 2016 at 2:05 PM, Andres Freund <andres@anarazel.de> wrote: > Well, I posted a patch. I'd have applied it too (after addressing your > comments obviously), except that there's some interdependencies with the > nsmg > 0 thread (some of my tests fail spuriously without that > fixed). Early last week I waited for a patch on that thread, but when > that didn't materialize by Friday I switched to work on that [1]. With > both fixes applied I can't reproduce any problems anymore. OK. What are the interdependencies? You've said that a few times but I am not clear on what the nature of those interdependencies is. > About the delay: Sure, it'd be nicer if I'd addressed this > immediately. But priority-wise it's an issue that's damned hard to hit, > and where the worst known consequence is having to reconnect; that's not > earth shattering. So spending time to work on the, imo more severe > performance regressions, seemed to be more important; maybe I was wrong > in priorizing things that way. Do you mean performance regression related to this patch, or to other patches like the atomic buffer pin/unpin stuff? >> We have a patch, and that's good, and I have reviewed it and >> Thom has tested it, and that's good, too. But it is not clear whether >> you feel confident to commit it or when you might be planning to do >> that, so I asked. Given that this is the open item of longest tenure >> and that we're hoping to ship beta soon, why is that out of line? > > Well, if you'd asked it that way, then I'd not responded the way I have. Fair enough, but keep in mind that a few more mildly-phrased emails from Noah didn't actually get us to the point where this is fixed, or even to a clear explanation of when it might be fixed or why it wasn't getting fixed sooner. >> That commit introduced a new way to write to blocks that might have in >> the meantime been removed, and it failed to make that safe. 
>
> There's no writing of any blocks involved - the problem is just about
> opening segments which might or might not exist.

As I understand it, that commit introduced a backend-private queue; when it fills, we issue flush requests for particular blocks. By the time the flush requests are issued, the blocks might have been truncated away. So it's not writing blocks in the sense of write(), but persuading the OS to write those blocks to stable storage.

>> And in fact I think it's a regression that can be
>> expected to be a significant operational problem for people if not
>> fixed, because the circumstances in which it can happen are not very
>> obscure. You just need to hold some pending flush requests in your
>> backend-local queue while some other process truncates the relation,
>> and boom.
>
> I found it quite hard to come up with scenarios to reproduce it. Entire
> segments have to be dropped, the writes to the to-be-dropped segment
> have to result in fully dead rows, only few further writes outside the
> dropped can happen, invalidations may not fix the problem up. But
> obviously it should be fixed.

It's probably a bit tricky to produce a workload where the truncated-away block is in some backend's private queue, because the queues are very small and it's not exactly clear what sort of workload will produce regular relation truncations, and you need both of those things to hit this. However, I think it's pretty optimistic to think that there won't be people in that situation, partly because we had a customer find a bug in an EDB proprietary feature whose profile is very similar to this feature less than 2 years ago.

>> I assume you will be pretty darned unhappy if we end up at #2, so I am
>> trying to figure out if we can achieve #1. OK?
>
> Ok.

I'm still not clear on the answer to this.
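The hazard described above can be modeled in a few lines of standalone C. This is a toy queue, not PostgreSQL's actual flush-request machinery; all names and sizes here are illustrative:

```c
/* Toy model of the hazard: a backend-local queue of pending flush
 * requests can outlive a truncation, so by the time the queue is
 * drained, a queued block number may point past the relation's new end.
 * Illustrative only; not PostgreSQL's actual structures. */
#include <stddef.h>

#define QUEUE_LEN 4

typedef struct
{
    size_t pending[QUEUE_LEN];  /* block numbers awaiting a flush */
    int    count;
} FlushQueue;

static void
enqueue_flush(FlushQueue *q, size_t blockno)
{
    if (q->count < QUEUE_LEN)
        q->pending[q->count++] = blockno;
}

/* Count queued requests that are now stale, i.e. refer to blocks at or
 * beyond the relation's (possibly truncated) current length. */
static int
stale_requests(const FlushQueue *q, size_t rel_nblocks)
{
    int stale = 0;

    for (int i = 0; i < q->count; i++)
        if (q->pending[i] >= rel_nblocks)
            stale++;
    return stale;
}
```

A request that was perfectly valid when enqueued becomes stale the moment another process truncates the relation underneath it, which is exactly the window the text describes.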
I think the answer is that you think that this patch is OK to commit with some editing, but the situation with the nmsgs > 0 patch is not so clear yet because it hasn't had any review yet. And they depend on each other somehow. Am I close? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-04-25 16:29:36 -0400, Robert Haas wrote: > On Mon, Apr 25, 2016 at 2:05 PM, Andres Freund <andres@anarazel.de> wrote: > > Well, I posted a patch. I'd have applied it too (after addressing your > > comments obviously), except that there's some interdependencies with the > > nsmg > 0 thread (some of my tests fail spuriously without that > > fixed). Early last week I waited for a patch on that thread, but when > > that didn't materialize by Friday I switched to work on that [1]. With > > both fixes applied I can't reproduce any problems anymore. > > OK. What are the interdependencies? You've said that a few times but > I am not clear on what the nature of those interdependencies is. I added checks to smgr/md.c that verify that the smgr state is consistent with the on-file state. Without the two issues in [1] fixed, these tests fail in a standby, while running regression tests. Adding those tests made me notice a bug in an unreleased version of the patch, so it seems they're worthwhile to run. > > About the delay: Sure, it'd be nicer if I'd addressed this > > immediately. But priority-wise it's an issue that's damned hard to hit, > > and where the worst known consequence is having to reconnect; that's not > > earth shattering. So spending time to work on the, imo more severe > > performance regressions, seemed to be more important; maybe I was wrong > > in priorizing things that way. > > Do you mean performance regression related to this patch, or to other > patches like the atomic buffer pin/unpin stuff? Other patches, I can't see this having measurable impact. > >> We have a patch, and that's good, and I have reviewed it and > >> Thom has tested it, and that's good, too. But it is not clear whether > >> you feel confident to commit it or when you might be planning to do > >> that, so I asked. Given that this is the open item of longest tenure > >> and that we're hoping to ship beta soon, why is that out of line? 
>
> > Well, if you'd asked it that way, then I'd not responded the way I have.
>
> Fair enough, but keep in mind that a few more mildly-phrased emails
> from Noah didn't actually get us to the point where this is fixed, or
> even to a clear explanation of when it might be fixed or why it wasn't
> getting fixed sooner.

Hrmpf. I'd initially missed the first email in the onslaught, and the second email I responded to and posted a patch in the timeframe I'd promised. I didn't foresee, after we'd initially dismissed a correlation, that there'd be a connection to the other patch.

> >> That commit introduced a new way to write to blocks that might have in
> >> the meantime been removed, and it failed to make that safe.
> >
> > There's no writing of any blocks involved - the problem is just about
> > opening segments which might or might not exist.
>
> As I understand it, that commit introduced a backend-private queue;
> when it fills, we issue flush requests for particular blocks. By the
> time the flush requests are issued, the blocks might have been
> truncated away. So it's not writing blocks in the sense of write(),
> but persuading the OS to write those blocks to stable storage.

Right.

> >> And in fact I think it's a regression that can be
> >> expected to be a significant operational problem for people if not
> >> fixed, because the circumstances in which it can happen are not very
> >> obscure. You just need to hold some pending flush requests in your
> >> backend-local queue while some other process truncates the relation,
> >> and boom.
> >
> > I found it quite hard to come up with scenarios to reproduce it. Entire
> > segments have to be dropped, the writes to the to-be-dropped segment
> > have to result in fully dead rows, only few further writes outside the
> > dropped can happen, invalidations may not fix the problem up. But
> > obviously it should be fixed.
> > It's probably a bit tricky to produce a workload where the > truncated-away block is in some backend's private queue, because the > queues are very small and it's not exactly clear what sort of workload > will produce regular relation truncations, and you need both of those > things to hit this. However, I think it's pretty optimistic to think > that there won't be people in that situation, partly because we had a > customer find a bug in an EDB proprietary feature a very similar > whose profile is very similar to this feature less than 2 years ago. I'm obviously not arguing that we shouldn't fix this. > >> I assume you will be pretty darned unhappy if we end up at #2, so I am > >> trying to figure out if we can achieve #1. OK? > > > > Ok. > > I'm still not clear on the answer to this. I think the answer is that > you think that this patch is OK to commit with some editing, but the > situation with the nmsgs > 0 patch is not so clear yet because it > hasn't had any review yet. And they depend on each other somehow. Am > I close? You are. I'd prefer pushing this after fixes for the two invalidation issues, because it makes testing easier. But I guess the timeframe there is unclear enough, that it makes sense to test this patch on top of the the preliminary fixes, and then push without those. - Andres
On Mon, Apr 25, 2016 at 4:57 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-04-25 16:29:36 -0400, Robert Haas wrote: >> On Mon, Apr 25, 2016 at 2:05 PM, Andres Freund <andres@anarazel.de> wrote: >> > Well, I posted a patch. I'd have applied it too (after addressing your >> > comments obviously), except that there's some interdependencies with the >> > nsmg > 0 thread (some of my tests fail spuriously without that >> > fixed). Early last week I waited for a patch on that thread, but when >> > that didn't materialize by Friday I switched to work on that [1]. With >> > both fixes applied I can't reproduce any problems anymore. >> >> OK. What are the interdependencies? You've said that a few times but >> I am not clear on what the nature of those interdependencies is. > > I added checks to smgr/md.c that verify that the smgr state is > consistent with the on-file state. Without the two issues in [1] fixed, > these tests fail in a standby, while running regression tests. Adding > those tests made me notice a bug in an unreleased version of the patch, > so it seems they're worthwhile to run. Footnote [1] is used, but not defined. >> >> I assume you will be pretty darned unhappy if we end up at #2, so I am >> >> trying to figure out if we can achieve #1. OK? >> > >> > Ok. >> >> I'm still not clear on the answer to this. I think the answer is that >> you think that this patch is OK to commit with some editing, but the >> situation with the nmsgs > 0 patch is not so clear yet because it >> hasn't had any review yet. And they depend on each other somehow. Am >> I close? > > You are. I'd prefer pushing this after fixes for the two invalidation > issues, because it makes testing easier. But I guess the timeframe there > is unclear enough, that it makes sense to test this patch on top of the > the preliminary fixes, and then push without those. I think it makes sense to go ahead and push this fix rather soon. 
I'd much rather not have this bug, even if verifying that I don't have it is a little harder than it might be under other circumstances. The fix could be buggy too, and if that fix doesn't go in until right before we ship beta1, we're less likely to be able to find and correct any deficiencies before that goes out the door. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-04-26 17:25:18 -0400, Robert Haas wrote: > On Mon, Apr 25, 2016 at 4:57 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-04-25 16:29:36 -0400, Robert Haas wrote: > >> On Mon, Apr 25, 2016 at 2:05 PM, Andres Freund <andres@anarazel.de> wrote: > >> > Well, I posted a patch. I'd have applied it too (after addressing your > >> > comments obviously), except that there's some interdependencies with the > >> > nsmg > 0 thread (some of my tests fail spuriously without that > >> > fixed). Early last week I waited for a patch on that thread, but when > >> > that didn't materialize by Friday I switched to work on that [1]. With > >> > both fixes applied I can't reproduce any problems anymore. > >> > >> OK. What are the interdependencies? You've said that a few times but > >> I am not clear on what the nature of those interdependencies is. > > > > I added checks to smgr/md.c that verify that the smgr state is > > consistent with the on-file state. Without the two issues in [1] fixed, > > these tests fail in a standby, while running regression tests. Adding > > those tests made me notice a bug in an unreleased version of the patch, > > so it seems they're worthwhile to run. > > Footnote [1] is used, but not defined. Oops, it's the thread you replied to.. > I think it makes sense to go ahead and push this fix rather soon. Will do.
Hi, On 2016-04-25 20:53:26 +0200, Fabien COELHO wrote: > I have just a small naming point: > > /* ereport if segment not present, create in recovery */ > EXTENSION_FAIL, > /* return NULL if not present, create in recovery */ > EXTENSION_RETURN_NULL, > /* return NULL if not present */ > EXTENSION_REALLY_RETURN_NULL, > /* create new segments as needed */ > EXTENSION_CREATE > > The comments seem pretty clear, but the naming of these options are more > behavioral than functional somehow (or the reverse?), especially the > RETURN_NULL and REALLY_RETURN_NULL names seemed pretty contrived to me. I tend to agree. But "fixing" that would mean changing quite some additional pieces of code, more than I want to do in a bugfix. I also think it's not *that* confusing... Thanks for looking. Andres
On 26 April 2016 at 22:32, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-26 17:25:18 -0400, Robert Haas wrote:
>> On Mon, Apr 25, 2016 at 4:57 PM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2016-04-25 16:29:36 -0400, Robert Haas wrote:
>> >> On Mon, Apr 25, 2016 at 2:05 PM, Andres Freund <andres@anarazel.de> wrote:
>> >> > Well, I posted a patch. I'd have applied it too (after addressing your
>> >> > comments obviously), except that there's some interdependencies with the
>> >> > nsmg > 0 thread (some of my tests fail spuriously without that
>> >> > fixed). Early last week I waited for a patch on that thread, but when
>> >> > that didn't materialize by Friday I switched to work on that [1]. With
>> >> > both fixes applied I can't reproduce any problems anymore.
>> >>
>> >> OK. What are the interdependencies? You've said that a few times but
>> >> I am not clear on what the nature of those interdependencies is.
>> >
>> > I added checks to smgr/md.c that verify that the smgr state is
>> > consistent with the on-file state. Without the two issues in [1] fixed,
>> > these tests fail in a standby, while running regression tests. Adding
>> > those tests made me notice a bug in an unreleased version of the patch,
>> > so it seems they're worthwhile to run.
>>
>> Footnote [1] is used, but not defined.
>
> Oops, it's the thread you replied to..
>
>> I think it makes sense to go ahead and push this fix rather soon.
>
> Will do.
I've noticed another breakage, which I can reproduce consistently.
createdb pgbench
pgbench -i -s 100 --unlogged-tables pgbench
psql -f pgbench_partitions.sql pgbench
vacuumdb -z pgbench
createdb test
Which produces:
createdb: database creation failed: ERROR: checkpoint request failed
HINT: Consult recent messages in the server log for details.
The log shows:
2016-04-28 17:36:00 BST [18605]: [1-1] user=thom,db=postgres,client=[local] DEBUG: postgres child[18605]: starting with (
2016-04-28 17:36:00 BST [18605]: [2-1] user=thom,db=postgres,client=[local] DEBUG: postgres
2016-04-28 17:36:00 BST [18605]: [3-1] user=thom,db=postgres,client=[local] DEBUG: )
2016-04-28 17:36:00 BST [18605]: [4-1] user=thom,db=postgres,client=[local] DEBUG: InitPostgres
2016-04-28 17:36:00 BST [18605]: [5-1] user=thom,db=postgres,client=[local] DEBUG: StartTransaction
2016-04-28 17:36:00 BST [18605]: [6-1] user=thom,db=postgres,client=[local] DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
2016-04-28 17:36:00 BST [18605]: [7-1] user=thom,db=postgres,client=[local] DEBUG: CommitTransaction
2016-04-28 17:36:00 BST [18605]: [8-1] user=thom,db=postgres,client=[local] DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
2016-04-28 17:36:00 BST [18605]: [9-1] user=thom,db=postgres,client=[local] DEBUG: StartTransactionCommand
2016-04-28 17:36:00 BST [18605]: [10-1] user=thom,db=postgres,client=[local] STATEMENT: CREATE DATABASE test;
2016-04-28 17:36:00 BST [18605]: [11-1] user=thom,db=postgres,client=[local] DEBUG: StartTransaction
2016-04-28 17:36:00 BST [18605]: [12-1] user=thom,db=postgres,client=[local] STATEMENT: CREATE DATABASE test;
2016-04-28 17:36:00 BST [18605]: [13-1] user=thom,db=postgres,client=[local] DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
2016-04-28 17:36:00 BST [18605]: [14-1] user=thom,db=postgres,client=[local] STATEMENT: CREATE DATABASE test;
2016-04-28 17:36:00 BST [18605]: [15-1] user=thom,db=postgres,client=[local] DEBUG: ProcessUtility
2016-04-28 17:36:00 BST [18605]: [16-1] user=thom,db=postgres,client=[local] STATEMENT: CREATE DATABASE test;
2016-04-28 17:36:00 BST [18108]: [46-1] user=,db=,client= DEBUG: performing replication slot checkpoint
2016-04-28 17:36:00 BST [18105]: [158-1] user=,db=,client= DEBUG: server process (PID 18582) exited with exit code 0
2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG: could not fsync file "base/24581/24594.1" but retrying: No such file or directory
2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR: could not fsync file "base/24581/24594.1": No such file or directory
2016-04-28 17:36:08 BST [18605]: [17-1] user=thom,db=postgres,client=[local] ERROR: checkpoint request failed
2016-04-28 17:36:08 BST [18605]: [18-1] user=thom,db=postgres,client=[local] HINT: Consult recent messages in the server log for details.
2016-04-28 17:36:08 BST [18605]: [19-1] user=thom,db=postgres,client=[local] STATEMENT: CREATE DATABASE test;
2016-04-28 17:36:08 BST [18605]: [20-1] user=thom,db=postgres,client=[local] DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2016-04-28 17:36:08 BST [18605]: [21-1] user=thom,db=postgres,client=[local] DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2016-04-28 17:36:08 BST [18605]: [22-1] user=thom,db=postgres,client=[local] DEBUG: proc_exit(0): 3 callbacks to make
2016-04-28 17:36:08 BST [18605]: [23-1] user=thom,db=postgres,client=[local] DEBUG: exit(0)
2016-04-28 17:36:08 BST [18605]: [24-1] user=thom,db=postgres,client=[local] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2016-04-28 17:36:08 BST [18605]: [25-1] user=thom,db=postgres,client=[local] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2016-04-28 17:36:08 BST [18605]: [26-1] user=thom,db=postgres,client=[local] DEBUG: proc_exit(-1): 0 callbacks to make
2016-04-28 17:36:08 BST [18105]: [159-1] user=,db=,client= DEBUG: server process (PID 18605) exited with exit code 0
2016-04-28 17:36:09 BST [18108]: [49-1] user=,db=,client= DEBUG: checkpointer updated shared memory configuration values
Relfilenode 24594 corresponds to the empty pgbench_accounts parent table.
Thom
On 2016-04-28 17:41:29 +0100, Thom Brown wrote: > I've noticed another breakage, which I can reproduce consistently. > > createdb pgbench > pgbench -i -s 100 --unlogged-tables pgbench > psql -f pgbench_partitions.sql pgbench > vacuumdb -z pgbench > createdb test > > Which produces: > > createdb: database creation failed: ERROR: checkpoint request failed > HINT: Consult recent messages in the server log for details. Interesting. Any chance you could verify if this is actually related to either the fix or the commit that was to blame for the previous issue? Because > user=thom,db=postgres,client=[local] STATEMENT: CREATE DATABASE test; > 2016-04-28 17:36:00 BST [18108]: [46-1] user=,db=,client= DEBUG: > performing replication slot checkpoint > 2016-04-28 17:36:00 BST [18105]: [158-1] user=,db=,client= DEBUG: server > process (PID 18582) exited with exit code 0 > 2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG: could not > fsync file "base/24581/24594.1" but retrying: No such file or directory > 2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR: could not > fsync file "base/24581/24594.1": No such file or directory > 2016-04-28 17:36:08 BST [18605]: [17-1] > user=thom,db=postgres,client=[local] ERROR: checkpoint request failed > 2016-04-28 17:36:08 BST [18605]: [18-1] > user=thom,db=postgres,client=[local] HINT: Consult recent messages in the > server log for details. > 2016-04-28 17:36:08 BST [18605]: [19-1] Doesn't really sound like it's necessarily related to the previous problem. - Andres
> createdb: database creation failed: ERROR: checkpoint request failed
> HINT: Consult recent messages in the server log for details.

Attached is a simpler/faster version, based on the previous script; I just added a CHECKPOINT after the EXPLAIN ANALYSE. It fails on head with:

    HINT: Consider increasing the configuration parameter "max_wal_size".
    ERROR: canceling autovacuum task
    CONTEXT: automatic vacuum of table "fabien.public.pgbench_accounts"
    ERROR: canceling autovacuum task
    CONTEXT: automatic analyze of table "fabien.public.pgbench_accounts_1"
    ERROR: canceling autovacuum task
    CONTEXT: automatic analyze of table "fabien.public.pgbench_accounts_2"
    ERROR: could not fsync file "base/16384/16397.1": No such file or directory
    ERROR: checkpoint request failed
    HINT: Consult recent messages in the server log for details.
    STATEMENT: CHECKPOINT;

I tried it on pg prior to the flushing patches and it did not fail, so it seems to be related somehow. Probably the fix is similar: just some conditions to check when dealing with a file which has been removed by a large DELETE.

-- Fabien.
An interesting complement about the previous failing test: although I disabled the "flushing" feature...

    sh> grep flush_after xxx/postgresql.conf
    bgwriter_flush_after = 0       # 0 disables,
    backend_flush_after = 0        # 0 disables,
    wal_writer_flush_after = 0     # 0 disables
    checkpoint_flush_after = 0     # 0 disables,

... the test still fails with "head":

    ...
    ERROR: could not fsync file "base/16384/16397.1": No such file or directory
    ERROR: checkpoint request failed
    HINT: Consult recent messages in the server log for details.
    STATEMENT: CHECKPOINT;

So this seems to suggest that there is an underlying issue independent from the flushing feature.

-- Fabien.
On 2016-04-28 17:41:29 +0100, Thom Brown wrote: > I've noticed another breakage, which I can reproduce consistently. > 2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG: could not > fsync file "base/24581/24594.1" but retrying: No such file or directory > 2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR: could not > fsync file "base/24581/24594.1": No such file or directory > 2016-04-28 17:36:08 BST [18605]: [17-1] > user=thom,db=postgres,client=[local] ERROR: checkpoint request failed > 2016-04-28 17:36:08 BST [18605]: [18-1] > user=thom,db=postgres,client=[local] HINT: Consult recent messages in the > server log for details. Yuck. md.c is so crummy :( Basically the reason for the problem is that mdsync() needs to access "formally non-existent segments" (as in ones where previous segments are < RELSEG_SIZE), because we queue (and there might be preexistent) fsync requests via register_dirty_segment() in mdtruncate(). I'm at a bit of a loss as to how to reconcile that view with the original issue in this thread. The best I can come up with at this moment is doing a _mdfd_openseg() in mdsync() to open the truncated segment if _mdfd_getseg() returned NULL. We don't want to normally use that in either function because it'll imply a separate open() etc, which is pretty expensive - but doing so in the fallback case would be kind of ok. Andres
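The conflict between the two lookup styles can be shown with a standalone toy model. Here segments are just an array of block counts; the real _mdfd_getseg()/_mdfd_openseg() operate on file descriptors and the MdfdVec chain, so every name and size below is illustrative only:

```c
/* Toy model: after a truncation, segment 0 can be shorter than
 * RELSEG_SIZE while segment file 1 still exists on disk (left in place so
 * its queued fsync request can be honored).  A chain walk in the style of
 * _mdfd_getseg() refuses to step past the short segment; a direct open in
 * the style of _mdfd_openseg() still finds the file.  Illustrative only. */
#define RELSEG_SIZE 131072      /* blocks per segment in stock builds */
#define NO_SEGMENT  (-1)

/* seg_blocks[i] < 0 means segment file i does not exist on disk */
static int
getseg_chain(const long *seg_blocks, int nsegs, int target)
{
    /* walk segments 0..target-1; give up if one is missing or short */
    for (int i = 0; i < target; i++)
        if (seg_blocks[i] < 0 || seg_blocks[i] < RELSEG_SIZE)
            return NO_SEGMENT;
    return (target < nsegs && seg_blocks[target] >= 0) ? target : NO_SEGMENT;
}

static int
openseg_direct(const long *seg_blocks, int nsegs, int target)
{
    /* just try to open the target segment file itself */
    return (target < nsegs && seg_blocks[target] >= 0) ? target : NO_SEGMENT;
}
```

This is why falling back to the direct open only when the chain walk returns NULL keeps the common path cheap while still letting mdsync() reach the "formally non-existent" segment.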
On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-04-28 17:41:29 +0100, Thom Brown wrote: >> I've noticed another breakage, which I can reproduce consistently. > >> 2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG: could not >> fsync file "base/24581/24594.1" but retrying: No such file or directory >> 2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR: could not >> fsync file "base/24581/24594.1": No such file or directory >> 2016-04-28 17:36:08 BST [18605]: [17-1] >> user=thom,db=postgres,client=[local] ERROR: checkpoint request failed >> 2016-04-28 17:36:08 BST [18605]: [18-1] >> user=thom,db=postgres,client=[local] HINT: Consult recent messages in the >> server log for details. > > Yuck. md.c is so crummy :( > > Basically the reason for the problem is that mdsync() needs to access > "formally non-existant segments" (as in ones where previous segments are > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync > requests via register_dirty_segment() in mdtruncate(). Shouldn't we just throw those flush requests away? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-05-02 12:29:45 -0400, Robert Haas wrote: > On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund <andres@anarazel.de> wrote: > > Basically the reason for the problem is that mdsync() needs to access > > "formally non-existant segments" (as in ones where previous segments are > > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync > > requests via register_dirty_segment() in mdtruncate(). > > Shouldn't we just throw those flush requests away? Well, we explicity make them for truncations (register_dirty_segment() calls in mdtruncate()). There's no comment as to why - I suspect the idea is that you want to make sure the truncation sticks in case of crash? FWIW, falling back to _mdfd_openseg() fixes the issue. Andres
On Mon, May 2, 2016 at 12:41 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-05-02 12:29:45 -0400, Robert Haas wrote: >> On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund <andres@anarazel.de> wrote: >> > Basically the reason for the problem is that mdsync() needs to access >> > "formally non-existant segments" (as in ones where previous segments are >> > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync >> > requests via register_dirty_segment() in mdtruncate(). >> >> Shouldn't we just throw those flush requests away? > > Well, we explicity make them for truncations (register_dirty_segment() > calls in mdtruncate()). There's no comment as to why - I suspect the > idea is that you want to make sure the truncation sticks in case of > crash? I dunno, I don't understand this well enough yet. > FWIW, falling back to _mdfd_openseg() fixes the issue. Can you post a patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-05-02 12:44:53 -0400, Robert Haas wrote: > On Mon, May 2, 2016 at 12:41 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-05-02 12:29:45 -0400, Robert Haas wrote: > >> On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund <andres@anarazel.de> wrote: > >> > Basically the reason for the problem is that mdsync() needs to access > >> > "formally non-existant segments" (as in ones where previous segments are > >> > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync > >> > requests via register_dirty_segment() in mdtruncate(). > >> > >> Shouldn't we just throw those flush requests away? > > > > Well, we explicity make them for truncations (register_dirty_segment() > > calls in mdtruncate()). There's no comment as to why - I suspect the > > idea is that you want to make sure the truncation sticks in case of > > crash? > > I dunno, I don't understand this well enough yet. > > > FWIW, falling back to _mdfd_openseg() fixes the issue. > > Can you post a patch? Sure, attached. I'm not sure this is the best way to go about this. I can see valid arguments for *always* using _mdfd_openseg() in mdsync(); and I'm wondering whether we shouldn't make EXTENSION_* into a bitmask (extend,extend_recovery,return_null,open_deleted). Andres
Attachment
Hello Andres, > Sure, attached. For what it is worth, it works for me on head. I'm wondering: if _mdfd_openseg may return NULL, then ISTM that "opened_directly" should logically be false, because nothing was opened? If so, maybe consider something like: opened_directly = (seg != NULL); Now it does not change the result, because the later code seems guarded against a NULL seg, but it does not look right to have a boolean say that a segment was opened when that was not the case. Given the comments, I understand that the truncation implementation is a shortcut with a sting, as a lot of functions must then take into account that something unusual may have happened somewhere and deal with it... Also, I do not understand why this issue is raised by the flushing patch; it seems rather independent. > I'm not sure this is the best way to go about this. I can see valid > arguments for *always* using _mdfd_openseg() in mdsync(); and I'm > wondering whether we shouldn't make EXTENSION_* into a bitmask > (extend,extend_recovery,return_null,open_deleted). I thought about that when I looked at the previous fix, but it seemed that not all combinations made sense. -- Fabien.
On 2016-05-02 22:00:14 +0200, Fabien COELHO wrote: > I'm wondering that if _mdfd_openseg may return NULL, then ISTM that > "opened_directly" should logically be false, because it was not opened? > > If so, maybe consider something like: > > opened_directy = (seg != NULL); Hm, don't care either way. Seems just as valid to understand it as the attempt to directly open the segment. > Also, I do not understand why this issue is raised by the flushing patch, it > seems rather independent. It's not the flushing itself, it's 72a98a639574d2e25ed94652848555900c81a799 > >I'm not sure this is the best way to go about this. I can see valid > >arguments for *always* using _mdfd_openseg() in mdsync(); and I'm > >wondering whether we shouldn't make EXTENSION_* into a bitmask > >(extend,extend_recovery,return_null,open_deleted). > > I thought about that when I looked at the previous fix, but it seemed that > not all combinations made sense. Sure, but that's nothing unusual. Here's an attempt at doing so - not fully polished, just as a discussion point. I think it looks better. Fabien, Robert, what do you think? Greetings, Andres Freund
Attachment
Hello Andres, >>> I'm not sure this is the best way to go about this. I can see valid >>> arguments for *always* using _mdfd_openseg() in mdsync(); and I'm >>> wondering whether we shouldn't make EXTENSION_* into a bitmask >>> (extend,extend_recovery,return_null,open_deleted). >> >> I thought about that when I looked at the previous fix, but it seemed that >> not all combinations made sense. > > Sure, but that's nothing unusual. Here's an attempt at doing so - not > fully polished, just as a discussion point. I think it looks better. > Fabien, Robert, what do you think? My 0,02€. Not tested, just a few comments on the patch from someone who does not understand this API deep down... Nevertheless: I agree that it looks better than "EXTENSION_REALLY_RETURNS_NULL", which I did not like much. There are 3 possible behaviors on extension, but coding them as bits does not make their exclusivity clear. Now mixing numbers & bits does not seem advisable either. Maybe consider checking for the exclusivity explicitly?

  EXTENSION_BEHAVIORS = (EXTENSION_RETURN_NULL | ..._FAIL | ..._CREATE);

And then the Assert can check for the exclusivity:

  int behavior = option & EXTENSION_BEHAVIORS;
  Assert((behavior == EXTENSION_RETURN_NULL) ||
         (behavior == ..._FAIL) ||
         (behavior == ..._CREATE));

I'm unsure about switching the enum to #define; it could still be an enum with explicit values set, something like:

  enum {
      EXTENSION_RETURN_NULL = (1 << 0),
      ...
  } extension_behavior;

I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it RECREATE really? -- Fabien.
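[Editor's note: the exclusivity check Fabien sketches above can be made concrete. The flag names follow the thread's discussion, but the values and the helper are hypothetical, not the committed definitions. The trick is that "exactly one behavior bit set" is equivalent to "nonzero and a power of two".]

```c
#include <assert.h>

/* Hypothetical EXTENSION_* bitmask, modeled on the names in this thread.
 * The first three are mutually exclusive behaviors; CREATE_RECOVERY is
 * sketched here as a modifier that may be OR'd onto a behavior. */
#define EXTENSION_FAIL            (1 << 0)  /* error out on a missing segment */
#define EXTENSION_RETURN_NULL     (1 << 1)  /* hand back NULL instead */
#define EXTENSION_CREATE          (1 << 2)  /* create missing segments */
#define EXTENSION_CREATE_RECOVERY (1 << 3)  /* modifier: create only in recovery */

#define EXTENSION_BEHAVIORS \
    (EXTENSION_FAIL | EXTENSION_RETURN_NULL | EXTENSION_CREATE)

/* Exactly one exclusive behavior bit must be set: the masked value is
 * nonzero and clearing its lowest bit leaves zero (power-of-two test). */
static int
behavior_is_valid(int flags)
{
    int behavior = flags & EXTENSION_BEHAVIORS;

    return behavior != 0 && (behavior & (behavior - 1)) == 0;
}
```

A caller would then Assert(behavior_is_valid(flags)) on entry, catching both "no behavior requested" and "two exclusive behaviors OR'd together".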
Hi, On 2016-05-03 00:05:35 +0200, Fabien COELHO wrote: > Maybe consider checking for the exclusivity explicitely? I thought about it, and decided it's not worth it. Requiring one of those to be specified seems stringent enough. > I'm unsure about switching enum to #define, could be an enum still with > explicit values set, something like: > > enum { > EXTENSION_RETURN_NULL = (1 << 0), > ... > } extension_behavior; An enum doesn't have a benefit for a bitmask imo - you can't "legally" use it as a type for functions accepting the bitmask. > I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it > RECREATE really? No. The relevant explanation is at the top of the file:

 * On disk, a relation must consist of consecutively numbered segment
 * files in the pattern
 *	-- Zero or more full segments of exactly RELSEG_SIZE blocks each
 *	-- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
 *	-- Optionally, any number of inactive segments of size 0 blocks.
 * The full and partial segments are collectively the "active" segments.
 * Inactive segments are those that once contained data but are currently
 * not needed because of an mdtruncate() operation.  The reason for leaving
 * them present at size zero, rather than unlinking them, is that other
 * backends and/or the checkpointer might be holding open file references to
 * such segments.  If the relation expands again after mdtruncate(), such
 * that a deactivated segment becomes active again, it is important that
 * such file references still be valid --- else data might get written
 * out to an unlinked old copy of a segment file that will eventually
 * disappear.

- Andres
Hello, >> I'm unsure about switching enum to #define, could be an enum still with >> explicit values set, something like: > > An enum doesn't have a benefit for a bitmask imo - you can't "legally" > use it as a type for functions accepting the bitmask. I do not understand. I suggested to use enum to enumerate the bitmask constants, ISTM that it does not preclude to use it as a bitmask as you do, it is just a replacement of the #define? The type constraint on the enum does not disallow bitmasking values, I checked with both gcc & clang. >> I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it >> RECREATE really? > > No. The relevant explanation is at the top of the file: [...] > * -- Optionally, any number of inactive segments of size 0 blocks. > * Inactive segments are those that once contained data but are currently > * not needed because of an mdtruncate() operation. The reason for leaving > * them present at size zero, rather than unlinking them, is that other > * backends and/or the checkpointer might be holding open file references to > * such segments. If the relation expands again after mdtruncate(), such > * that a deactivated segment becomes active again, it is important that > * such file references still be valid --- else data might get written > * out to an unlinked old copy of a segment file that will eventually > * disappear. Ok. Then should it be _OPEN_INACTIVE[TED] or _OPEN_TRUNCATED rather than _OPEN_DELETED, which is contradictory? -- Fabien.
Hi, On 2016-05-03 07:24:46 +0200, Fabien COELHO wrote: > >>I'm unsure about switching enum to #define, could be an enum still with > >>explicit values set, something like: > > > >An enum doesn't have a benefit for a bitmask imo - you can't "legally" > >use it as a type for functions accepting the bitmask. > > I do not understand. I suggested to use enum to enumerate the bitmask > constants, ISTM that it does not preclude to use it as a bitmask as you do, > it is just a replacement of the #define? The type constraint on the enum > does not disallow bitmasking values, I checked with both gcc & clang. There's not really a point in using an enum if you use neither the type (which you shouldn't, because once you OR bitmask values together you get values outside the range of the enum), nor the autogenerated numbers. Anyway, seems fairly unimportant. > >>I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it > >>RECREATE really? > > > >No. The relevant explanation is at the top of the file: > > [...] > > >* -- Optionally, any number of inactive segments of size 0 blocks. > >* Inactive segments are those that once contained data but are currently > >* not needed because of an mdtruncate() operation. The reason for leaving > >* them present at size zero, rather than unlinking them, is that other > >* backends and/or the checkpointer might be holding open file references to > >* such segments. If the relation expands again after mdtruncate(), such > >* that a deactivated segment becomes active again, it is important that > >* such file references still be valid --- else data might get written > >* out to an unlinked old copy of a segment file that will eventually > >* disappear. > > Ok. > > Then should it be _OPEN_INACTIVE[TED] or _OPEN_TRUNCATED rather than > _OPEN_DELETED, which is contradictory? Well, TRUNCATED doesn't entirely work; there are, I think, some cases where this currently also applies to deleted relations. I kind of like the unintuitive-sounding name, because it's really a dangerous option (any further mdnblocks() will be wrong). Andres
Hello Andres, >>> An enum doesn't have a benefit for a bitmask imo - you can't "legally" >>> use it as a type for functions accepting the bitmask. >> >> I do not understand. I suggested to use enum to enumerate the bitmask >> constants, ISTM that it does not preclude to use it as a bitmask as you do, >> it is just a replacement of the #define? The type constraint on the enum >> does not disallow bitmasking values, I checked with both gcc & clang. > > There's not really a point in using an enum if you use neither the type > (which you shouldn't because if you or the bitmask value you have types > outside the range of the enum), nor the autogenerated numbers. I do not think that there is such a constraint in C: you can use the enum bitfield type, so you should. You can say in the type name that it is a bitmask to make it clearer: typedef enum { EXTENSION_XXX = ...; } extension_behavior_bitfield; > Anyway seems fairly unimportant. Sure, it is just cosmetic. Now the type is already an enum and you can keep it that way; ISTM that it is cleaner to avoid defines, so I think you should. >> Then should it be _OPEN_INACTIVE[TED] or _OPEN_TRUNCATED rather than >> _OPEN_DELETED, which is contradictory? > > Well, TRUNCATED doesn't entirely work, there's I think some cases where > this currently also applies to deleted relations. I kind of like the > unintuitive sounding name, because it's really a dangerous options (any > further mdnblocks will be wrong). Hmmm. My issue is with the semantics of the name, which implies how it will be understood by someone reading the code. The interface deals with files, so implicitly DELETED refers to files, and AFAICS no file was deleted. Maybe rows in the relation were deleted somehow, or entries in an index, but that makes no sense at the API level. So I think you should not use DELETED. I can see why a vaguely oxymoronic name is amusing, but I do not think that it conveys any idea of "dangerous" as you suggest.
Now the important thing is probably not that the code is the cleanest possible one, but that it fixes the issue, so you should probably commit something. -- Fabien.
On 2016-04-28 17:41:29 +0100, Thom Brown wrote: > I've noticed another breakage, which I can reproduce consistently. Thanks for the testing! I pushed a fix for this. This wasn't actually an issue in the original patch, but a too strict test added in my fix. Greetings, Andres Freund
On 2016-05-04 07:46:42 +0200, Fabien COELHO wrote: > >Well, TRUNCATED doesn't entirely work, there's I think some cases where > >this currently also applies to deleted relations. I kind of like the > >unintuitive sounding name, because it's really a dangerous options (any > >further mdnblocks will be wrong). > > Hmmm. I went with the boring DONT_CHECK_SIZE ;)
On 4 May 2016 at 09:59, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-28 17:41:29 +0100, Thom Brown wrote:
> I've noticed another breakage, which I can reproduce consistently.
Thanks for the testing! I pushed a fix for this. This wasn't actually
an issue in the original patch, but a too strict test added in my fix.
Re-testing shows that this appears to have solved the issue.
Thanks
Thom
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> There's not really a point in using an enum if you use neither the type >> (which you shouldn't because if you or the bitmask value you have types >> outside the range of the enum), nor the autogenerated numbers. > I do not think that there is such a constraint in C, you can use the enum > bitfield type, so you should. I think you are failing to understand Andres' point. If you're going to OR together some bits, the result is no longer a member of the enum type, and the advantages that an enum would have immediately turn into disadvantages. regards, tom lane
Hello Tom, >>> There's not really a point in using an enum if you use neither the type >>> (which you shouldn't because if you or the bitmask value you have types >>> outside the range of the enum), nor the autogenerated numbers. > >> I do not think that there is such a constraint in C, you can use the enum >> bitfield type, so you should. > > I think you are failing to understand Andres' point. If you're going > to OR together some bits, the result is no longer a member of the enum > type, and the advantages that an enum would have immediately turn into > disadvantages. I understood the point and I do not see real disadvantages. The C standard really says that an enum is an int, and compilers just do that. I see it as a matter of interpretation whether enum members are strictly allowed values or just allowed bits, but maybe the standard says otherwise. Anyway, the good news is that the bug is now fixed. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: > I understood the point and I do not see real disadvantages. The C standard > really says that an enum is an int, and compilers just do that. No, it doesn't say that, and compilers don't just do that. A compiler is specifically allowed to store an enum in char or short if the enum's declared values would all fit into that width. (Admittedly, if you're choosing the values as powers of 2, an OR of them would still fit; but if you think "oh, an enum is just an int", you will get burned.) More to the point, once you allow OR'd values then none of the practical benefits of an enum still hold good. The typical things that I rely on in an enum that you don't get from a collection of #define's are: * compiler warnings if you forget some members of the enum in a switch * debugger ability to print variables symbolically Those benefits both go up in smoke as soon as you allow OR'd values. At that point you might as well use the #defines rather than playing language lawyer about whether what you're doing meets the letter of the spec. I note that C99 specifically mentions this as something a compiler might warn about: -- A value is given to an object of an enumeration type other than by assignment of an enumeration constant that is a member of that type, or an enumeration variable that has the same type, or the value of a function that returns the same enumeration type (6.7.2.2). which certainly looks like they don't consider "enumvar = ENUMVAL1 | ENUMVAL2" to be strictly kosher. regards, tom lane
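[Editor's note: Tom's point about OR'd values falling outside the enum can be seen in a toy example, illustrative only and unrelated to md.c. F_A | F_B evaluates to 3, which is not any declared enumerator, so both the switch-exhaustiveness warning and symbolic printing stop being meaningful for it.]

```c
#include <assert.h>
#include <string.h>

enum flags { F_A = 1 << 0, F_B = 1 << 1, F_C = 1 << 2 };

/* A switch covering every declared enumerator lets the compiler warn on a
 * missing case -- but an OR'd combination is not an enumerator, so it can
 * only ever land in the default arm. */
static const char *
flag_name(int f)
{
    switch (f)
    {
        case F_A: return "F_A";
        case F_B: return "F_B";
        case F_C: return "F_C";
        default:  return "not an enumerator";
    }
}
```

A debugger is in the same position as flag_name(): it can map 1, 2, or 4 back to a symbol, but for 3 it can only show a number.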
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > * debugger ability to print variables symbolically I might be misunderstanding what you're getting at here, but if you want to be able to use #define'd values using their name, you can get that by compiling with -g3. With -g3 and gdb, you can do things like: (gdb) p tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT | DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_USERMAP | DUMP_COMPONENT_ACL) where all the DUMP_COMPONENT items are #define's. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> * debugger ability to print variables symbolically > I might be misunderstanding what you're getting at here, but if you want > to be able to use #define'd values using their name, you can get that by > compiling with -g3. With -g3 and gdb, you can do things like: > (gdb) p tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT | > DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_USERMAP | DUMP_COMPONENT_ACL) > where all the DUMP_COMPONENT items are #define's. Yes, but if you just do "p tblinfo[i].dobj.dump", you're only going to get a number, right? The value-added for an enum type is that the debugger knows which symbol to substitute for a numeric value when printing. But that stops working if what's in the variable is some OR of the values the debugger knows about. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> * debugger ability to print variables symbolically > > > I might be misunderstanding what you're getting at here, but if you want > > to be able to use #define'd values using their name, you can get that by > > compiling with -g3. With -g3 and gdb, you can do things like: > > > (gdb) p tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT | > > DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_USERMAP | DUMP_COMPONENT_ACL) > > > where all the DUMP_COMPONENT items are #define's. > > Yes, but if you just do "p tblinfo[i].dobj.dump", you're only going to get > a number, right? The value-added for an enum type is that the debugger > knows which symbol to substitute for a numeric value when printing. But > that stops working if what's in the variable is some OR of the values the > debugger knows about. Ah, yeah, that's true. -g3, unsurprisingly, doesn't help with displaying a random int value using some set of #define's. Thanks! Stephen
Hello Tom, >> I understood the point and I do not see real disadvantages. The C standard >> really says that an enum is an int, and compilers just do that. > > No, it doesn't say that, and compilers don't just do that. > A compiler is specifically allowed to store an enum in char or short if > the enum's declared values would all fit into that width. Hmm. That is a bit of a subtle one: the spec says that enum *constants* are ints: "The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted." But indeed, as you point out, per spec the storage could be made smaller. However, I have yet to meet a compiler which does not use ints:

  typedef enum { false, true } boolean;
  assert(sizeof(boolean) == sizeof(int)); // ok with gcc & clang

I can guess why: such a compiler would not be able to work with libraries compiled with gcc or clang, which would be a pretty annoying feature. Now it does not mean that such a compiler does not exist in some realm (8/16 bit architectures maybe? but then ints would be 16 bits on these...). > (Admittedly, if you're choosing the values as powers of 2, an OR of them > would still fit; Yep. > but if you think "oh, an enum is just an int", you will get burned.) In theory, yes. In practice, the compiler writer would have gotten burned before me :-). > * compiler warnings if you forget some members of the enum in a switch Sure. Using a switch on a bitfield is an uncommon pattern, though. > * debugger ability to print variables symbolically Yep. Names are lost with defines, which is my primary reason for trying to avoid them, even at the price of betting against the letter of the spec :-) > At that point you might as well use the #defines rather than playing > language lawyer about whether what you're doing meets the letter of > the spec. Hmmm... the compilers are the real judges, the spec is just the law :-) > I note that C99 specifically mentions this as something a compiler might > warn about: [...]
Indeed. Neither gcc nor clang emit such warnings... but they might some day, which would be a blow for my suggestion! Another option would have been explicit bitfields, something like:

  typedef struct {
      int create : 1,
          return_null : 1,
          fail : 1,
          create_in_recovery : 1;
      // ...
  } extension_bitfield_t;

  void foo(extension_bitfield_t behavior)
  {
      if (behavior.create)             printf("create...\n");
      if (behavior.return_null)        printf("null...\n");
      if (behavior.fail)               printf("fail...\n");
      if (behavior.create_in_recovery) printf("recovery...\n");
  }

  void bla(void)
  {
      foo((extension_bitfield_t) { .fail = 1, .create_in_recovery = 1 });
  }

With gdb it is quite readable:

  (gdb) p behavior
  $1 = {create = 0, return_null = 0, fail = -1, create_in_recovery = -1}

Anyway, thanks for the discussion! -- Fabien.
On 2016-05-05 09:32, Fabien COELHO wrote: >> I note that C99 specifically mentions this as something a compiler >> might warn about: [...] > > Indeed. Neither gcc nor clang emit such warnings... but they might some > day, which would be a blow for my suggestion! For what it's worth, newer versions of clang can emit useful warnings for cases like this:

  int main(void)
  {
      enum test { FOUR = 4 };
      enum incompatible { INCOMPATIBLE_FOUR = 4 };
      enum test variable;
      variable = INCOMPATIBLE_FOUR;
      variable = 5;
      variable = 4;
      variable = 3;
      return 0;
  }

  enum.c:5:13: warning: implicit conversion from enumeration type 'enum incompatible' to different enumeration type 'enum test' [-Wenum-conversion]
          variable = INCOMPATIBLE_FOUR;
                   ~ ^~~~~~~~~~~~~~~~~
  enum.c:6:13: warning: integer constant not in range of enumerated type 'enum test' [-Wassign-enum]
          variable = 5;
                     ^
  enum.c:8:13: warning: integer constant not in range of enumerated type 'enum test' [-Wassign-enum]
          variable = 3;
                     ^
  3 warnings generated.

So with -Wenum-conversion -Wassign-enum you could treat enum types as distinct and incompatible with each other.