Thread: Postmaster holding unlinked files for pg_largeobject table
Hello Hackers, There is some strange behavior we're experiencing with one of the customer's DBs (8.4) We've noticed that free disk space went down heavily on a system, and after a short analysis determined that the reason wasthat postmaster was holding lots of unlinked files open. A sample of lsof output was something like this: postmaste 15484 postgres 57u REG 253,0 1073741824 41125093 /srv/pgsql/data/base/352483309/2613.2 (deleted) postmaste 15484 postgres 58u REG 253,0 1073741824 41125094 /srv/pgsql/data/base/352483309/2613.3 (deleted) postmaste 15484 postgres 59u REG 253,0 1073741824 41125095 /srv/pgsql/data/base/352483309/2613.4 (deleted) There were about 450 such (or similar) files, all of them having /2613 in the filename. Since 2613 is a regclass of pg_largeobjectand we are indeed working with quite a few large objects in that DB so this is where our problem lies we suspect. Restarting PostgreSQL obviously helps the issue and the disk space occupied by those unlinked files (about 63GB actually)is reclaimed. So what happens on that host is that we drop/restore a fresh version of the DB from the production host, followed by a migrationscript which among other things loads around 16GB of data files as large objects. This happens nightly. But if we go and run the whole drop/restore and migration manually, the problem doesn't manifest itself right after migrationis successfully finished. Our next thought was that it must be dropdb part of the nightly script that removes the pg_largeobject's files (still wedon't know what makes it keep them opened,) but dropping the DB doesn't manifest the problem either. I'm currently running a VACUUM pg_largeobject on the problematic DB, to see if it triggers the problem, but this didn't happenso far. At this point it would be nice to hear what are your thoughts. What could cause such behavior? -- Regards, Alex
Excerpts from Alexander Shulgin's message of vie jun 03 17:45:28 -0400 2011: > There were about 450 such (or similar) files, all of them having /2613 in the filename. Since 2613 is a regclass of pg_largeobjectand we are indeed working with quite a few large objects in that DB so this is where our problem lies we suspect. > > Restarting PostgreSQL obviously helps the issue and the disk space occupied by those unlinked files (about 63GB actually)is reclaimed. > > So what happens on that host is that we drop/restore a fresh version of the DB from the production host, followed by amigration script which among other things loads around 16GB of data files as large objects. This happens nightly. What surprises me is that the open references remain after a database drop. Surely this means that no backends keep open file descriptors to any table in that database, because there are no connections. I also requested Alexander to run a checkpoint and see if that made the FDs go away (on the theory that bgwriter could be the culprit) -- no dice. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > What surprises me is that the open references remain after a database > drop. Surely this means that no backends keep open file descriptors to > any table in that database, because there are no connections. bgwriter ... regards, tom lane
Excerpts from Tom Lane's message of sáb jun 04 12:49:05 -0400 2011: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > What surprises me is that the open references remain after a database > > drop. Surely this means that no backends keep open file descriptors to > > any table in that database, because there are no connections. > > bgwriter ... Actually you were both wrong, hah. It's not bgwriter, and this doesn't belong on pgsql-general. It's a backend. However, as we mentioned initially, the database to which this file belongs is dropped. What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. Actually, there are 11 processes holding open file descriptors to the table, each to a slightly different subset of the many segments of the table. (There's also one holding a FD to the deleted pg_largeobject_loid_pn_index -- remember, this is a dropped database). All those backends belong to Zabbix, the monitoring platform, which are connected to a different database. I think what we have here is a new bug. (This is running 8.4.8, by the way.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote: > What we found out after more careful investigation is that the > file is kept open by a backend connected to a different database. > I have a suspicion that what happened here is that this backend > was forced to flush out a page from shared buffers to read some > other page; and it was forced to do a fsync of this file. And > then it forgets to close the file descriptor. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us I mention it only because there might be one place to fix both.... -Kevin
Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011: > Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > What we found out after more careful investigation is that the > > file is kept open by a backend connected to a different database. > > I have a suspicion that what happened here is that this backend > > was forced to flush out a page from shared buffers to read some > > other page; and it was forced to do a fsync of this file. And > > then it forgets to close the file descriptor. > > This sounds vaguely similar to what I found with WAL files being > held open for days after they were deleted by read-only backends: > > http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us > > I mention it only because there might be one place to fix both.... Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011: >> Alvaro Herrera <alvherre@commandprompt.com> wrote: >>> What we found out after more careful investigation is that the >>> file is kept open by a backend connected to a different database. >>> I have a suspicion that what happened here is that this backend >>> was forced to flush out a page from shared buffers to read some >>> other page; and it was forced to do a fsync of this file. And >>> then it forgets to close the file descriptor. It doesn't "forget" to close the descriptor; it intentionally keeps it for possible further use. >> This sounds vaguely similar to what I found with WAL files being >> held open for days after they were deleted by read-only backends: >> http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us >> I mention it only because there might be one place to fix both.... > Hmm interesting. I don't think the placement suggested by Tom would be > useful, because the Zabbix backends are particularly busy all the time, > so they wouldn't run ProcessCatchupEvent at all. Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... regards, tom lane
Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Hmm interesting. I don't think the placement suggested by Tom would be > > useful, because the Zabbix backends are particularly busy all the time, > > so they wouldn't run ProcessCatchupEvent at all. > > Yeah, I wasn't that thrilled with the suggestion either. But we can't > just have backends constantly closing every open FD they hold, or > performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote: > That doesn't solve the WAL problem Kevin found, of course ... I wouldn't worry about that too much -- the WAL issue is self-limiting and not likely to ever cause a failure. The biggest risk is that every now and then some new individual will notice it and waste a bit of time investigating. The LO issue, on the other hand, could easily eat enough disk to cause a failure. -Kevin
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: >> Yeah, I wasn't that thrilled with the suggestion either. But we can't >> just have backends constantly closing every open FD they hold, or >> performance will suffer. I don't see any very good place to do this... > How about doing something on an sinval message for pg_database? > That doesn't solve the WAL problem Kevin found, of course ... Hmm ... that would help for the specific scenario of dropped databases, but we've also heard complaints about narrower cases such as a single dropped table. A bigger issue is that I don't believe it's very practical to scan the FD array looking for files associated with a particular database (or table). They aren't labeled that way, and parsing the file path to find out the info seems pretty grotty. On reflection I think this behavior is probably limited to the case where we've done what we used to call a "blind write" of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) regards, tom lane
On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: >>> Yeah, I wasn't that thrilled with the suggestion either. But we can't >>> just have backends constantly closing every open FD they hold, or >>> performance will suffer. I don't see any very good place to do this... > >> How about doing something on an sinval message for pg_database? >> That doesn't solve the WAL problem Kevin found, of course ... > > Hmm ... that would help for the specific scenario of dropped databases, > but we've also heard complaints about narrower cases such as a single > dropped table. > > A bigger issue is that I don't believe it's very practical to scan the > FD array looking for files associated with a particular database (or > table). They aren't labeled that way, and parsing the file path to > find out the info seems pretty grotty. > > On reflection I think this behavior is probably limited to the case > where we've done what we used to call a "blind write" of a block that > is unrelated to our database or tables. For normal SQL-driven accesses, > there's a relcache entry, and flushing of that entry will lead to > closure of associated files. I wonder whether we should go back to > forcibly closing the FD after a blind write. This would suck if a > backend had to do many dirty-buffer flushes for the same relation, > but hopefully the bgwriter is doing most of those. We'd want to make > sure such forced closure *doesn't* occur in the bgwriter. (If memory > serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> On reflection I think this behavior is probably limited to the case >> where we've done what we used to call a "blind write" of a block that >> is unrelated to our database or tables. �For normal SQL-driven accesses, >> there's a relcache entry, and flushing of that entry will lead to >> closure of associated files. �I wonder whether we should go back to >> forcibly closing the FD after a blind write. �This would suck if a >> backend had to do many dirty-buffer flushes for the same relation, >> but hopefully the bgwriter is doing most of those. �We'd want to make >> sure such forced closure *doesn't* occur in the bgwriter. �(If memory >> serves, it has a checkpoint-driven closure mechanism instead.) > Instead of closing them immediately, how about flagging the FD and > closing all the flagged FDs at the end of each query, or something > like that? Hmm, there's already a mechanism for closing "temp" FDs at the end of a query ... maybe blind writes could use temp-like FDs? regards, tom lane
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> On reflection I think this behavior is probably limited to the case > >> where we've done what we used to call a "blind write" of a block that > >> is unrelated to our database or tables. For normal SQL-driven accesses, > >> there's a relcache entry, and flushing of that entry will lead to > >> closure of associated files. I wonder whether we should go back to > >> forcibly closing the FD after a blind write. This would suck if a > >> backend had to do many dirty-buffer flushes for the same relation, > >> but hopefully the bgwriter is doing most of those. We'd want to make > >> sure such forced closure *doesn't* occur in the bgwriter. (If memory > >> serves, it has a checkpoint-driven closure mechanism instead.) > > > Instead of closing them immediately, how about flagging the FD and > > closing all the flagged FDs at the end of each query, or something > > like that? > > Hmm, there's already a mechanism for closing "temp" FDs at the end of a > query ... maybe blind writes could use temp-like FDs? OK, I'll have a look at how blind writes work this afternoon and propose something. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011: > Robert Haas <robertmhaas@gmail.com> writes: > > Instead of closing them immediately, how about flagging the FD and > > closing all the flagged FDs at the end of each query, or something > > like that? > > Hmm, there's already a mechanism for closing "temp" FDs at the end of a > query ... maybe blind writes could use temp-like FDs? I don't think it can be made to work exactly like that. If I understand correctly, the code involved here is the FlushBuffer() call that happens during BufferAlloc(), and what we have at that point is a SMgrRelation; we're several levels removed from actually being able to set the FD_XACT_TEMPORARY flag which is what I think you're thinking of. What I think would make some sense is to keep a list of SMgrRelations that we opened during FlushBuffer that are foreign to the current database, in the current ResourceOwner. That way, when the resowner is destroyed, we would be able to smgrclose() them. This would only be done when called by a backend, not bgwriter. (I'm not really sure about requiring the buffer to belong to a relation in another database, given the report that this is also a problem with dropped tables. However, it certainly makes no sense to try to remember *all* buffers.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011: >> Hmm, there's already a mechanism for closing "temp" FDs at the end of a >> query ... maybe blind writes could use temp-like FDs? > I don't think it can be made to work exactly like that. If I understand > correctly, the code involved here is the FlushBuffer() call that happens > during BufferAlloc(), and what we have at that point is a SMgrRelation; > we're several levels removed from actually being able to set the > FD_XACT_TEMPORARY flag which is what I think you're thinking of. It's not *that* many levels: in fact, I think md.c is the only level that would just have to pass it through without doing anything useful. I think that working from there is a saner and more efficient approach than what you're sketching. If you want a concrete design sketch, consider this: 1. Add a flag to the SMgrRelation struct that has the semantics of "all files opened through this SMgrRelation should be marked as transient, causing them to be automatically closed at end of xact". 2. *Any* normal smgropen() call would reset this flag (since it suggests that we are accessing the relation because of SQL activity). In the single case where FlushBuffer() is called with reln == NULL, it would set the flag after doing its local smgropen(). 3. Then, modify md.c to pass the flag down to fd.c whenever opening an FD file. fd.c sets a bit in the resulting VFD. 4. Extend CleanupTempFiles to close the kernel FD (but not release the VFD) when a VFD has the bit set. I'm fairly sure that CleanupTempFiles is never called in the bgwriter, so we don't even need any special hack to prevent the flag from becoming set in the bgwriter. regards, tom lane
Excerpts from Tom Lane's message of mar jun 07 12:26:34 -0400 2011: > It's not *that* many levels: in fact, I think md.c is the only level > that would just have to pass it through without doing anything useful. > I think that working from there is a saner and more efficient approach > than what you're sketching. > > If you want a concrete design sketch, consider this: Okay, here's a patch implementing this idea. It seems to work quite well, and it solves the problem in a limited testing scenario -- I haven't yet tested on the customer machines. This customer is running on 8.4 so I started from there; should I backpatch this to 8.2, or not at all? (I have all the branches ready anyway.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > Okay, here's a patch implementing this idea. It seems to work quite > well, and it solves the problem in a limited testing scenario -- I > haven't yet tested on the customer machines. This seems mostly sane, except I think you have not considered the issue of when to clear the smgr_transient flag on an existing SMgrRelation: if it starts getting used for "normal" accesses after having by chance been used for a blind write, we don't want the transient marking to persist. That's why I suggested having smgropen always clear it. Likewise, I think the FD_XACT_TRANSIENT flag on a VFD needs to go away at some point, probably once it's actually been closed at EOXACT, though there's doubtless more than one way to handle that. > This customer is running on 8.4 so I started from there; should I > backpatch this to 8.2, or not at all? I'm not excited about back-patching it... regards, tom lane
Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Okay, here's a patch implementing this idea. It seems to work quite > > well, and it solves the problem in a limited testing scenario -- I > > haven't yet tested on the customer machines. > > This seems mostly sane, except I think you have not considered the > issue of when to clear the smgr_transient flag on an existing > SMgrRelation: if it starts getting used for "normal" accesses after > having by chance been used for a blind write, we don't want the > transient marking to persist. That's why I suggested having smgropen > always clear it. > > Likewise, I think the FD_XACT_TRANSIENT flag on a VFD needs to go away > at some point, probably once it's actually been closed at EOXACT, though > there's doubtless more than one way to handle that. Aha, I see -- makes sense. Here's an updated patch. > > This customer is running on 8.4 so I started from there; should I > > backpatch this to 8.2, or not at all? > > I'm not excited about back-patching it... Bummer. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011: >> Alvaro Herrera <alvherre@commandprompt.com> writes: >>> This customer is running on 8.4 so I started from there; should I >>> backpatch this to 8.2, or not at all? >> I'm not excited about back-patching it... > Bummer. Well, of course mine is only one opinion; anybody else feel this *is* worth risking a back-patch for? My thought is that it needs some beta testing. Perhaps it'd be sane to push it into beta2 now, and then back-patch sometime after 9.1 final, if no problems pop up. regards, tom lane
On Thu, Jun 9, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011: >>> Alvaro Herrera <alvherre@commandprompt.com> writes: >>>> This customer is running on 8.4 so I started from there; should I >>>> backpatch this to 8.2, or not at all? > >>> I'm not excited about back-patching it... > >> Bummer. > > Well, of course mine is only one opinion; anybody else feel this *is* > worth risking a back-patch for? > > My thought is that it needs some beta testing. Perhaps it'd be sane to > push it into beta2 now, and then back-patch sometime after 9.1 final, > if no problems pop up. I think it'd be sensible to back-patch it. I'm not sure whether now or later. It's a bug fix that is biting real users in the field, so it seems like we ought to do something about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 9, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My thought is that it needs some beta testing. �Perhaps it'd be sane to >> push it into beta2 now, and then back-patch sometime after 9.1 final, >> if no problems pop up. > I think it'd be sensible to back-patch it. I'm not sure whether now > or later. It's a bug fix that is biting real users in the field, so > it seems like we ought to do something about it. I don't deny it's a bug fix; I'm just dubious about the risk-reward ratio. As to risk: the patch isn't trivial (notice Alvaro didn't get it right the first time). As to reward: it's been like that since forever, so if the problem were really serious, we'd have identified it before. Letting it age a bit during beta would do a lot to damp down the risk side of the equation. regards, tom lane
On Thu, Jun 9, 2011 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 9, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> My thought is that it needs some beta testing. Perhaps it'd be sane to >>> push it into beta2 now, and then back-patch sometime after 9.1 final, >>> if no problems pop up. > >> I think it'd be sensible to back-patch it. I'm not sure whether now >> or later. It's a bug fix that is biting real users in the field, so >> it seems like we ought to do something about it. > > I don't deny it's a bug fix; I'm just dubious about the risk-reward > ratio. As to risk: the patch isn't trivial (notice Alvaro didn't get it > right the first time). As to reward: it's been like that since forever, > so if the problem were really serious, we'd have identified it before. > > Letting it age a bit during beta would do a lot to damp down the risk > side of the equation. OK by me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Tom Lane's message of jue jun 09 14:45:31 -0400 2011: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011: > >> Alvaro Herrera <alvherre@commandprompt.com> writes: > >>> This customer is running on 8.4 so I started from there; should I > >>> backpatch this to 8.2, or not at all? > > >> I'm not excited about back-patching it... > > > Bummer. > > Well, of course mine is only one opinion; anybody else feel this *is* > worth risking a back-patch for? > > My thought is that it needs some beta testing. Perhaps it'd be sane to > push it into beta2 now, and then back-patch sometime after 9.1 final, > if no problems pop up. FWIW I was about to push it but noticed that regression tests fail with this: TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "/pgsql/source/HEAD/src/backend/access/index/indexam.c",Line: 283) I just make distclean'd -- still there. I'm gonna revert my patch and retry. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Alvaro Herrera's message of jue jun 09 16:02:14 -0400 2011: > Excerpts from Tom Lane's message of jue jun 09 14:45:31 -0400 2011: > > My thought is that it needs some beta testing. Perhaps it'd be sane to > > push it into beta2 now, and then back-patch sometime after 9.1 final, > > if no problems pop up. > FWIW I was about to push it but noticed that regression tests fail with > this: > > TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "/pgsql/source/HEAD/src/backend/access/index/indexam.c",Line: 283) > > I just make distclean'd -- still there. I'm gonna revert my patch and > retry. That was pretty weird. I had rm'd the build tree and rebuilt, failure still there. I then did a git reset FETCH_HEAD, recompiled, and the problem was gone. git reset to my revision, failed. Then git clean -dfx, nuked the build tree, redid the whole thing from scratch, no problem. I guess there was a mismatch on the files that we build in the source tree. I have pushed it now. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > That was pretty weird. I had rm'd the build tree and rebuilt, failure > still there. I then did a git reset FETCH_HEAD, recompiled, and the > problem was gone. git reset to my revision, failed. Then git clean > -dfx, nuked the build tree, redid the whole thing from scratch, no > problem. I guess there was a mismatch on the files that we build in the > source tree. git bug? I'd expect exactly that failure if you had the test changes but not the source fixes from commit dccfb72892acabd25568539ec882cc44c57c25bd. regards, tom lane
Excerpts from Alvaro Herrera's message of jue jun 09 16:34:13 -0400 2011: > I have pushed it now. ... and it caused a failure on the buildfarm, so I panicked and reverted it. I think the patch below fixes it. Let me know if you think I should push the whole thing again. *** a/src/backend/storage/file/fd.c --- b/src/backend/storage/file/fd.c *************** *** 1045,1070 **** FileSetTransient(File file) } /* - * Close a file at the kernel level, but keep the VFD open - */ - static void - FileKernelClose(File file) - { - Vfd *vfdP; - - Assert(FileIsValid(file)); - - vfdP = &VfdCache[file]; - - if (!FileIsNotOpen(file)) - { - if (close(vfdP->fd)) - elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName); - vfdP->fd = VFD_CLOSED; - } - } - - /* * close a file when done with it */ void --- 1045,1050 ---- *************** *** 1892,1903 **** CleanupTempFiles(bool isProcExit) else if (fdstate & FD_XACT_TRANSIENT) { /* ! * Close the kernel file descriptor, but also remove the ! * flag from the VFD. This is to ensure that if the VFD is ! * reused in the future for non-transient access, we don't ! * close it inappropriately then. */ ! FileKernelClose(i); VfdCache[i].fdstate &= ~FD_XACT_TRANSIENT; } } --- 1872,1884 ---- else if (fdstate & FD_XACT_TRANSIENT) { /* ! * Close the FD, and remove the entry from the LRU ring, ! * but also remove the flag from the VFD. This is to ! * ensure that if the VFD is reused in the future for ! * non-transient access, we don't close it inappropriately ! * then. */ ! LruDelete(i); VfdCache[i].fdstate &= ~FD_XACT_TRANSIENT; } } -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Alvaro Herrera's message of jue jun 09 16:34:13 -0400 2011: >> I have pushed it now. > ... and it caused a failure on the buildfarm, so I panicked and reverted > it. I think the patch below fixes it. Actually, I think what you want is what closeAllVfds does, which suggests that you need a FileIsNotOpen test too. > Let me know if you think I should push the whole thing again. Yesterday I would have said okay, but now we're too close to the beta2 wrap deadline. Please hold off until beta2 is out. regards, tom lane
Excerpts from Tom Lane's message of jue jun 09 17:10:10 -0400 2011: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Excerpts from Alvaro Herrera's message of jue jun 09 16:34:13 -0400 2011: > >> I have pushed it now. > > > ... and it caused a failure on the buildfarm, so I panicked and reverted > > it. I think the patch below fixes it. > > Actually, I think what you want is what closeAllVfds does, which > suggests that you need a FileIsNotOpen test too. Yeah, I noticed that my proposed change doesn't solve the problem. > > Let me know if you think I should push the whole thing again. > > Yesterday I would have said okay, but now we're too close to the beta2 > wrap deadline. Please hold off until beta2 is out. Will do. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support