Thread: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16663 Logged by: Denis Patron Email address: denis.patron@previnet.it PostgreSQL version: 11.9 Operating system: CentOS 7 Description: I have an index, which at the file system level, is made up of multiple segments (file: <id>.1, <id>.2 ecc). When I DROP INDEX, the index is dropped in Postgresql but at the file system level, the segments are marked as "deleted". if I check with the lsof command, I see that the segments are in use from an idle connection. This does not happen if the index is formed by only one segment (in my case <1Gb). How can I prevent this? thanks
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Kyotaro Horiguchi
Date:
This is not a bug. At Fri, 09 Oct 2020 13:24:15 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in > The following bug has been logged on the website: > > Bug reference: 16663 > Logged by: Denis Patron > Email address: denis.patron@previnet.it > PostgreSQL version: 11.9 > Operating system: CentOS 7 > Description: > > I have an index, which at the file system level, is made up of multiple > segments (file: <id>.1, <id>.2 ecc). When I DROP INDEX, the index is dropped > in Postgresql but at the file system level, the segments are marked as > "deleted". if I check with the lsof command, I see that the segments are in > use from an idle connection. This does not happen if the index is formed by > only one segment (in my case <1Gb). How can I prevent this? > thanks That references to deleted files will dissapear at the beginning of the next transaction. At the time a relation including an index is dropped, the first segment file (named as "<id>" without a suffix number) is left behind so the file is not shown as "(deleted)" in lsof output. The next checkpoint removes the first segment. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Andres Freund
Date:
Hi, On 2020-10-14 12:05:10 +0900, Kyotaro Horiguchi wrote: > This is not a bug. > > At Fri, 09 Oct 2020 13:24:15 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in > > The following bug has been logged on the website: > > > > Bug reference: 16663 > > Logged by: Denis Patron > > Email address: denis.patron@previnet.it > > PostgreSQL version: 11.9 > > Operating system: CentOS 7 > > Description: > > > > I have an index, which at the file system level, is made up of multiple > > segments (file: <id>.1, <id>.2 ecc). When I DROP INDEX, the index is dropped > > in Postgresql but at the file system level, the segments are marked as > > "deleted". if I check with the lsof command, I see that the segments are in > > use from an idle connection. This does not happen if the index is formed by > > only one segment (in my case <1Gb). How can I prevent this? > > thanks > > That references to deleted files will dissapear at the beginning of > the next transaction. > > At the time a relation including an index is dropped, the first > segment file (named as "<id>" without a suffix number) is left behind > so the file is not shown as "(deleted)" in lsof output. I think we should consider either occasionally sending a sinval catchup interrupt to backends that have been idle for a while, or to use a timer that we use to limit the maximum time until we process sinvals. Just having to wait till all backends become busy and process sinval events doesn't really seem like good approach to me. Regards, Andres
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
"denis.patron"
Date:
Andres Freund wrote > Hi, > > On 2020-10-14 12:05:10 +0900, Kyotaro Horiguchi wrote: >> This is not a bug. >> >> At Fri, 09 Oct 2020 13:24:15 +0000, PG Bug reporting form < > noreply@ > > wrote in >> > The following bug has been logged on the website: >> > >> > Bug reference: 16663 >> > Logged by: Denis Patron >> > Email address: > denis.patron@ >> > PostgreSQL version: 11.9 >> > Operating system: CentOS 7 >> > Description: >> > >> > I have an index, which at the file system level, is made up of multiple >> > segments (file: > <id> > .1, > <id> > .2 ecc). When I DROP INDEX, the index is dropped >> > in Postgresql but at the file system level, the segments are marked as >> > "deleted". if I check with the lsof command, I see that the segments >> are in >> > use from an idle connection. This does not happen if the index is >> formed by >> > only one segment (in my case <1Gb). How can I prevent this? >> > thanks >> >> That references to deleted files will dissapear at the beginning of >> the next transaction. >> >> At the time a relation including an index is dropped, the first >> segment file (named as " > <id> > " without a suffix number) is left behind >> so the file is not shown as "(deleted)" in lsof output. > > I think we should consider either occasionally sending a sinval catchup > interrupt to backends that have been idle for a while, or to use a timer > that we use to limit the maximum time until we process sinvals. Just > having to wait till all backends become busy and process sinval events > doesn't really seem like good approach to me. > > Regards, > > Andres thanks for replying. the problem is that I have a very large database, with indexes of up to 70 Gb. while I redo the indexes in concurrently mode, if an idle transaction is using the index in question, the segment file (<id> _1 <id> _2 etc) of the index remains in the filesystem (marked as deleted) as long as the idle connection that it is blocking it does not make another transaction. this means that I can have hundreds of GB of space occupied by files marked "deleted", and this for hours. the risk is to run out of free space -- Sent from: https://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Thomas Munro
Date:
On Wed, Oct 14, 2020 at 5:35 PM Andres Freund <andres@anarazel.de> wrote: > On 2020-10-14 12:05:10 +0900, Kyotaro Horiguchi wrote: > > At the time a relation including an index is dropped, the first > > segment file (named as "<id>" without a suffix number) is left behind > > so the file is not shown as "(deleted)" in lsof output. > > I think we should consider either occasionally sending a sinval catchup > interrupt to backends that have been idle for a while, or to use a timer > that we use to limit the maximum time until we process sinvals. Just > having to wait till all backends become busy and process sinval events > doesn't really seem like good approach to me. Oops, I also replied to this but now I see that I accidentally replied only to Horiguchi-san and not the list! I was thinking that we should perhaps consider truncating the files to give back the disk space (as we do for the first segment), so that it doesn't matter so much how long other backends take to process SHAREDINVALSMGR_ID, close their descriptors and release the inode.
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Oct 14, 2020 at 5:35 PM Andres Freund <andres@anarazel.de> wrote: >> I think we should consider either occasionally sending a sinval catchup >> interrupt to backends that have been idle for a while, or to use a timer >> that we use to limit the maximum time until we process sinvals. Just >> having to wait till all backends become busy and process sinval events >> doesn't really seem like good approach to me. > Oops, I also replied to this but now I see that I accidentally replied > only to Horiguchi-san and not the list! I was thinking that we should > perhaps consider truncating the files to give back the disk space (as > we do for the first segment), so that it doesn't matter so much how > long other backends take to process SHAREDINVALSMGR_ID, close their > descriptors and release the inode. +1, I was also thinking that. It'd be pretty easy to fit into the existing system structure (I think, without having looked at the relevant code lately), and it would not add any overhead to normal processing. Installing a timeout to handle this per Andres' idea inevitably *would* add overhead. regards, tom lane
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Thomas Munro
Date:
On Thu, Oct 15, 2020 at 8:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > On Wed, Oct 14, 2020 at 5:35 PM Andres Freund <andres@anarazel.de> wrote: > >> I think we should consider either occasionally sending a sinval catchup > >> interrupt to backends that have been idle for a while, or to use a timer > >> that we use to limit the maximum time until we process sinvals. Just > >> having to wait till all backends become busy and process sinval events > >> doesn't really seem like good approach to me. > > > Oops, I also replied to this but now I see that I accidentally replied > > only to Horiguchi-san and not the list! I was thinking that we should > > perhaps consider truncating the files to give back the disk space (as > > we do for the first segment), so that it doesn't matter so much how > > long other backends take to process SHAREDINVALSMGR_ID, close their > > descriptors and release the inode. > > +1, I was also thinking that. It'd be pretty easy to fit into the > existing system structure (I think, without having looked at the relevant > code lately), and it would not add any overhead to normal processing. > Installing a timeout to handle this per Andres' idea inevitably *would* > add overhead. Alright, here is a first swing at making our behaviour more consistent in two ways: 1. The first segment should be truncated even in recovery. 2. Later segments should be truncated on commit. I don't know why the existing coding decides not to try to unlink the later segments if the truncate of segment 0 failed. We already committed, we should plough on.
Attachment
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Kyotaro Horiguchi
Date:
Ouch. You beat me to it. At Thu, 15 Oct 2020 14:26:36 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in > On Thu, Oct 15, 2020 at 8:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > > On Wed, Oct 14, 2020 at 5:35 PM Andres Freund <andres@anarazel.de> wrote: > > >> I think we should consider either occasionally sending a sinval catchup > > >> interrupt to backends that have been idle for a while, or to use a timer > > >> that we use to limit the maximum time until we process sinvals. Just > > >> having to wait till all backends become busy and process sinval events > > >> doesn't really seem like good approach to me. > > > > > Oops, I also replied to this but now I see that I accidentally replied > > > only to Horiguchi-san and not the list! I was thinking that we should > > > perhaps consider truncating the files to give back the disk space (as > > > we do for the first segment), so that it doesn't matter so much how > > > long other backends take to process SHAREDINVALSMGR_ID, close their > > > descriptors and release the inode. > > > > +1, I was also thinking that. It'd be pretty easy to fit into the > > existing system structure (I think, without having looked at the relevant > > code lately), and it would not add any overhead to normal processing. > > Installing a timeout to handle this per Andres' idea inevitably *would* > > add overhead. > > Alright, here is a first swing at making our behaviour more consistent > in two ways: > > 1. The first segment should be truncated even in recovery. > 2. Later segments should be truncated on commit. > > I don't know why the existing coding decides not to try to unlink the > later segments if the truncate of segment 0 failed. We already > committed, we should plough on. I was trying the almost the same thing except how to emit the error message for truncation and not trying to unlink if truncation ends with ENOENT for following segments. regareds. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
"denni.pat"
Date:
thanks for the patch. Do you think it can be included in the next minor releases or the only solution will be to recompile? regards Denis -- Sent from: https://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Thomas Munro
Date:
On Thu, Oct 15, 2020 at 8:20 PM denni.pat <denni.pat@gmail.com> wrote: > thanks for the patch. > Do you think it can be included in the next minor releases or the only > solution will be to recompile? I would vote +1 for back-patching a fix for this problem (that is, pushing it into the minor releases), because I agree that it's very arguably a bug that we treat the segments differently, and looking around I do see reports of people having to terminate processes to get their disk space back. I'd definitely want a consensus on that plan from some experienced reviewers and testers, though. For anyone wanting to test this, you might want to set RELSEGSIZE to a smaller number in src/include/pg_config.h.
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
From
Pavel Borisov
Date:
Thomas,
I get into the patch and I think it's worth being committed and backpatched.
-- BTW I noticed that sometimes the same comparisons are done twice, and I made a very minor refactor of the code. PFA v2 of a patch if you don't mind.
As for the question on what to do with the additional segments if the first one failed to be truncated, I don't consider myself experienced enough and surely someone else's independent opinion is very much welcome.