Thread: Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

I verified the patch "v2-0001-Free-disk-space-for-dropped-relations-on-commit.patch" on master branch
"0cc99327888840f2bf572303b68438e4caf62de9".It works for me. Below is my test procedure and results.
 

=== Before the patch ===
#1 from psql console 1, create table and index then insert enough data
postgres=# CREATE TABLE test_tbl ( a int, b text);
postgres=# CREATE INDEX idx_test_tbl on test_tbl (a);
postgres=# INSERT INTO test_tbl SELECT generate_series(1,80000000),'Hello world!';
postgres=# INSERT INTO test_tbl SELECT generate_series(1,80000000),'Hello world!';

#2 check files size 
david:12867$ du -h
12G    .

#3 from psql console 2, drop the index
postgres=# drop index idx_test_tbl;

#4 check files size in different ways,
david:12867$ du -h
7.8G    .
david:12867$ ls -l
...
-rw------- 1 david david          0 Nov 23 20:07 16402
...

$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  25736                  david   45u      REG              259,2          0   12592758
/home/david/sandbox/postgres/pgdata/base/12867/16402(deleted)
 
postgres  25736                  david   49u      REG              259,2 1073741824   12592798
/home/david/sandbox/postgres/pgdata/base/12867/16402.1(deleted)
 
postgres  25736                  david   53u      REG              259,2 1073741824   12592739
/home/david/sandbox/postgres/pgdata/base/12867/16402.2(deleted)
 
postgres  25736                  david   59u      REG              259,2  372604928   12592800
/home/david/sandbox/postgres/pgdata/base/12867/16402.3(deleted)
 
...

The index relnode id "16402" displays size "0" from postgres database folder, but when using lsof to check, all 16402.x
arestill in used by a psql connection except 16402 is set to 0. Check it again after an hour, lsof shows the same
results.

=== After the patch ===
Repeat step 1 ~ 4, lsof shows all the index relnode files (in this case, the index relnode id 16389) are removed within
about1 minute.
 
$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  32707                  david   66u      REG              259,2          0   12592763
/home/david/sandbox/postgres/pgdata/base/12867/16389.1(deleted)
 
postgres  32707                  david   70u      REG              259,2          0   12592823
/home/david/sandbox/postgres/pgdata/base/12867/16389.2(deleted)
 
postgres  32707                  david   74u      REG              259,2          0   12592805
/home/david/sandbox/postgres/pgdata/base/12867/16389.3(deleted)
 
...

One thing interesting for me is that, if the index is created after data records has been inserted, then lsof doesn't
showthis issue. 
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Given we got two other reviews from Neil and David, I think I can finalize my own review and mark the patch as ready
forcommitter if nobody has objections.
 
Thank you!

Pavel Borisov

The new status of this patch is: Ready for Committer

On Wed, Nov 25, 2020 at 8:00 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> The new status of this patch is: Ready for Committer

Thanks!  One small thing bothered me about the last version of the
patch.  It tried to unlink when ENOENT had already been enountered by
open(2), so COMMIT of a DROP looks like:

openat(AT_FDCWD, "base/14208/16384", O_RDWR) = 54
ftruncate(54, 0)                        = 0
close(54)                               = 0
openat(AT_FDCWD, "base/14208/16384.1", O_RDWR) = -1 ENOENT
unlink("base/14208/16384.1")            = -1 ENOENT
openat(AT_FDCWD, "base/14208/16384_fsm", O_RDWR) = -1 ENOENT
unlink("base/14208/16384_fsm")          = -1 ENOENT
openat(AT_FDCWD, "base/14208/16384_vm", O_RDWR) = -1 ENOENT
unlink("base/14208/16384_vm")           = -1 ENOENT
openat(AT_FDCWD, "base/14208/16384_init", O_RDWR) = -1 ENOENT
unlink("base/14208/16384_init")         = -1 ENOENT

So I fixed that, by adding a return value to do_truncate() and
checking it.  That's the version I plan to commit tomorrow, unless
there are further comments or objections.  I've also attached a
version suitable for REL_11_STABLE and earlier branches (with a name
that cfbot should ignore), where things are slightly different.  In
those branches, the register_forget_request() logic is elsewhere.

While looking at trace output, I figured we should just use
truncate(2) on non-Windows, on the master branch only.  It's not like
it really makes much difference, but I don't see why we shouldn't
allow ourselves to use ancient standardised Unix syscalls when we can.
That'd get us down to just the following when committing a DROP:

truncate("base/14208/16396", 0)         = 0
truncate("base/14208/16396.1", 0)       = -1 ENOENT
truncate("base/14208/16396_fsm", 0)     = -1 ENOENT
truncate("base/14208/16396_vm", 0)      = -1 ENOENT
truncate("base/14208/16396_init", 0)    = -1 ENOENT

Attachment
On Mon, Nov 30, 2020 at 6:59 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Nov 25, 2020 at 8:00 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > The new status of this patch is: Ready for Committer

> ...  That's the version I plan to commit tomorrow, unless
> there are further comments or objections.  ...

Done, and back-patched.

I thought a bit more about the fact that we fail to unlink
higher-numbered segments in certain error cases, potentially leaving
stray files behind.  As far as I can see, nothing we do in this
code-path is going to be a bullet-proof solution to that problem.  One
simple idea would be for the checkpointer to refuse to unlink segment
0 (thereby allowing the relfilenode to be recycled) until it has
scanned the parent directory for any related files that shouldn't be
there.

> While looking at trace output, I figured we should just use
> truncate(2) on non-Windows, on the master branch only.  It's not like
> it really makes much difference, but I don't see why we shouldn't
> allow ourselves to use ancient standardised Unix syscalls when we can.

Also pushed, but only to master.



On Mon, Nov 30, 2020 at 06:59:40PM +1300, Thomas Munro wrote:
> So I fixed that, by adding a return value to do_truncate() and
> checking it.  That's the version I plan to commit tomorrow, unless
> there are further comments or objections.  I've also attached a
> version suitable for REL_11_STABLE and earlier branches (with a name
> that cfbot should ignore), where things are slightly different.  In
> those branches, the register_forget_request() logic is elsewhere.

Hmm.  Sorry for arriving late at the party.  But is that really
something suitable for a backpatch?  Sure, it is not optimal to not
truncate all the segments when a transaction dropping a relation
commits, but this was not completely broken either.
--
Michael

Attachment
On Tue, Dec 1, 2020 at 3:55 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Nov 30, 2020 at 06:59:40PM +1300, Thomas Munro wrote:
> > So I fixed that, by adding a return value to do_truncate() and
> > checking it.  That's the version I plan to commit tomorrow, unless
> > there are further comments or objections.  I've also attached a
> > version suitable for REL_11_STABLE and earlier branches (with a name
> > that cfbot should ignore), where things are slightly different.  In
> > those branches, the register_forget_request() logic is elsewhere.
>
> Hmm.  Sorry for arriving late at the party.  But is that really
> something suitable for a backpatch?  Sure, it is not optimal to not
> truncate all the segments when a transaction dropping a relation
> commits, but this was not completely broken either.

I felt on balance it was a "bug", since it causes operational
difficulties for people and was clearly not our intended behaviour,
and I announced this intention 6 weeks ago.  Of course I'll be happy
to revert it from the back-branches if that's the consensus.  Any
other opinions?



On Tue, Dec 01, 2020 at 04:06:48PM +1300, Thomas Munro wrote:
> I felt on balance it was a "bug", since it causes operational
> difficulties for people and was clearly not our intended behaviour,
> and I announced this intention 6 weeks ago.

Oops, sorry for missing this discussion for such a long time :/

> Of course I'll be happy to revert it from the back-branches if
> that's the consensus.  Any > other opinions?

If there are no other opinions, I am also fine to rely on your
judgment.
--
Michael

Attachment