Thread: Always truncate segments before unlink
I have a report from a user that the postgres server gave up on REINDEX
commands on an almost-disk-full machine. The disk space was filled with old
index segments that should have been replaced by the reconstructed files
made by the REINDEX.

In mdunlink(), we truncate the first segment of the main fork to zero
length and actually unlink it at the next checkpoint, but the other
segments are not truncated, only unlinked. Then, if another backend has
the segments open, the disk space occupied by them is not reclaimed until
all of those backends close their file descriptors. A longer checkpoint
timeout and connection pooling make things worse.

I'd like to suggest that we always truncate any segments before unlinking
them. The truncate-and-unlink hack seems to have been developed to avoid
reuse of the relfilenode:

| Leaving the empty file in place prevents that relfilenode
| number from being reused.

but it is also useful for releasing disk space early.

Am I missing something? Comments welcome.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> In mdunlink(), we truncate the first segment of the main fork to zero
> length and actually unlink it at the next checkpoint, but the other
> segments are not truncated, only unlinked. Then, if another backend has
> the segments open, the disk space occupied by them is not reclaimed
> until all of those backends close their file descriptors. A longer
> checkpoint timeout and connection pooling make things worse.

Truncating seems like an ugly kluge that's not fixing the real problem.
Why are there open descriptors for a dropped relation? They should all
get closed as a consequence of relcache flush.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Truncating seems like an ugly kluge that's not fixing the real problem.
> Why are there open descriptors for a dropped relation? They should all
> get closed as a consequence of relcache flush.

The relcache will be flushed at the next command, but there could be some
*idle backends* kept alive by connection pooling. They won't close dropped
files until the shared cache invalidation queue is almost full, which
might take a long time.

Another solution might be to send the PROCSIG_CATCHUP_INTERRUPT signal
not only when the queue length crosses its threshold but also on a
timeout, i.e., the signal is sent when some messages have been sitting in
the queue for longer than 30-60 seconds.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
On Tue, Jul 6, 2010 at 9:59 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Truncating seems like an ugly kluge that's not fixing the real problem.
>> Why are there open descriptors for a dropped relation? They should all
>> get closed as a consequence of relcache flush.
>
> The relcache will be flushed at the next command, but there could be some
> *idle backends* kept alive by connection pooling. They won't close dropped
> files until the shared cache invalidation queue is almost full, which
> might take a long time.

Right. Since many connection poolers use a LIFO method to manage the
pooled connections, this problem is very likely to happen.

> Another solution might be to send the PROCSIG_CATCHUP_INTERRUPT signal
> not only when the queue length crosses its threshold but also on a
> timeout, i.e., the signal is sent when some messages have been sitting in
> the queue for longer than 30-60 seconds.

Shouldn't REINDEX or similar commands send PROCSIG_CATCHUP_INTERRUPT
immediately?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center