Thread: Always truncate segments before unlink
I have a report from a user that the postgres server gave up on REINDEX
commands on an almost-disk-full machine. The disk space was filled with old
index segments that should have been replaced by the reconstructed files
made by the REINDEX.

In mdunlink(), we truncate the first segment of the main fork to zero
length and actually unlink it at the next checkpoint, but the other
segments are not truncated, only unlinked. Then, if another backend has
the segments open, the disk space occupied by them is not reclaimed until
all of those backends close their file descriptors. A longer checkpoint
timeout and connection pooling make things worse.

I'd like to suggest that we always truncate any segments before unlinking
them. The truncate-and-unlink hack seems to have been developed to avoid
reuse of the relfilenode:

| Leaving the empty file in place prevents that relfilenode
| number from being reused.

but it is also useful for releasing disk space early.

Am I missing something? Comments welcome.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> In mdunlink(), we truncate the first segment of the main fork to zero
> length and actually unlink it at the next checkpoint, but the other
> segments are not truncated, only unlinked. Then, if another backend has
> the segments open, the disk space occupied by them is not reclaimed
> until all of those backends close their file descriptors. A longer
> checkpoint timeout and connection pooling make things worse.

Truncating seems like an ugly kluge that's not fixing the real problem.
Why are there open descriptors for a dropped relation? They should all
get closed as a consequence of relcache flush.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Truncating seems like an ugly kluge that's not fixing the real problem.
> Why are there open descriptors for a dropped relation? They should all
> get closed as a consequence of relcache flush.

The relcache will be flushed at the next command, but there could be some
*idle backends* kept alive by connection pooling. They won't close dropped
files until the shared cache invalidation queue is almost full, which
might take a long time.

Another solution might be to send the PROCSIG_CATCHUP_INTERRUPT signal
not only when the queue length crosses its threshold but also on a
timeout, i.e., the signal is sent when some messages have been sitting in
the queue for longer than 30-60 seconds.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
On Tue, Jul 6, 2010 at 9:59 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Truncating seems like an ugly kluge that's not fixing the real problem.
>> Why are there open descriptors for a dropped relation? They should all
>> get closed as a consequence of relcache flush.
>
> The relcache will be flushed at the next command, but there could be some
> *idle backends* kept alive by connection pooling. They won't close dropped
> files until the shared cache invalidation queue is almost full, which
> might take a long time.

Right. Since many connection poolers use a LIFO method to manage the
pooled connections, this problem is very likely to happen.

> Another solution might be to send the PROCSIG_CATCHUP_INTERRUPT signal
> not only when the queue length crosses its threshold but also on a
> timeout, i.e., the signal is sent when some messages have been sitting in
> the queue for longer than 30-60 seconds.

Shouldn't REINDEX or similar commands send PROCSIG_CATCHUP_INTERRUPT
immediately?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center