Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: .ready and .done files considered harmful
Date
Msg-id CAE7DCDB-80E9-454E-A825-CB62496FB652@amazon.com
In response to Re: .ready and .done files considered harmful  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: .ready and .done files considered harmful  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 9/7/21, 1:42 AM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> I was thinking that the multiple-files approach would work efficiently,
> but the patch still runs directory scans every 64 files.  As
> Robert mentioned, it is still O(N^2).  I'm not sure of the reason for the
> limit, but if it were to lower memory consumption or the cost to sort,
> we can resolve that issue by taking the trying-the-next approach, ignoring
> the case of having many gaps (discussed below).  If it were to force
> periodic checking for out-of-order files, almost the same can be
> achieved by running a directory scan every 64 files in the
> trying-the-next approach (though then we would suffer O(N^2) again).  On the
> other hand, if archiving is delayed by several segments, the
> multiple-files method might reduce the cost to scan the status
> directory, but it won't matter since the directory contains only a
> few files.  (I think it might be better not to take the
> trying-the-next path if a directory scan finds only a few
> files.)  The multiple-files approach reduces the number of
> directory scans if there were many gaps in the WAL file
> sequence.  Although theoretically the last max_backend(+alpha?)
> segments could be written out of order, I suppose that in reality we
> only have gaps among the several latest files.  I'm not sure,
> though...
>
> In short, the trying-the-next approach seems to me to be the way to
> go, because it is simpler yet can cover the possible failures with
> almost the same measures as the multiple-files approach.
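
To make the trying-the-next idea concrete, here is a minimal
standalone sketch (not the actual pgarch.c code: segment names are
reduced to a plain 24-digit hex counter, STATUS_DIR is a placeholder,
and the printf stands in for archive_command).  After each segment is
archived, the archiver predicts the next segment's .ready file and
stat()s it directly, falling back to a full directory scan only when
that file is missing; the forced rescan every 64 segments mentioned
above, which would catch out-of-order files, is omitted for brevity.

/*
 * Hypothetical sketch only: simplified segment naming, placeholder
 * paths, and printf in place of running archive_command.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define STATUS_DIR "pg_wal/archive_status"

static void
archive_segment(const char *ready_path)
{
    char    done_path[600];
    size_t  len = strlen(ready_path);

    printf("archive %s\n", ready_path);     /* run archive_command here */
    /* mark the segment archived: foo.ready -> foo.done */
    snprintf(done_path, sizeof(done_path), "%.*sdone",
             (int) (len - 5), ready_path);
    rename(ready_path, done_path);
}

/* fallback: full directory scan returning the oldest .ready segment, or 0 */
static unsigned long long
oldest_ready_segment(void)
{
    DIR    *dir = opendir(STATUS_DIR);
    struct dirent *de;
    char    best[256] = "";

    if (dir == NULL)
        return 0;
    while ((de = readdir(dir)) != NULL)
    {
        size_t  len = strlen(de->d_name);

        if (len > 6 && len < sizeof(best) &&
            strcmp(de->d_name + len - 6, ".ready") == 0 &&
            (best[0] == '\0' || strcmp(de->d_name, best) < 0))
            strcpy(best, de->d_name);
    }
    closedir(dir);
    return best[0] ? strtoull(best, NULL, 16) : 0;
}

int
main(void)
{
    unsigned long long next = oldest_ready_segment();

    while (next != 0)
    {
        char        ready[600];
        struct stat st;

        snprintf(ready, sizeof(ready), STATUS_DIR "/%024llX.ready", next);
        if (stat(ready, &st) == 0)
        {
            /* fast path: the predicted segment is ready, no scan needed */
            archive_segment(ready);
            next++;
        }
        else
        {
            /* gap (or caught up): fall back to one directory scan */
            unsigned long long found = oldest_ready_segment();

            if (found == next)
                break;          /* defensive: avoid spinning in this sketch */
            next = found;
        }
    }
    return 0;
}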

Thanks for chiming in.  The limit of 64 in the multiple-files-per-
directory-scan approach was mostly arbitrary.  My earlier testing [0]
with different limits didn't reveal any significant difference, but
using a higher limit might yield a small improvement when there are
several hundred thousand .ready files.  IMO increasing the limit isn't
really worth it for this approach.  For 500,000 .ready files,
ordinarily you'd need 500,000 directory scans.  When 64 files are
archived for each directory scan, you need ~8,000 directory scans.
With 128 files per directory scan, you need ~4,000.  With 256, you
need ~2,000.  The difference between 8,000 directory scans and 500,000
is quite significant.  The difference between 2,000 and 8,000 isn't
nearly as significant in comparison.
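
To make the batching concrete, here is a similar standalone sketch of
the multiple-files-per-directory-scan idea (again not the actual
patch; STATUS_DIR and the printf stand-in for archive_command are
placeholders).  Each cycle performs one full directory scan, sorts
the .ready files it found, archives at most 64 of the oldest, and
then scans again, which is where the ~8,000 scans for 500,000 files
come from.

/*
 * Hypothetical sketch only: placeholder paths and printf in place of
 * running archive_command.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64
#define STATUS_DIR "pg_wal/archive_status"

static int
cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* "archive" one segment and flip its status file from .ready to .done */
static void
archive_segment(const char *name)
{
    char    ready[600];
    char    done[600];

    printf("archive %s\n", name);       /* run archive_command here */
    snprintf(ready, sizeof(ready), "%s/%s", STATUS_DIR, name);
    snprintf(done, sizeof(done), "%s/%.*sdone", STATUS_DIR,
             (int) (strlen(name) - 5), name);
    rename(ready, done);
}

int
main(void)
{
    for (;;)
    {
        char  **names = NULL;
        size_t  nnames = 0, cap = 0;
        DIR    *dir = opendir(STATUS_DIR);
        struct dirent *de;

        if (dir == NULL)
            return 1;

        /* one full directory scan: remember every .ready file name */
        while ((de = readdir(dir)) != NULL)
        {
            size_t  len = strlen(de->d_name);

            if (len > 6 && strcmp(de->d_name + len - 6, ".ready") == 0)
            {
                if (nnames == cap)
                {
                    cap = cap ? cap * 2 : BATCH_SIZE;
                    names = realloc(names, cap * sizeof(char *));
                    if (names == NULL)
                        return 1;
                }
                names[nnames++] = strdup(de->d_name);
            }
        }
        closedir(dir);

        if (nnames == 0)
        {
            free(names);
            break;              /* nothing left to archive */
        }

        /* WAL-style names sort lexically, so oldest-first == strcmp order */
        qsort(names, nnames, sizeof(char *), cmp_names);

        /* archive at most BATCH_SIZE files, then rescan for new arrivals */
        for (size_t i = 0; i < nnames; i++)
        {
            if (i < BATCH_SIZE)
                archive_segment(names[i]);
            free(names[i]);
        }
        free(names);
    }
    return 0;
}

Raising BATCH_SIZE only divides the number of scans; each scan still
reads the whole directory, which is why the approach remains O(N^2)
overall, as noted upthread.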

Nathan

[0] https://www.postgresql.org/message-id/3ECC212F-88FD-4FB2-BAF1-C2DD1563E310%40amazon.com

