Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: .ready and .done files considered harmful
Msg-id: CA+TgmoYc84KaS=ieRM_AdpU+O_7DnbofXOiKVZ0+TFojFPNThg@mail.gmail.com
In response to: Re: .ready and .done files considered harmful ("Bossart, Nathan" <bossartn@amazon.com>)
Responses: Re: .ready and .done files considered harmful ("Bossart, Nathan" <bossartn@amazon.com>)
List: pgsql-hackers
On Tue, Sep 7, 2021 at 1:28 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Thanks for chiming in.  The limit of 64 in the multiple-files-per-
> directory-scan approach was mostly arbitrary.  My earlier testing [0]
> with different limits didn't reveal any significant difference, but
> using a higher limit might yield a small improvement when there are
> several hundred thousand .ready files.  IMO increasing the limit isn't
> really worth it for this approach.  For 500,000 .ready files,
> ordinarily you'd need 500,000 directory scans.  When 64 files are
> archived for each directory scan, you need ~8,000 directory scans.
> With 128 files per directory scan, you need ~4,000.  With 256, you
> need ~2,000.  The difference between 8,000 directory scans and 500,000
> is quite significant.  The difference between 2,000 and 8,000 isn't
> nearly as significant in comparison.

That's certainly true.
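
Just so we're picturing the same thing, my mental model of the
batching approach is roughly the sketch below. The names
(scan_status_dir, ARCH_BATCH_SIZE, and so on) are invented for
illustration, not taken from your patch:

#include <stddef.h>

#define ARCH_BATCH_SIZE 64      /* the (mostly arbitrary) per-scan limit */
#define MAX_FNAME       64      /* plenty for a WAL segment file name */

static char arch_files[ARCH_BATCH_SIZE][MAX_FNAME];
static int  arch_files_count = 0;   /* entries left in the cached batch */

/*
 * Stub: one pass over archive_status, collecting up to "max" of the
 * oldest .ready files into "files", stored newest-first so we can pop
 * the oldest off the end.  The real version does the readdir() work.
 */
static int
scan_status_dir(char files[][MAX_FNAME], int max)
{
    (void) files;
    (void) max;
    return 0;
}

/*
 * Hand back the next file to archive, scanning the directory only when
 * the cached batch runs dry.  500,000 .ready files at 64 per scan is
 * the ~8,000 scans you mention.
 */
static const char *
pgarch_next_file(void)
{
    if (arch_files_count == 0)
    {
        arch_files_count = scan_status_dir(arch_files, ARCH_BATCH_SIZE);
        if (arch_files_count == 0)
            return NULL;        /* nothing ready to archive */
    }
    return arch_files[--arch_files_count];
}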

I guess what I don't understand about the multiple-files-per-directory-
scan implementation is what happens when an event occurs that would
require the keep-trying-the-next-file approach to perform a forced
scan. It seems to me that you still need to force an immediate full
scan, because if the idea is that you want to, say, prioritize
archiving of new timeline files over any others, a cached list of
files to archive next doesn't accomplish that, just as keeping on
trying the next file in sequence doesn't.
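
Concretely, I'm imagining something of this shape, continuing the
sketch above: whoever creates a high-priority .ready file pokes the
archiver, and the archiver throws away its cached batch. (Again, the
names are made up, and a plain static flag only works within one
process; the real thing would need a shared-memory flag, since the
archiver runs separately.)

#include <signal.h>

/*
 * Set by any backend that creates a .ready file which must jump the
 * queue, e.g. a timeline history file.
 */
static volatile sig_atomic_t force_dir_scan = 0;

void
archiver_force_dir_scan(void)
{
    force_dir_scan = 1;
}

/*
 * Archiver side: honor a forced scan by invalidating the cached batch,
 * so the next pgarch_next_file() call rescans the directory and picks
 * up the high-priority file first.
 */
static const char *
pgarch_next_file_checked(void)
{
    if (force_dir_scan)
    {
        force_dir_scan = 0;
        arch_files_count = 0;   /* drop the cached batch */
    }
    return pgarch_next_file();
}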

So I'm wondering if in the end the two approaches converge somewhat,
so that with either patch you get (1) some kind of optimization to
scan the directory less often, plus (2) some kind of notification
mechanism to tell you when you need to avoid applying that
optimization. If you wanted to, (1) could even include both batching
and then, when the batch is exhausted, trying files in sequence. I'm
not saying that's the way to go, but you could. In the end, it seems
less important that we do any particular thing here and more important
that we do something - but if prioritizing timeline history files is
important, then we have to preserve that behavior.
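
Continuing the same sketch, the converged loop might look like this,
with (1) covering both the cached batch and the sequential guess, and
(2) being the force_dir_scan flag (helper names are again
hypothetical):

#include <string.h>

static char next_in_sequence[MAX_FNAME];    /* predicted next segment */

/* Stub: does the .ready file for this segment exist?  (stat() it) */
static int
ready_file_exists(const char *fname)
{
    (void) fname;
    return 0;
}

/* Stub: compute the name of the segment after "fname", in place. */
static void
advance_sequence(char *fname)
{
    (void) fname;
}

static const char *
next_file_to_archive(void)
{
    if (force_dir_scan)
    {
        force_dir_scan = 0;
        arch_files_count = 0;           /* (2) drop the cached batch... */
        next_in_sequence[0] = '\0';     /* ...and the sequential guess */
    }

    /* (1a) serve from the cached batch while it lasts */
    if (arch_files_count > 0)
        return arch_files[--arch_files_count];

    /* (1b) batch exhausted: cheap check for the predicted next segment */
    if (next_in_sequence[0] != '\0' && ready_file_exists(next_in_sequence))
    {
        static char current[MAX_FNAME];

        strcpy(current, next_in_sequence);  /* archive this one... */
        advance_sequence(next_in_sequence); /* ...and predict the next */
        return current;
    }

    /* otherwise, a full directory scan refills the batch */
    arch_files_count = scan_status_dir(arch_files, ARCH_BATCH_SIZE);
    return arch_files_count > 0 ? arch_files[--arch_files_count] : NULL;
}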

-- 
Robert Haas
EDB: http://www.enterprisedb.com


