On Tue, Aug 24, 2021 at 1:26 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> I think Horiguchi-san made a good point that the .ready file creators
> should ideally not need to understand archiving details. However, I
> think this approach requires them to be inextricably linked. In the
> happy case, the archiver will follow the simple path of processing
> each consecutive WAL file without incurring a directory scan. Any
> time there is something other than a regular WAL file to archive, we
> need to take special action to make sure it is picked up.
I think they should be inextricably linked, really. If we know
something - like that there's a file ready to be archived - then it
seems like we should not throw that information away and force
somebody else to rediscover it through an expensive process. The whole
problem here comes from the fact that we're using the filesystem as an
IPC mechanism, and it's sometimes a very inefficient one.
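
To make that a bit more concrete, here's a rough single-process sketch
of the "don't throw the information away" idea. This is not PostgreSQL
code: ArchiverHint, hint_set_segment, and hint_take_segment are made-up
names, and the plain global struct just stands in for a shared-memory
slot. The point is only that the .ready-file creator hands the segment
name straight to the archiver, so a directory scan becomes the fallback
rather than the normal path:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_SEGNAME 64

typedef struct ArchiverHint
{
    bool        valid;                  /* is there a hinted segment? */
    char        segment[MAX_SEGNAME];   /* segment known to be .ready */
} ArchiverHint;

static ArchiverHint hint;               /* would live in shared memory */

/* Called by the .ready-file creator right after creating the .ready file. */
static void
hint_set_segment(const char *segment)
{
    snprintf(hint.segment, sizeof(hint.segment), "%s", segment);
    hint.valid = true;
}

/* Called by the archiver; returns true if a hinted segment was available. */
static bool
hint_take_segment(char *segment, size_t len)
{
    if (!hint.valid)
        return false;                   /* caller falls back to a readdir scan */
    snprintf(segment, len, "%s", hint.segment);
    hint.valid = false;
    return true;
}

int
main(void)
{
    char        next[MAX_SEGNAME];

    hint_set_segment("000000010000000000000042");

    if (hint_take_segment(next, sizeof(next)))
        printf("archive hinted segment %s, no directory scan needed\n", next);
    else
        printf("no hint available, fall back to scanning archive_status/\n");

    return 0;
}

In a sketch like this, a missed or overwritten hint isn't fatal: the
archiver still finds the segment on its next scan, so correctness never
depends on the hint, only performance does.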
I can't quite decide whether the problems we're worrying about here
are real issues or just kind of hypothetical. I mean, it already seems
possible today that we fail to mark some file as ready for archiving,
emit a log message, and then let a huge amount of time go by before we
try again to mark it ready for archiving. Are the problems we're
talking about here objectively worse than that, or just different? Is
it a problem in practice, or just in theory?
I really want to avoid getting backed into a corner where we decide
that the status quo is the best we can do, because I'm pretty sure
that has to be the wrong conclusion. If we think that
the get-a-bunch-of-files-per-readdir approach is better than the
keep-trying-the-next-file approach, I mean that's OK with me; I'm not
sure which of those is the right course of action, but I just want to
do something about this.
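
For comparison, here's an equally rough sketch of the
get-a-bunch-of-files-per-readdir idea, which amortizes one scan over
many segments. Again, not PostgreSQL code: the directory path and batch
size are just placeholders, and plain lexicographic ordering stands in
for the real rules about which files should be archived first:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE   64
#define MAX_FILENAME 256

static int
cmp_names(const void *a, const void *b)
{
    return strcmp((const char *) a, (const char *) b);
}

/* Collect up to BATCH_SIZE ".ready" file names from dirpath, oldest first. */
static int
collect_ready_batch(const char *dirpath, char batch[][MAX_FILENAME])
{
    DIR        *dir = opendir(dirpath);
    struct dirent *de;
    int         n = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        size_t      len = strlen(de->d_name);

        /* keep only names ending in ".ready" */
        if (len < 6 || strcmp(de->d_name + len - 6, ".ready") != 0)
            continue;

        if (n < BATCH_SIZE)
            snprintf(batch[n++], MAX_FILENAME, "%s", de->d_name);
        else
        {
            /* batch full: replace the newest kept entry if this one is older */
            qsort(batch, n, MAX_FILENAME, cmp_names);
            if (strcmp(de->d_name, batch[n - 1]) < 0)
                snprintf(batch[n - 1], MAX_FILENAME, "%s", de->d_name);
        }
    }
    closedir(dir);

    qsort(batch, n, MAX_FILENAME, cmp_names);
    return n;
}

int
main(void)
{
    char        batch[BATCH_SIZE][MAX_FILENAME];
    int         n = collect_ready_batch("pg_wal/archive_status", batch);

    if (n < 0)
    {
        perror("opendir");
        return 1;
    }
    for (int i = 0; i < n; i++)
        printf("would archive: %.*s\n",
               (int) (strlen(batch[i]) - 6), batch[i]);  /* strip ".ready" */
    return 0;
}

With a batch of 64, this pays for one directory scan per 64 segments
instead of one per segment, which is where the win would come from in
the happy case.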
--
Robert Haas
EDB: http://www.enterprisedb.com