Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: .ready and .done files considered harmful
Date
Msg-id CAFiTN-tR+3+GjP0Qeys8jwh=jz9VpP2ibhT9ubFDLmgNb1QtMg@mail.gmail.com
In response to Re: .ready and .done files considered harmful  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: .ready and .done files considered harmful
List pgsql-hackers
On Tue, May 4, 2021 at 7:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 4, 2021 at 12:27 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2021-05-03 16:49:16 -0400, Robert Haas wrote:
> > > I have two possible ideas for addressing this; perhaps other people
> > > will have further suggestions. A relatively non-invasive fix would be
> > > to teach pgarch.c how to increment a WAL file name. After archiving
> > > segment N, check using stat() whether there's an .ready file for
> > > segment N+1. If so, do that one next. If not, then fall back to
> > > performing a full directory scan.
> >
> > Hm. I wonder if it'd not be better to determine multiple files to be
> > archived in one readdir() pass?
>
> I think both methods have some merit. If we had a way to pass a range
> of files to archive_command instead of just one, then your way is
> distinctly better, and perhaps we should just go ahead and invent such
> a thing. If not, your way doesn't entirely solve the O(n^2) problem,
> since you have to choose some upper bound on the number of file names
> you're willing to buffer in memory, but it may lower it enough that it
> makes no practical difference. I am somewhat inclined to think that it
> would be good to start with the method I'm proposing, since it is a
> clear-cut improvement over what we have today and can be done with a
> relatively limited amount of code change and no redesign, and then
> perhaps do something more ambitious afterward.

I agree that if we continue to archive one file at a time using the
archive command, then Robert's approach of checking for the existence
of the next WAL segment (N+1) has an advantage.  But note that,
currently, pgarch_readyXlog always considers any history file to be
the oldest file, and that will no longer hold if we predict the next
WAL segment name.  For example, suppose we have archived
000000010000000000000004, so next we will look for
000000010000000000000005.  If there is a timeline switch after segment
000000010000000000000005 is generated, archive_status will contain
both 000000010000000000000005.ready and the 00000002.history file.
The existing archiver would archive 00000002.history first, whereas
the proposed code would archive 000000010000000000000005 first.  That
said, I don't see a problem with this: before archiving any segment
file from timeline 2 we will definitely archive the 00000002.history
file, because we will not find 000000010000000000000006.ready, will
fall back to a full directory scan, and will then find
00000002.history as the oldest file.

>
> > > However, that's still pretty wasteful. Every time we have to wait for
> > > the next file to be ready for archiving, we'll basically fall back to
> > > repeatedly scanning the whole directory, waiting for it to show up.

Is this true -- that we only go for a directory scan when we have to
wait for the next file to be ready?  If I read the code in
"pgarch_ArchiverCopyLoop", it calls "pgarch_readyXlog" for every
single file to be archived, and that scans the directory each time.
So I did not understand the point that the full directory scan happens
only when we need to wait for the next .ready file; it appears the
archiver always scans the full directory after archiving each WAL
segment.  What am I missing?

> > Hm. That seems like it's only an issue because .done and .ready are in
> > the same directory? Otherwise the directory would be empty while we're
> > waiting for the next file to be ready to be archived.
>
> I think that's right.

If we agree with your point above that a full directory scan is only
needed when we have to wait for the next file to be ready, then
keeping the .done files in a separate directory could improve things a
lot, because the directory being scanned would then be empty (or
nearly so) while we wait, and the scan would be cheap.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
