Thread: pgsql: pg_rewind: Fetch small files according to new size.

pgsql: pg_rewind: Fetch small files according to new size.

From
Daniel Gustafsson
Date:
pg_rewind: Fetch small files according to new size.

There's a race condition if a file changes in the source system
after we have collected the file list. If the file becomes larger,
we only fetched up to its original size. That can easily result in
a truncated file.  That's not a problem for relation files, files
in pg_xact, etc. because any actions on them will be replayed from
the WAL.  However, configuration files are affected.

This commit mitigates the race condition by fetching small files in
whole, even if they have grown.  A test is added in which an extra
file copied is concurrently grown with the output of pg_rewind thus
guaranteeing it to have changed in size during the operation.  This
is not a full fix: we still believe the original file size for files
larger than 1 MB.  That should be enough for configuration files,
and doing more than that would require big changes to the chunking
logic in libpq_source.c.

This mitigates the race condition if the file is modified between
the original scan of files and copying the file, but there's still
a race condition if a file is changed while it's being copied.
That's a much smaller window, though, and pg_basebackup has the
same issue.

This race can be seen with pg_auto_failover, which frequently uses
ALTER SYSTEM, which updates postgresql.auto.conf.  Often, pg_rewind
will fail, because the postgresql.auto.conf file changed concurrently
and a partial version of it was copied to the target.  The partial
file would fail to parse, preventing the server from starting up.

Author: Heikki Linnakangas
Reviewed-by: Cary Huang
Discussion: https://postgr.es/m/f67feb24-5833-88cb-1020-19a4a2b83ac7%40iki.fi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/16915126746e2d8597a92197a346fea0756f8e3e

Modified Files
--------------
src/bin/pg_rewind/libpq_source.c         | 32 ++++++++++++++
src/bin/pg_rewind/local_source.c         | 76 +++++++++++++++++++++++++++-----
src/bin/pg_rewind/pg_rewind.c            |  5 +--
src/bin/pg_rewind/rewind_source.h        | 13 ++++++
src/bin/pg_rewind/t/009_growing_files.pl | 76 ++++++++++++++++++++++++++++++++
5 files changed, 188 insertions(+), 14 deletions(-)


Re: pgsql: pg_rewind: Fetch small files according to new size.

From
Daniel Gustafsson
Date:
> On 5 Apr 2022, at 15:02, Daniel Gustafsson <dgustafsson@postgresql.org> wrote:
>
> pg_rewind: Fetch small files according to new size.

The buildfarm is less impressed than CI was, I’m collecting more feedback and will then fix.

/ daniel


Re: pgsql: pg_rewind: Fetch small files according to new size.

From
Daniel Gustafsson
Date:
> On 5 Apr 2022, at 15:36, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 5 Apr 2022, at 15:02, Daniel Gustafsson <dgustafsson@postgresql.org> wrote:
>>
>> pg_rewind: Fetch small files according to new size.
>
> The buildfarm is less impressed than CI was, I’m collecting more feedback and will then fix.

Sorry for being slow, life took over and children with fever took priority.
The error in question was:

local_source.c:118:15: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka
'unsignedint') [-Werror,-Wformat] 
                                 srcpath, len, written_len);
                                 ~~~~~~~~~^~~~~~~~~~~~~~~~~

I'm running a fixup with casting to int and printing with %d (like how
pg_rewind.c:digestControlFile already does it for printing a size_t) through CI
just to be sure and will push once it's had a green run:

-               pg_fatal("size of source file \"%s\" changed concurrently: " UINT64_FORMAT " bytes expected, "
UINT64_FORMAT" copied", 
-                                srcpath, len, written_len);
+               pg_fatal("size of source file \"%s\" changed concurrently: %d bytes expected, %d copied",
+                                srcpath, (int) len, (int) written_len);

--
Daniel Gustafsson        https://vmware.com/




Re: pgsql: pg_rewind: Fetch small files according to new size.

From
Alvaro Herrera
Date:
On 2022-Apr-05, Daniel Gustafsson wrote:

> local_source.c:118:15: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka
'unsignedint') [-Werror,-Wformat]
 
>                                  srcpath, len, written_len);
>                                  ~~~~~~~~~^~~~~~~~~~~~~~~~~
> 
> I'm running a fixup with casting to int and printing with %d (like how
> pg_rewind.c:digestControlFile already does it for printing a size_t) through CI
> just to be sure and will push once it's had a green run:
> 
> -               pg_fatal("size of source file \"%s\" changed concurrently: " UINT64_FORMAT " bytes expected, "
UINT64_FORMAT" copied",
 
> -                                srcpath, len, written_len);
> +               pg_fatal("size of source file \"%s\" changed concurrently: %d bytes expected, %d copied",
> +                                srcpath, (int) len, (int) written_len);

Hmm, it is typical to cast file sizes to long long and print with %lld.
See d914eb347fcd for a recent example.

.. oh, I see you pushed already.  Not sure this is worth fixing, since
the original commit message claimed to involve only "small files".

I suppose having a non-relation-file in the datadir that's bigger than
4GB is not expected?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)



Re: pgsql: pg_rewind: Fetch small files according to new size.

From
Daniel Gustafsson
Date:
> On 6 Apr 2022, at 10:44, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Hmm, it is typical to cast file sizes to long long and print with %lld.
> See d914eb347fcd for a recent example.
>
> .. oh, I see you pushed already.  Not sure this is worth fixing, since
> the original commit message claimed to involve only "small files".

The logic is only effective against files smaller than MAX_CHUNK_SIZE which is
1Mb.  I should've mentioned that in the fixup commit message, sorry for being
vague.

--
Daniel Gustafsson        https://vmware.com/