Thread: [v15 beta] pg_upgrade failed if earlier executed with -c switch

[v15 beta] pg_upgrade failed if earlier executed with -c switch

From

tushar

Date:

03 June 2022, 14:19:36

Hi,

While performing pg_upgrade from v15Beta binaries/source,
I got this error below error

could not create directory "d2/pg_upgrade_output.d": File exists
Failure, exiting

Steps to reproduce
v15 Beta sources
initalize a cluster ( ./initdb -D d1)
initalize another cluster ( ./initdb -D d2)
run pg_upgrade with -c option ( ./pg_upgrade -d d1 -D d2 -b . -B . -c -v)
run pg_upgrade without -c option ( ./pg_upgrade -d d1 -D d2 -b . -B .)
--
--
--
Error

This behavior was not there in earlier released versions, i guess.
Is it expected behavior now onwards?

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Daniel Gustafsson

Date:

03 June 2022, 15:01:18

> On 3 Jun 2022, at 13:19, tushar <tushar.ahuja@enterprisedb.com> wrote:

> This behavior was not there in earlier released versions, i guess.
> Is it expected behavior now onwards?

That's an unfortunate side effect which AFAICT was overlooked in the original
thread.  Having a predictable name was defined as important for CI/BF, but I
agree that the above is likely to be a common user pattern (first running -c is
exactly what I did when managing databases and upgraded them with pg_upgrade).

This might break a few automated upgrade scripts out there (but they might also
already need changes to cope with the moved file locations).

We can address this by documentation, and specifically highlight under the -c
option in the manual that the folder need to removed/renamed (and possibly to
STDOUT aswell when run with -c).

--
Daniel Gustafsson        https://vmware.com/

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Justin Pryzby

Date:

03 June 2022, 16:53:54

On Fri, Jun 03, 2022 at 02:01:18PM +0200, Daniel Gustafsson wrote:
> > On 3 Jun 2022, at 13:19, tushar <tushar.ahuja@enterprisedb.com> wrote:
> 
> > This behavior was not there in earlier released versions, i guess. 
> > Is it expected behavior now onwards?
> 
> That's an unfortunate side effect which AFAICT was overlooked in the original
> thread.  Having a predictable name was defined as important for CI/BF, but I
> agree that the above is likely to be a common user pattern (first running -c is
> exactly what I did when managing databases and upgraded them with pg_upgrade).

I agree that it's an problem, but it's not limited to -c.

For example, I ran this:

|$ time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-14/bin/initdb -d ./pgsql14.dat -D ./pgsql15.dat 
|"/usr/pgsql-14/bin/initdb" is not a directory
|Failure, exiting

And then reran with the correct "-b" option, but then it failed because it had
failed before...

|$ time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-14/bin -d ./pgsql14.dat -D ./pgsql15.dat
|could not create directory "pgsql15.dat/pg_upgrade_output.d": File exists
|Failure, exiting

This is a kind of geometric circle of errors - an error at point A requires
first re-running after fixing A's issue, and then an error at B requires
re-running after fixing B's issue, hitting the "A" error again, and then
rerunning again again.  It's the same kind of problem that led to 3c0471b5f.

-c could use a different output directory, but that means it would fail if
pg_upgrade -c were run multiple times, which seems undesirable for a "check"
command.

We could call cleanup() if -c was successful.  But that doesn't help the case
that -c fails; the new dir would still need to be manually removed, which seems
like imposing useless busywork on the user.

We could allow mkdir to fail with EEXIST, except that breaks the original
motivation for the patch: the logs are appended to and any old errors are still
in the logs after re-running pg_upgrade.

-- 
Justin

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Daniel Gustafsson

Date:

03 June 2022, 17:52:09

> On 3 Jun 2022, at 15:53, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Jun 03, 2022 at 02:01:18PM +0200, Daniel Gustafsson wrote:
>>> On 3 Jun 2022, at 13:19, tushar <tushar.ahuja@enterprisedb.com> wrote:
>>
>>> This behavior was not there in earlier released versions, i guess.
>>> Is it expected behavior now onwards?
>>
>> That's an unfortunate side effect which AFAICT was overlooked in the original
>> thread.  Having a predictable name was defined as important for CI/BF, but I
>> agree that the above is likely to be a common user pattern (first running -c is
>> exactly what I did when managing databases and upgraded them with pg_upgrade).
>
> I agree that it's an problem, but it's not limited to -c.

Indeed.

> For example, I ran this:
>
> |$ time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-14/bin/initdb -d ./pgsql14.dat -D ./pgsql15.dat
> |"/usr/pgsql-14/bin/initdb" is not a directory
> |Failure, exiting
>
> And then reran with the correct "-b" option, but then it failed because it had
> failed before...

Thats, not ideal.

> We could call cleanup() if -c was successful.  But that doesn't help the case
> that -c fails; the new dir would still need to be manually removed, which seems
> like imposing useless busywork on the user.
>
> We could allow mkdir to fail with EEXIST, except that breaks the original
> motivation for the patch: the logs are appended to and any old errors are still
> in the logs after re-running pg_upgrade.

Or we could revisit Tom's proposal in the thread that implemented the feature:
to have timestamped directory names to get around this very problem?  I think
we should be able to figure out a way to make it easy enough for the BF code to
figure out (and clean up).

--
Daniel Gustafsson        https://vmware.com/

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Tom Lane

Date:

03 June 2022, 19:26:55

Daniel Gustafsson <daniel@yesql.se> writes:
> Or we could revisit Tom's proposal in the thread that implemented the feature:
> to have timestamped directory names to get around this very problem?  I think
> we should be able to figure out a way to make it easy enough for the BF code to
> figure out (and clean up).

How about inserting an additional level of subdirectory?

pg_upgrade_output.d/20220603122528/foo.log

Then code doing "rm -rf pg_upgrade_output.d" needs no changes.

            regards, tom lane

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Daniel Gustafsson

Date:

03 June 2022, 19:55:28

> On 3 Jun 2022, at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> Or we could revisit Tom's proposal in the thread that implemented the feature:
>> to have timestamped directory names to get around this very problem?  I think
>> we should be able to figure out a way to make it easy enough for the BF code to
>> figure out (and clean up).
>
> How about inserting an additional level of subdirectory?
>
> pg_upgrade_output.d/20220603122528/foo.log
>
> Then code doing "rm -rf pg_upgrade_output.d" needs no changes.

Off the cuff that seems like a good compromise.  Adding Andrew on CC: for input
on how that affects the buildfarm.

--
Daniel Gustafsson        https://vmware.com/

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

04 June 2022, 06:13:19

On Fri, Jun 03, 2022 at 06:55:28PM +0200, Daniel Gustafsson wrote:
> On 3 Jun 2022, at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How about inserting an additional level of subdirectory?
>>
>> pg_upgrade_output.d/20220603122528/foo.log
>>
>> Then code doing "rm -rf pg_upgrade_output.d" needs no changes.
>
> Off the cuff that seems like a good compromise.  Adding Andrew on CC: for input
> on how that affects the buildfarm.

I am not so sure.  My first reaction was actually to bypass the
creation of the directories on EEXIST.  But, isn't the problem
different and actually older here?  In the set of commands given by
Tushar, he uses the --check option without --retain, but the logs are
kept around after the execution of the command.  It seems to me that
there is an argument for also removing the logs if the caller of the
command does not want to retain them.

Seeing the top of the thread, I think that it would be a good idea to
add an extra pg_upgrade --check before the real upgrade run.  I've
also relied on --check as a safety measure in the past for automated
workflows.
--
Michael

Attachment

signature.asc

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Justin Pryzby

Date:

04 June 2022, 06:32:27

On Sat, Jun 04, 2022 at 12:13:19PM +0900, Michael Paquier wrote:
> On Fri, Jun 03, 2022 at 06:55:28PM +0200, Daniel Gustafsson wrote:
> > On 3 Jun 2022, at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> How about inserting an additional level of subdirectory?
> >> 
> >> pg_upgrade_output.d/20220603122528/foo.log
> >> 
> >> Then code doing "rm -rf pg_upgrade_output.d" needs no changes.
> > 
> > Off the cuff that seems like a good compromise.  Adding Andrew on CC: for input
> > on how that affects the buildfarm.
> 
> I am not so sure.  My first reaction was actually to bypass the
> creation of the directories on EEXIST.

But that breaks the original motive behind the patch I wrote - logfiles are
appended to, even if they're full of errors that were fixed before re-running
pg_upgrade.

> But, isn't the problem different and actually older here?  In the set of
> commands given by Tushar, he uses the --check option without --retain, but
> the logs are kept around after the execution of the command.  It seems to me
> that there is an argument for also removing the logs if the caller of the
> command does not want to retain them.

You're right that --check is a bit inconsistent, but I don't think we could
backpatch any change to fix it.  It wouldn't matter much anyway, since the
usual workflow would be "pg_upgrade --check && pg_upgrade".  In which case the
logs would end up being removed anyway.

On Sat, Jun 04, 2022 at 12:13:19PM +0900, Michael Paquier wrote:
> Seeing the top of the thread, I think that it would be a good idea to
> add an extra pg_upgrade --check before the real upgrade run.  I've
> also relied on --check as a safety measure in the past for automated
> workflows.

It already does this; --check really means "stop-after-checking".

Hmm .. maybe what you mean is that the *tap test* should first run with
--check?

-- 
Justin

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

04 June 2022, 12:48:19

On Fri, Jun 03, 2022 at 10:32:27PM -0500, Justin Pryzby wrote:
> On Sat, Jun 04, 2022 at 12:13:19PM +0900, Michael Paquier wrote:
>> I am not so sure.  My first reaction was actually to bypass the
>> creation of the directories on EEXIST.
>
> But that breaks the original motive behind the patch I wrote - logfiles are
> appended to, even if they're full of errors that were fixed before re-running
> pg_upgrade.

Yep.

>> But, isn't the problem different and actually older here?  In the set of
>> commands given by Tushar, he uses the --check option without --retain, but
>> the logs are kept around after the execution of the command.  It seems to me
>> that there is an argument for also removing the logs if the caller of the
>> command does not want to retain them.
>
> You're right that --check is a bit inconsistent, but I don't think we could
> backpatch any change to fix it.  It wouldn't matter much anyway, since the
> usual workflow would be "pg_upgrade --check && pg_upgrade".  In which case the
> logs would end up being removed anyway.

Exactly, the inconsistency in the log handling is annoying, and
cleaning up the logs when --retain is not used makes sense to me when
the --check command succeeds, but we should keep them if the --check
fails.  I don't see an argument in backpatching that either.

> Hmm .. maybe what you mean is that the *tap test* should first run with
> --check?

Sorry for the confusion.  I meant to add an extra command in the TAP
test itself.

I would suggest the attached patch then, to add a --check command in
the test suite, with a change to clean up the logs when --check is
used without --retain.
--
Michael

Attachment

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Justin Pryzby

Date:

04 June 2022, 17:13:46

On Sat, Jun 04, 2022 at 06:48:19PM +0900, Michael Paquier wrote:
> I would suggest the attached patch then, to add a --check command in
> the test suite, with a change to clean up the logs when --check is
> used without --retain.

This doesn't address one of the problems that I already enumerated.

./tmp_install/usr/local/pgsql/bin/initdb -D pgsql15.dat
./tmp_install/usr/local/pgsql/bin/initdb -D pgsql15.dat-2

$ ./tmp_install/usr/local/pgsql/bin/pg_upgrade -b ./tmp_install/usr/local/pgsql/bin/bad -d pgsql15.dat-2 -D
pgsql15.dat-2

check for "tmp_install/usr/local/pgsql/bin/bad" failed: No such file or directory
Failure, exiting

$ ./tmp_install/usr/local/pgsql/bin/pg_upgrade -b ./tmp_install/usr/local/pgsql/bin/bad -d pgsql15.dat-2 -D
pgsql15.dat-2

could not create directory "pgsql15.dat-2/pg_upgrade_output.d": File exists
Failure, exiting

..failing the 2nd time because it failed the 1st time (even if I fix the bad
argument).

Maybe that's easy enough to fix just be rearranging verify_directories() or
make_outputdirs().

But actually it seems annoying to have to remove the failed outputdir.
It's true that those logs *can* be useful to fix whatever underlying problem,
but I'm afraid the *requirement* to remove the failed outputdir is a nuisance,
even outside of check mode.

-- 
Justin

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

05 June 2022, 03:24:25

On Sat, Jun 04, 2022 at 09:13:46AM -0500, Justin Pryzby wrote:
> Maybe that's easy enough to fix just be rearranging verify_directories() or
> make_outputdirs().

For the case, I mentioned, yes.

> But actually it seems annoying to have to remove the failed outputdir.
> It's true that those logs *can* be useful to fix whatever underlying problem,
> but I'm afraid the *requirement* to remove the failed outputdir is a nuisance,
> even outside of check mode.

Well, another error that could happen in the early code paths is
EACCES on a custom socket directory specified, and we'd still face the
same problem on a follow-up restart.  Using a sub-directory structure
as Daniel and Tom mention would address all that (if ignoring EEXIST
for the BASE_OUTPUTDIR), removing any existing content from the base
path when not using --retain.  This comes with the disadvantage of
bloating the disk on repeated errors, but this last bit would not
really be a huge problem, I guess, as it could be more useful to keep
the error information around.
--
Michael

Attachment

signature.asc

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

05 June 2022, 12:19:33

On Sun, Jun 05, 2022 at 09:24:25AM +0900, Michael Paquier wrote:
> Well, another error that could happen in the early code paths is
> EACCES on a custom socket directory specified, and we'd still face the
> same problem on a follow-up restart.  Using a sub-directory structure
> as Daniel and Tom mention would address all that (if ignoring EEXIST
> for the BASE_OUTPUTDIR), removing any existing content from the base
> path when not using --retain.  This comes with the disadvantage of
> bloating the disk on repeated errors, but this last bit would not
> really be a huge problem, I guess, as it could be more useful to keep
> the error information around.

I have been toying with the idea of a sub-directory named with a
timestamp (Unix time, like log_line_prefix's %n but this could be
any format) under pg_upgrade_output.d/ and finished with the
attached.  The logs are removed from the root path when --check is
used without --retain, like for a non-check command.  I have added a
set of tests to provide some coverage for the whole:
- Failure of --check where the binary path does not exist, and
pg_upgrade_output.d/ is not removed.
- Follow-up run of pg_upgrade --check, where pg_upgrade_output.d/ is
removed.
- Check that pg_upgrade_output.d/ is also removed after the main
upgrade command completes.

The logic in charge of cleaning up the logs has been moved to a single
routine, aka cleanup_logs().

Thoughts?
--
Michael

Attachment

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Daniel Gustafsson

Date:

06 June 2022, 03:38:03

> On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, Jun 05, 2022 at 09:24:25AM +0900, Michael Paquier wrote:
>> Well, another error that could happen in the early code paths is
>> EACCES on a custom socket directory specified, and we'd still face the
>> same problem on a follow-up restart.  Using a sub-directory structure
>> as Daniel and Tom mention would address all that (if ignoring EEXIST
>> for the BASE_OUTPUTDIR), removing any existing content from the base
>> path when not using --retain.  This comes with the disadvantage of
>> bloating the disk on repeated errors, but this last bit would not
>> really be a huge problem, I guess, as it could be more useful to keep
>> the error information around.
>
> I have been toying with the idea of a sub-directory named with a
> timestamp (Unix time, like log_line_prefix's %n but this could be
> any format) under pg_upgrade_output.d/ and finished with the
> attached.

I was thinking more along the lines of %m to make it (more) human readable, but
I'm certainly not wedded to any format.

> The logs are removed from the root path when --check is
> used without --retain, like for a non-check command.

This removes all logs after a command without --retain, even if a previous
command used --retain to keep the logs around.

As a user I would expect the logs from this current invocation to be removed
without --retain, and any other older log entries be kept.  I think we should
remove log_opts.logdir and only remove log_opts.rootdir if it is left empty
after .logdir is removed.

> The logic in charge of cleaning up the logs has been moved to a single
> routine, aka cleanup_logs().

+        cleanup_logs();

Maybe we should register cleanup_logs() as an atexit() handler once we're done
with option processing?

+    snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+             timebuf, LOG_OUTPUTDIR);

While not introduced by this patch, it does make me uneasy that we create paths
without checking for buffer overflows..

--
Daniel Gustafsson        https://vmware.com/

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

06 June 2022, 07:17:52

On Mon, Jun 06, 2022 at 02:38:03AM +0200, Daniel Gustafsson wrote:
> On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:
>> On Sun, Jun 05, 2022 at 09:24:25AM +0900, Michael Paquier wrote:
>>> Well, another error that could happen in the early code paths is
>>> EACCES on a custom socket directory specified, and we'd still face the
>>> same problem on a follow-up restart.  Using a sub-directory structure
>>> as Daniel and Tom mention would address all that (if ignoring EEXIST
>>> for the BASE_OUTPUTDIR), removing any existing content from the base
>>> path when not using --retain.  This comes with the disadvantage of
>>> bloating the disk on repeated errors, but this last bit would not
>>> really be a huge problem, I guess, as it could be more useful to keep
>>> the error information around.
>>
>> I have been toying with the idea of a sub-directory named with a
>> timestamp (Unix time, like log_line_prefix's %n but this could be
>> any format) under pg_upgrade_output.d/ and finished with the
>> attached.
>
> I was thinking more along the lines of %m to make it (more) human readable, but
> I'm certainly not wedded to any format.

Neither am I.  I would not map exactly to %m as it uses whitespaces,
but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would
be fine?  If there are other ideas for the format, just let me know.

> As a user I would expect the logs from this current invocation to be removed
> without --retain, and any other older log entries be kept.  I think we should
> remove log_opts.logdir and only remove log_opts.rootdir if it is left empty
> after .logdir is removed.

Okay, however I think you mean log_opts.basedir rather than logdir?
That's simple enough to switch around as pg_check_dir() does this
job.

>> The logic in charge of cleaning up the logs has been moved to a single
>> routine, aka cleanup_logs().
>
> +        cleanup_logs();
>
> Maybe we should register cleanup_logs() as an atexit() handler once we're done
> with option processing?

It seems to me that the original intention is to keep the logs around
on failure, hence we should only clean up things on a clean exit().
That's why I didn't add an exit callback for that.

> +    snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
> +             timebuf, LOG_OUTPUTDIR);
>
> While not introduced by this patch, it does make me uneasy that we create paths
> without checking for buffer overflows..

I don't mind adding such checks in those code paths.  You are right
that they tend to produce longer path strings than others.
--
Michael

Attachment

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Daniel Gustafsson

Date:

06 June 2022, 20:43:53

> On 6 Jun 2022, at 06:17, Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Jun 06, 2022 at 02:38:03AM +0200, Daniel Gustafsson wrote:
>> On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:

>>> I have been toying with the idea of a sub-directory named with a
>>> timestamp (Unix time, like log_line_prefix's %n but this could be
>>> any format) under pg_upgrade_output.d/ and finished with the
>>> attached.
>>
>> I was thinking more along the lines of %m to make it (more) human readable, but
>> I'm certainly not wedded to any format.
>
> Neither am I. I would not map exactly to %m as it uses whitespaces,
> but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would
> be fine? If there are other ideas for the format, just let me know.

I think this makes more sense from an end-user perspective.

>> As a user I would expect the logs from this current invocation to be removed
>> without --retain, and any other older log entries be kept. I think we should
>> remove log_opts.logdir and only remove log_opts.rootdir if it is left empty
>> after .logdir is removed.
>
> Okay, however I think you mean log_opts.basedir rather than logdir?
> That's simple enough to switch around as pg_check_dir() does this
> job.

Correct, I mistyped.  The cleanup in this version of the patch looks sane to
me.

--
Daniel Gustafsson        https://vmware.com/

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Justin Pryzby

Date:

06 June 2022, 21:53:35

On Mon, Jun 06, 2022 at 07:43:53PM +0200, Daniel Gustafsson wrote:
> > On 6 Jun 2022, at 06:17, Michael Paquier <michael@paquier.xyz> wrote:
> > On Mon, Jun 06, 2022 at 02:38:03AM +0200, Daniel Gustafsson wrote:
> >> On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:
> 
> >>> I have been toying with the idea of a sub-directory named with a
> >>> timestamp (Unix time, like log_line_prefix's %n but this could be
> >>> any format) under pg_upgrade_output.d/ and finished with the
> >>> attached. 
> >> 
> >> I was thinking more along the lines of %m to make it (more) human readable, but
> >> I'm certainly not wedded to any format.

It seems important to use a format in most-significant-parts-first which sorts
nicely by filename, but otherwise anything could be okay.

> > Neither am I. I would not map exactly to %m as it uses whitespaces,
> > but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would
> > be fine? If there are other ideas for the format, just let me know.
> 
> I think this makes more sense from an end-user perspective.

Is it better to use "T" instead of "_" ?

Apparently, that's ISO 8601, which can optionally use separators
(YYYY-MM-DDTHH:MM:SS).

https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations

I was thinking this would not include fractional seconds.  Maybe that would
mean that the TAP tests would need to sleep(1) at some points...

-- 
Justin

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

07 June 2022, 02:30:47

On Mon, Jun 06, 2022 at 01:53:35PM -0500, Justin Pryzby wrote:
> It seems important to use a format in most-significant-parts-first which sorts
> nicely by filename, but otherwise anything could be okay.

Agreed.

> Apparently, that's ISO 8601, which can optionally use separators
> (YYYY-MM-DDTHH:MM:SS).

OK, let's use a T, with the basic format and a minimal number of
separators then, we get 20220603T082255.

> I was thinking this would not include fractional seconds.  Maybe that would
> mean that the TAP tests would need to sleep(1) at some points...

If we don't split by the millisecond, we would come back to the
problems of the original report.  On my laptop, the --check phase
that passes takes more than 1s, but the one that fails takes 0.1s, so
a follow-up run would complain with the path conflicts.  So at the end
I would reduce the format to be YYYYMMDDTHHMMSS_ms (we could also use
a logic that checks for conflicts and appends an extra number if
needed, though the addition of the extra ms is a bit shorter).
--
Michael

Attachment

signature.asc

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

07 June 2022, 05:42:37

On Tue, Jun 07, 2022 at 08:30:47AM +0900, Michael Paquier wrote:
> If we don't split by the millisecond, we would come back to the
> problems of the original report.  On my laptop, the --check phase
> that passes takes more than 1s, but the one that fails takes 0.1s, so
> a follow-up run would complain with the path conflicts.  So at the end
> I would reduce the format to be YYYYMMDDTHHMMSS_ms (we could also use
> a logic that checks for conflicts and appends an extra number if
> needed, though the addition of the extra ms is a bit shorter).

So, attached is the patch I would like to apply for all that (commit
message included).  One issue I missed previously is that the TAP test
missed the log files on failure, so I had to tweak that with a find
routine.  I have fixed a few comments, and improved the docs to
describe the directory structure.

We are still need a refresh of the buildfarm client for the case where
pg_upgrade is tested without TAP, like that I guess:
--- a/PGBuild/Modules/TestUpgrade.pm
+++ b/PGBuild/Modules/TestUpgrade.pm
@@ -140,6 +140,7 @@ sub check
          $self->{pgsql}/src/bin/pg_upgrade/log/*
          $self->{pgsql}/src/bin/pg_upgrade/tmp_check/*/*.diffs
          $self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/log/*
+         $self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/*/log/*
          $self->{pgsql}/src/test/regress/*.diffs"
     );
     $log->add_log($_) foreach (@logfiles);
--
Michael

Attachment

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Justin Pryzby

Date:

07 June 2022, 06:11:48

tather => rather
is charge => in charge

On Mon, Jun 06, 2022 at 01:17:52PM +0900, Michael Paquier wrote:
> but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would

On Tue, Jun 07, 2022 at 08:30:47AM +0900, Michael Paquier wrote:
> I would reduce the format to be YYYYMMDDTHHMMSS_ms (we could also use

I think it's better with a dot (HHMMSS.ms) rather than underscore (HHMMSS_ms).

On Tue, Jun 07, 2022 at 11:42:37AM +0900, Michael Paquier wrote:
> +    /* append milliseconds */
> +    snprintf(timebuf, sizeof(timebuf), "%s_%03d",
> +             timebuf, (int) (time.tv_usec / 1000));

> +   with a timestamp formatted as per ISO 8601
> +   (<literal>%Y%m%dT%H%M%S</literal>) appended by an underscore and
> +   the timestamp's milliseconds, where all the generated files are stored.

The ISO timestamp can include milliseconds (or apparently fractional parts of
the "lowest-order" unit), so the "appended by" part doesn't need to be
explained here.

+       snprintf(timebuf, sizeof(timebuf), "%s_%03d",
+                        timebuf, (int) (time.tv_usec / 1000));

Is it really allowed to sprintf a buffer onto itself ?
I can't find any existing cases doing that.

It seems useless in any case - you could instead
snprintf(timebuf+strlen(timebuf), or increment len+=snprintf()...

Or append the milliseconds here:

+       len = snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", log_opts.rootdir,
+                                  timebuf);

-- 
Justin

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

08 June 2022, 04:55:29

On Mon, Jun 06, 2022 at 10:11:48PM -0500, Justin Pryzby wrote:
> tather => rather
> is charge => in charge

Thanks for the extra read.  Fixed.  There was an extra one in the
comments, as of s/thier/their/.

> I think it's better with a dot (HHMMSS.ms) rather than underscore (HHMMSS_ms).
>
> The ISO timestamp can include milliseconds (or apparently fractional parts of
> the "lowest-order" unit), so the "appended by" part doesn't need to be
> explained here.
>
> +       snprintf(timebuf, sizeof(timebuf), "%s_%03d",
> +                        timebuf, (int) (time.tv_usec / 1000));
>
> Is it really allowed to sprintf a buffer onto itself ?
> I can't find any existing cases doing that.

Yes, there is no need to do that, so I have just appended the ms
digits to the end of the string.

And applied, to take care of this open item.
--
Michael

Attachment

signature.asc

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Justin Pryzby

Date:

09 June 2022, 00:13:37

On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:
> And applied, to take care of this open item.

Shouldn't this wait for the buildfarm to be updated again ?

-- 
Justin

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

09 June 2022, 03:53:36

On Wed, Jun 08, 2022 at 04:13:37PM -0500, Justin Pryzby wrote:
> On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:
>> And applied, to take care of this open item.
>
> Shouldn't this wait for the buildfarm to be updated again ?

The TAP logic is able to find any logs by itself on failure, so what
would be impacted is the case of the tests running pg_upgrade via the
past route in TestUpgrade.pm (it had better not run in the buildfarm
client for 15~ and I am wondering if it would be worth backpatching
the TAP test once it brews a bit more).  Anyway, seeing my time sheet
for the next couple of days coupled with a potential beta2 in the very
short term and with the broken upgrade workflow, I have given priority
to fix the issue because that's what impacts directly people looking
at 15 and testing their upgrades, which is what Tushar did.

Saying that, I have already sent a pull request to the buildfarm repo
to refresh the set of logs, as of the patch attached.  This updates
the logic so as this would work for any changes in the structure of
pg_upgrade_output.d/, fetching any files prefixed by ".log".
--
Michael

Attachment

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Andrew Dunstan

Date:

11 June 2022, 00:45:11

On 2022-06-08 We 20:53, Michael Paquier wrote:
> On Wed, Jun 08, 2022 at 04:13:37PM -0500, Justin Pryzby wrote:
>> On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:
>>> And applied, to take care of this open item.
>> Shouldn't this wait for the buildfarm to be updated again ?
> The TAP logic is able to find any logs by itself on failure, so what
> would be impacted is the case of the tests running pg_upgrade via the
> past route in TestUpgrade.pm (it had better not run in the buildfarm
> client for 15~ and I am wondering if it would be worth backpatching
> the TAP test once it brews a bit more).  Anyway, seeing my time sheet
> for the next couple of days coupled with a potential beta2 in the very
> short term and with the broken upgrade workflow, I have given priority
> to fix the issue because that's what impacts directly people looking
> at 15 and testing their upgrades, which is what Tushar did.
>
> Saying that, I have already sent a pull request to the buildfarm repo
> to refresh the set of logs, as of the patch attached.  This updates
> the logic so as this would work for any changes in the structure of
> pg_upgrade_output.d/, fetching any files prefixed by ".log".




The module is already a noop if there's a TAP test for pg_upgrade. So I
don't understand the point of the PR at all.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Michael Paquier

Date:

14 June 2022, 05:50:12

On Fri, Jun 10, 2022 at 05:45:11PM -0400, Andrew Dunstan wrote:
> The module is already a noop if there's a TAP test for pg_upgrade. So I
> don't understand the point of the PR at all.

Oh.  I thought that the old path was still taken as long as
--enable-tap-tests was not used.  I was wrong, then.  I'll go and
remove the pull request.
--
Michael

Attachment

signature.asc

Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

From

Andrew Dunstan

Date:

14 June 2022, 15:42:55

On 2022-06-13 Mo 22:50, Michael Paquier wrote:
> On Fri, Jun 10, 2022 at 05:45:11PM -0400, Andrew Dunstan wrote:
>> The module is already a noop if there's a TAP test for pg_upgrade. So I
>> don't understand the point of the PR at all.
> Oh.  I thought that the old path was still taken as long as
> --enable-tap-tests was not used.  I was wrong, then.  I'll go and
> remove the pull request.


It did that from 2018 (826d450), but from 2021(691e649) all it does is
look for the TAP test subdirectory. The old logic is still there
redundantly, so I'll remove it to clean up confusion.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com