Thread: optimizing pg_upgrade's once-in-each-database steps
A number of pg_upgrade steps require connecting to each database and running
a query.  When there are many databases, these steps are particularly
time-consuming, especially since this is done sequentially in a single
process.  At a quick glance, I see the following such steps:

* create_logical_replication_slots
* check_for_data_types_usage
* check_for_isn_and_int8_passing_mismatch
* check_for_user_defined_postfix_ops
* check_for_incompatible_polymorphics
* check_for_tables_with_oids
* check_for_user_defined_encoding_conversions
* check_old_cluster_subscription_state
* get_loadable_libraries
* get_db_rel_and_slot_infos
* old_9_6_invalidate_hash_indexes
* report_extension_updates

I set out to parallelize these kinds of steps via multiple threads or
processes, but I ended up realizing that we could likely achieve much of the
same gain with libpq's asynchronous APIs.  Specifically, both establishing
the connections and running the queries can be done without blocking, so we
can just loop over a handful of slots and advance a simple state machine for
each.

The attached is a proof-of-concept grade patch for doing this for
get_db_rel_and_slot_infos(), which yielded the following results on my
laptop for "pg_upgrade --link --sync-method=syncfs --jobs 8" for a cluster
with 10K empty databases.

total pg_upgrade time:
* HEAD:  14m 8s
* patch: 10m 58s

get_db_rel_and_slot_infos() on old cluster:
* HEAD:  2m 45s
* patch: 36s

get_db_rel_and_slot_infos() on new cluster:
* HEAD:  1m 46s
* patch: 29s

I am posting this early to get thoughts on the general approach.  If we
proceeded with this strategy, I'd probably create some generic tooling that
each relevant step would provide a set of callback functions.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, 2024-05-16 at 16:16 -0500, Nathan Bossart wrote:
> I am posting this early to get thoughts on the general approach.  If we
> proceeded with this strategy, I'd probably create some generic tooling that
> each relevant step would provide a set of callback functions.

The documentation states:

"pg_dump -j uses multiple database connections; it connects to the database
once with the leader process and once again for each worker job."

That might need to be adjusted.

How much complexity do you avoid by using async instead of multiple
processes?

Also, did you consider connecting once to each database and running many
queries?  Most of those seem like just checks.

Regards,
	Jeff Davis
On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
> How much complexity do you avoid by using async instead of multiple
> processes?

If we didn't want to use async, my guess is we'd want to use threads to
avoid complicated IPC.  And if we followed pgbench's example for using
threads, it might end up at a comparable level of complexity, although I'd
bet that threading would be the more complex of the two.  It's hard to say
definitively without coding it up both ways, which might be worth doing.

> Also, did you consider connecting once to each database and running
> many queries?  Most of those seem like just checks.

This was the idea behind 347758b.  It may be possible to do more along
these lines.  IMO parallelizing will still be useful even if we do combine
more of the steps.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I figured I'd post what I have so far since this thread hasn't been updated
in a while.  The attached patches are still "proof-of-concept grade," but
they are at least moving in the right direction (IMHO).  The variable
naming is still not great, and they are woefully undercommented, among
other things.

0001 introduces a new API for registering callbacks and running them in
parallel on all databases in the cluster.  This new system manages a set of
"slots" that follow a simple state machine to asynchronously establish a
connection and run the queries.  It uses system() to wait for these
asynchronous tasks to complete.  Users of this API only need to provide two
callbacks: one to return the query that should be run on each database and
another to process the results of that query.  If multiple queries are
required for each database, users can provide multiple sets of callbacks.

The other patches change several of the existing tasks to use this new API.
With these patches applied, I see the following differences in the output
of 'pg_upgrade | ts -i' for a cluster with 1k empty databases:

WITHOUT PATCH
00:00:19 Checking database user is the install user                   ok
00:00:02 Checking for subscription state                              ok
00:00:06 Adding ".old" suffix to old global/pg_control                ok
00:00:04 Checking for extension updates                               ok

WITH PATCHES (--jobs 1)
00:00:10 Checking database user is the install user                   ok
00:00:02 Checking for subscription state                              ok
00:00:07 Adding ".old" suffix to old global/pg_control                ok
00:00:05 Checking for extension updates                               ok

WITH PATCHES (--jobs 4)
00:00:06 Checking database user is the install user                   ok
00:00:00 Checking for subscription state                              ok
00:00:02 Adding ".old" suffix to old global/pg_control                ok
00:00:01 Checking for extension updates                               ok

Note that the "Checking database user is the install user" time also
includes the call to get_db_rel_and_slot_infos() on the old cluster as well
as the call to get_loadable_libraries() on the old cluster.
I believe the improvement with the patches with just one job is due to the
consolidation of the queries into one database connection (presently,
get_db_rel_and_slot_infos() creates 3 connections per database for some
upgrades).  Similarly, the "Adding \".old\" suffix to old global/pg_control"
time includes the call to get_db_rel_and_slot_infos() on the new cluster.

There are several remaining places where we could use this new API to speed
up upgrades.  For example, I haven't attempted to use it for the data type
checks yet, and that tends to eat up a sizable chunk of time when there are
many databases.

On Thu, May 16, 2024 at 08:24:08PM -0500, Nathan Bossart wrote:
> On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
>> Also, did you consider connecting once to each database and running
>> many queries?  Most of those seem like just checks.
>
> This was the idea behind 347758b.  It may be possible to do more along
> these lines.  IMO parallelizing will still be useful even if we do combine
> more of the steps.

My current thinking is that any possible further consolidation should
happen as part of a follow-up effort to parallelization.  I'm cautiously
optimistic that the parallelization work will make the consolidation easier
since it moves things to rigidly-defined callback functions.

A separate piece of off-list feedback from Michael Paquier is that this new
parallel system might be something we can teach the ParallelSlot code used
by bin/scripts/ to do.  I've yet to look too deeply into this, but I
suspect that it will be difficult to combine the two.  For example, the
ParallelSlot system doesn't seem well-suited for the kind of
run-once-in-each-database tasks required by pg_upgrade, and the error
handling is probably a little different, too.  However, it's still worth a
closer look, and I'm interested in folks' opinions on the subject.

-- 
nathan
Attachment
- v2-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v2-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v2-0003-move-live_check-variable-to-user_opts.patch
- v2-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v2-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v2-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> 0001 introduces a new API for registering callbacks and running them in
> parallel on all databases in the cluster.  This new system manages a set of
> "slots" that follow a simple state machine to asynchronously establish a
> connection and run the queries.  It uses system() to wait for these
> asynchronous tasks to complete.  Users of this API only need to provide two
> callbacks: one to return the query that should be run on each database and
> another to process the results of that query.  If multiple queries are
> required for each database, users can provide multiple sets of callbacks.

I do really like the idea of using asynchronous communication here.  It
should be significantly cheaper than using multiple processes or threads,
and maybe less code, too.  But I'm confused about why you would need or
want to use system() to wait for asynchronous tasks to complete.  Wouldn't
it be something like select()?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jul 01, 2024 at 03:58:16PM -0400, Robert Haas wrote:
> On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> 0001 introduces a new API for registering callbacks and running them in
>> parallel on all databases in the cluster.  This new system manages a set of
>> "slots" that follow a simple state machine to asynchronously establish a
>> connection and run the queries.  It uses system() to wait for these
>> asynchronous tasks to complete.  Users of this API only need to provide two
>> callbacks: one to return the query that should be run on each database and
>> another to process the results of that query.  If multiple queries are
>> required for each database, users can provide multiple sets of callbacks.
>
> I do really like the idea of using asynchronous communication here.  It
> should be significantly cheaper than using multiple processes or
> threads, and maybe less code, too.  But I'm confused about why you
> would need or want to use system() to wait for asynchronous tasks to
> complete.  Wouldn't it be something like select()?

Whoops, I meant to type "select()" there.  Sorry for the confusion.

-- 
nathan
As I mentioned elsewhere [0], here's a first attempt at parallelizing the
data type checks.  I was worried that I might have to refactor Daniel's
work in commit 347758b quite significantly, but I was able to avoid that by
using a set of generic callbacks and providing each task step an index into
the data_types_usage_checks array.

[0] https://postgr.es/m/Zov5kHbEyDMuHJI_%40nathan

-- 
nathan
Attachment
- v3-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v3-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v3-0003-move-live_check-variable-to-user_opts.patch
- v3-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v3-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v3-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v3-0007-parallelize-data-type-checks-in-pg_upgrade.patch
I finished parallelizing everything in pg_upgrade I could easily
parallelize with the proposed async task API.  There are a few remaining
places where I could not easily convert to the new API for whatever reason.
AFAICT those remaining places are either not showing up prominently in my
tests, or they only affect versions that are unsupported or will soon be
unsupported when v18 is released.

Similarly, I noticed many of the checks follow a very similar coding
pattern, but most (if not all) only affect older versions, so I'm unsure
whether it's worth the trouble to try consolidating them into one scan
through all the databases.

The code is still very rough and nowhere near committable, but this at
least gets the patch set into the editing phase.

-- 
nathan
Attachment
- v4-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v4-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v4-0003-move-live_check-variable-to-user_opts.patch
- v4-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v4-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v4-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v4-0007-parallelize-data-type-checks-in-pg_upgrade.patch
- v4-0008-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v4-0009-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v4-0010-parallelize-incompatible-polymorphics-check-in-pg.patch
- v4-0011-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v4-0012-parallelize-user-defined-encoding-conversions-che.patch
> On 9 Jul 2024, at 05:33, Nathan Bossart <nathandbossart@gmail.com> wrote:
> The code is still very rough and nowhere near committable, but this at
> least gets the patch set into the editing phase.

First reaction after a read-through is that this seems really cool; can't
wait to see how much faster v18 pg_upgrade will be over v17.  I will do
more testing and review once back from vacation.  Below are some comments
from reading, which is all I had time for at this point:

+static void
+conn_failure(PGconn *conn)
+{
+	pg_log(PG_REPORT, "%s", PQerrorMessage(conn));
+	printf(_("Failure, exiting\n"));
+	exit(1);
+}

Any particular reason this isn't using pg_fatal()?

+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
....
+	pg_free(query);
+}

A minor point, perhaps fueled by me not having played around much with this
patchset.  It seems a bit odd that dispatch_query is responsible for
freeing the query from the get_query callback.  I would have expected the
output from AsyncTaskGetQueryCB to be stored in AsyncTask and released by
async_task_free.

+static void
+sub_process(DbInfo *dbinfo, PGresult *res, void *arg)
+{
....
+			fprintf(state->script, "The table sync state \"%s\" is not allowed for database:\"%s\" subscription:\"%s\" schema:\"%s\"relation:\"%s\"\n",
+					PQgetvalue(res, i, 0),
+					dbinfo->db_name,
+					PQgetvalue(res, i, 1),

With the query being in a separate place in the code from the processing,
it takes a bit of jumping around to resolve the columns in PQgetvalue calls
like this.  Using PQfnumber() calls and descriptive names would make this
easier.  I know this case is copying old behavior, but the function splits
make it less useful than before.

+	char	   *query = pg_malloc(QUERY_ALLOC);

Should we convert this to a PQExpBuffer?

-- 
Daniel Gustafsson
On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
> First reaction after a read-through is that this seems really cool; can't
> wait to see how much faster v18 pg_upgrade will be over v17.  I will do
> more testing and review once back from vacation.  Below are some comments
> from reading, which is all I had time for at this point:

Thanks for taking a look!

> +static void
> +conn_failure(PGconn *conn)
> +{
> +	pg_log(PG_REPORT, "%s", PQerrorMessage(conn));
> +	printf(_("Failure, exiting\n"));
> +	exit(1);
> +}
>
> Any particular reason this isn't using pg_fatal()?

IIRC this was to match the error in connectToServer().  We could probably
move to pg_fatal().

> +static void
> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
> ....
> +	pg_free(query);
> +}
>
> A minor point, perhaps fueled by me not having played around much with this
> patchset.  It seems a bit odd that dispatch_query is responsible for freeing
> the query from the get_query callback.  I would have expected the output from
> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.

I don't see any problem with doing it the way you suggest.

Tangentially related, I waffled a bit on whether to require the query
callbacks to put the result in allocated memory.  Some queries are the same
no matter what, and some require customization at runtime.  As you can see,
I ended up just requiring allocated memory.  That makes the code a tad
simpler, and I doubt the extra work is noticeable.

> +static void
> +sub_process(DbInfo *dbinfo, PGresult *res, void *arg)
> +{
> ....
> +			fprintf(state->script, "The table sync state \"%s\" is not allowed for database:\"%s\" subscription:\"%s\" schema:\"%s\"relation:\"%s\"\n",
> +					PQgetvalue(res, i, 0),
> +					dbinfo->db_name,
> +					PQgetvalue(res, i, 1),
>
> With the query being in a separate place in the code from the processing it
> takes a bit of jumping around to resolve the columns in PQgetvalue calls like
> this.
> Using PQfnumber() calls and descriptive names would make this easier.  I
> know this case is copying old behavior, but the function splits make it less
> useful than before.

Good point.

> +	char	   *query = pg_malloc(QUERY_ALLOC);
>
> Should we convert this to a PQExpBuffer?

Seems like a good idea.  I think I was trying to change as few lines as
possible for my proof-of-concept. :)

-- 
nathan
> On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
>> +static void
>> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
>> ....
>> +	pg_free(query);
>> +}
>>
>> A minor point, perhaps fueled by me not having played around much with this
>> patchset.  It seems a bit odd that dispatch_query is responsible for freeing
>> the query from the get_query callback.  I would have expected the output from
>> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
>
> I don't see any problem with doing it the way you suggest.
>
> Tangentially related, I waffled a bit on whether to require the query
> callbacks to put the result in allocated memory.  Some queries are the same
> no matter what, and some require customization at runtime.  As you can see,
> I ended up just requiring allocated memory.  That makes the code a tad
> simpler, and I doubt the extra work is noticeable.

I absolutely agree with this.

>> +	char	   *query = pg_malloc(QUERY_ALLOC);
>>
>> Should we convert this to a PQExpBuffer?
>
> Seems like a good idea.  I think I was trying to change as few lines as
> possible for my proof-of-concept. :)

Yeah, that's a good approach; I just noticed it while reading the hunks.
We can do that separately from this patchset.

In order to trim down the size of the patchset, I think going ahead with
0003 independently of this makes sense; it has value by itself.

-- 
Daniel Gustafsson
On Thu, Jul 18, 2024 at 09:57:23AM +0200, Daniel Gustafsson wrote:
>> On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
>>> +static void
>>> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
>>> ....
>>> +	pg_free(query);
>>> +}
>>>
>>> A minor point, perhaps fueled by me not having played around much with this
>>> patchset.  It seems a bit odd that dispatch_query is responsible for freeing
>>> the query from the get_query callback.  I would have expected the output from
>>> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
>>
>> I don't see any problem with doing it the way you suggest.

Actually, I do see a problem.  If we do it this way, we'll have to store a
string per database somewhere, which seems unnecessary.

However, while looking into this, I noticed that only one get_query
callback (get_db_subscription_count()) actually customizes the generated
query using information in the provided DbInfo.  AFAICT we can do this
particular step without running a query in each database, as I mentioned
elsewhere [0].  That should speed things up a bit and allow us to simplify
the AsyncTask code.

With that, if we are willing to assume that a given get_query callback will
generate the same string for all databases (and I think we should), we can
run the callback once and save the string in the step for dispatch_query()
to use.  This would look more like what you suggested in the quoted text.

[0] https://postgr.es/m/ZprQJv_TxccN3tkr%40nathan

-- 
nathan
On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
> However, while looking into this, I noticed that only one get_query
> callback (get_db_subscription_count()) actually customizes the generated
> query using information in the provided DbInfo.  AFAICT we can do this
> particular step without running a query in each database, as I mentioned
> elsewhere [0].  That should speed things up a bit and allow us to simplify
> the AsyncTask code.
>
> With that, if we are willing to assume that a given get_query callback will
> generate the same string for all databases (and I think we should), we can
> run the callback once and save the string in the step for dispatch_query()
> to use.  This would look more like what you suggested in the quoted text.

Here is a new patch set.  I've included the latest revision of the patch to
fix get_db_subscription_count() from the other thread [0] as 0001 since I
expect that to be committed soon.  I've also moved the patch that moves the
"live_check" variable to "user_opts" to 0002 since I plan on committing
that sooner than later, too.  Otherwise, I've tried to address all feedback
provided thus far.

[0] https://commitfest.postgresql.org/49/5135/

-- 
nathan
Attachment
- v5-0001-pg_upgrade-retrieve-subscription-count-more-effic.patch
- v5-0002-move-live_check-variable-to-user_opts.patch
- v5-0003-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v5-0004-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v5-0005-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v5-0006-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v5-0007-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v5-0008-parallelize-data-type-checks-in-pg_upgrade.patch
- v5-0009-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v5-0010-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v5-0011-parallelize-incompatible-polymorphics-check-in-pg.patch
- v5-0012-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v5-0013-parallelize-user-defined-encoding-conversions-che.patch
On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.

Here is just the live_check patch.  I rebased it, gave it a commit message,
and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
to commit this within the next couple of days.

-- 
nathan
Attachment
> On 25 Jul 2024, at 04:58, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
>> Here is a new patch set.  I've included the latest revision of the patch to
>> fix get_db_subscription_count() from the other thread [0] as 0001 since I
>> expect that to be committed soon.  I've also moved the patch that moves the
>> "live_check" variable to "user_opts" to 0002 since I plan on committing
>> that sooner than later, too.  Otherwise, I've tried to address all feedback
>> provided thus far.
>
> Here is just the live_check patch.  I rebased it, gave it a commit message,
> and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
> to commit this within the next couple of days.

LGTM

-- 
Daniel Gustafsson
On Thu, Jul 25, 2024 at 11:42:55AM +0200, Daniel Gustafsson wrote:
>> On 25 Jul 2024, at 04:58, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Here is just the live_check patch.  I rebased it, gave it a commit message,
>> and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
>> to commit this within the next couple of days.
>
> LGTM

Thanks, committed.

-- 
nathan
On 22.07.2024 21:07, Nathan Bossart wrote:
> On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
>> However, while looking into this, I noticed that only one get_query
>> callback (get_db_subscription_count()) actually customizes the generated
>> query using information in the provided DbInfo.  AFAICT we can do this
>> particular step without running a query in each database, as I mentioned
>> elsewhere [0].  That should speed things up a bit and allow us to simplify
>> the AsyncTask code.
>>
>> With that, if we are willing to assume that a given get_query callback will
>> generate the same string for all databases (and I think we should), we can
>> run the callback once and save the string in the step for dispatch_query()
>> to use.  This would look more like what you suggested in the quoted text.
>
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.
>
> [0] https://commitfest.postgresql.org/49/5135/

Hi,

I like your idea of parallelizing these checks with the async libpq API;
thanks for working on it.  The patch doesn't apply cleanly on master
anymore, but I've rebased locally and taken it for a quick spin with a pg16
instance of 1000 empty databases.  I didn't see any regressions with -j 1,
and there's some speedup with -j 8 (33 sec vs 8 sec for these checks).

One thing that could be improved is that we could start a new connection
right away after having run all query callbacks for the current connection
in process_slot, instead of just returning and establishing the new
connection only on the next iteration of the loop in async_task_run, after
potentially sleeping on select.
+1 to Jeff's suggestion that perhaps we could reuse connections, but
perhaps that's a separate story.

Regards,
Ilya
On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
> I like your idea of parallelizing these checks with async libpq API, thanks
> for working on it.  The patch doesn't apply cleanly on master anymore, but
> I've rebased locally and taken it for a quick spin with a pg16 instance of
> 1000 empty databases.  Didn't see any regressions with -j 1, there's some
> speedup with -j 8 (33 sec vs 8 sec for these checks).

Thanks for taking a look.  I'm hoping to do a round of polishing before
posting a rebased patch set soon.

> One thing that I noticed that could be improved is we could start a new
> connection right away after having run all query callbacks for the current
> connection in process_slot, instead of just returning and establishing the
> new connection only on the next iteration of the loop in async_task_run
> after potentially sleeping on select.

Yeah, we could just recursively call process_slot() right after freeing the
slot.  That'd at least allow us to avoid the spinning behavior as we run
out of databases to process, if nothing else.

> +1 to Jeff's suggestion that perhaps we could reuse connections, but perhaps
> that's a separate story.

When I skimmed through the various tasks, I didn't see a ton of
opportunities for further consolidation, or at least opportunities that
would help for upgrades from supported versions.  The data type checks are
already consolidated, for example.

-- 
nathan
On Thu, Aug 01, 2024 at 12:44:35PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
>> I like your idea of parallelizing these checks with async libpq API, thanks
>> for working on it.  The patch doesn't apply cleanly on master anymore, but
>> I've rebased locally and taken it for a quick spin with a pg16 instance of
>> 1000 empty databases.  Didn't see any regressions with -j 1, there's some
>> speedup with -j 8 (33 sec vs 8 sec for these checks).
>
> Thanks for taking a look.  I'm hoping to do a round of polishing before
> posting a rebased patch set soon.
>
>> One thing that I noticed that could be improved is we could start a new
>> connection right away after having run all query callbacks for the current
>> connection in process_slot, instead of just returning and establishing the
>> new connection only on the next iteration of the loop in async_task_run
>> after potentially sleeping on select.
>
> Yeah, we could just recursively call process_slot() right after freeing the
> slot.  That'd at least allow us to avoid the spinning behavior as we run
> out of databases to process, if nothing else.

Here is a new patch set.  Besides rebasing, I've added the recursive call
to process_slot() mentioned in the quoted text, and I've added quite a bit
of commentary to async.c.

-- 
nathan
Attachment
- v7-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v7-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v7-0003-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v7-0004-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v7-0005-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v7-0006-parallelize-data-type-checks-in-pg_upgrade.patch
- v7-0007-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v7-0008-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v7-0009-parallelize-incompatible-polymorphics-check-in-pg.patch
- v7-0010-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v7-0011-parallelize-user-defined-encoding-conversions-che.patch
On 01.08.2024 22:41, Nathan Bossart wrote:
> Here is a new patch set.  Besides rebasing, I've added the recursive call
> to process_slot() mentioned in the quoted text, and I've added quite a bit
> of commentary to async.c.

That's much better now, thanks!  Here's my code review; note that I haven't
tested the patches yet:

+void
+async_task_add_step(AsyncTask *task,
+					AsyncTaskGetQueryCB query_cb,
+					AsyncTaskProcessCB process_cb, bool free_result,
+					void *arg)

Is there any reason to have the query as a callback function instead of a
char *?  From what I see right now, it doesn't give any extra flexibility,
as the query has to be static anyway (it can't be customized on a
per-database basis) and will be created once before all the callbacks are
run.  Passing in a char * makes the API simpler, excludes any potential
error of making the query dependent on the current database, and removes
the unnecessary malloc/free of the static strings.

+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
+			   const AsyncTask *task)
+{
+	...
+	if (!PQsendQuery(slot->conn, cbs->query))
+		conn_failure(slot->conn);
+}

This will print "connection failure: connection pointer is NULL", which I
don't think makes a lot of sense to the end user.  I'd prefer something
like pg_fatal("failed to allocate a new connection").

 	if (found)
-		pg_fatal("Data type checks failed: %s", report.data);
+	{
+		pg_fatal("Data type checks failed: %s", data_type_check_report.data);
+		termPQExpBuffer(&data_type_check_report);
+	}

`found` should be removed and replaced with `data_type_check_failed`, as
it's not set anymore.  Also, the termPQExpBuffer after pg_fatal looks
unnecessary.

+static bool *data_type_check_results;
+static bool data_type_check_failed;
+static PQExpBufferData data_type_check_report;

IMO it would be nicer to have these as local state that's passed in as an
arg to the AsyncTaskProcessCB, which aligns with how the other checks do
it.
-- End of review --

Regarding keeping the connections, the way I envisioned it entailed passing
a list of connections from one check to the next one (or keeping a global
state with connections?).  I didn't concretely look at the code to verify
this, so it's just an abstract idea.
On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote:
> -- End of review --

Thanks for the review.  I've attempted to address all your feedback in v8
of the patch set.  I think the names could still use some work, but I
wanted to get the main structure in place before trying to fix them.

> Regarding keeping the connections, the way I envisioned it entailed passing
> a list of connections from one check to the next one (or keeping a global
> state with connections?).  I didn't concretely look at the code to verify
> this, so it's just an abstract idea.

My main concern with doing something like that is it could require
maintaining a huge number of connections when there are many databases.
GUCs like max_connections would need to be set accordingly.  I'm a little
dubious that the gains would be enough to justify it.

-- 
nathan
Attachment
- v8-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v8-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v8-0003-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v8-0004-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v8-0005-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v8-0006-parallelize-data-type-checks-in-pg_upgrade.patch
- v8-0007-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v8-0008-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v8-0009-parallelize-incompatible-polymorphics-check-in-pg.patch
- v8-0010-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v8-0011-parallelize-user-defined-encoding-conversions-che.patch
> On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote:
>> -- End of review --
> Thanks for the review. I've attempted to address all your feedback in v8
> of the patch set. I think the names could still use some work, but I
> wanted to get the main structure in place before trying to fix them.
I think the underlying mechanism is basically solid, but I have one question: isn't this the ideal case for using libpq pipelining? That would allow subsequent tasks to launch while the main loop slowly gets around to clearing off completed tasks on some other connection.
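[For readers unfamiliar with it, the pipelining Corey describes is libpq's
pipeline mode, available since PostgreSQL 14. A minimal sketch of the flow
follows; it is illustrative only, with placeholder queries, and is not code
from the patch set:]

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Queue several queries on one connection before consuming any results.
 * The queries here are placeholders for the per-database check queries. */
static void
run_pipelined_checks(PGconn *conn)
{
	const char *queries[] = {"SELECT 1", "SELECT 2"};

	if (!PQenterPipelineMode(conn))
		exit(1);

	/* Dispatch everything up front; plain PQsendQuery() is restricted in
	 * pipeline mode, so use PQsendQueryParams(). */
	for (int i = 0; i < 2; i++)
		if (!PQsendQueryParams(conn, queries[i],
							   0, NULL, NULL, NULL, NULL, 0))
			exit(1);

	/* Establish a synchronization point and flush the queue. */
	if (!PQpipelineSync(conn))
		exit(1);

	/* Results arrive in order; PQgetResult() returns NULL between queries,
	 * and the sync point yields a PGRES_PIPELINE_SYNC result. */
	for (;;)
	{
		PGresult   *res = PQgetResult(conn);

		if (res == NULL)
			continue;			/* boundary between queries */
		if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
		{
			PQclear(res);
			break;				/* all queued queries are done */
		}
		if (PQresultStatus(res) == PGRES_TUPLES_OK)
			printf("%s\n", PQgetvalue(res, 0, 0));
		PQclear(res);
	}

	PQexitPipelineMode(conn);
}
```

The appeal is that all queries are queued before any results are consumed,
so the server can already be working on query N+1 while the client digests
the results of query N.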
On Thu, Aug 08, 2024 at 06:18:38PM -0400, Corey Huinker wrote:
> I think the underlying mechanism is basically solid, but I have one
> question: isn't this the ideal case for using libpq pipelining? That would
> allow subsequent tasks to launch while the main loop slowly gets around to
> clearing off completed tasks on some other connection.

I'll admit I hadn't really considered pipelining, but I'm tempted to say
that it's probably not worth the complexity. Not only do most of the tasks
have only one step, but even tasks like the data types check are unlikely
to require more than a few queries for upgrades from supported versions.
Furthermore, most of the callbacks should do almost nothing for a given
upgrade, and since pg_upgrade runs on the server, client/server round-trip
time should be pretty low.

Perhaps pipelining would make more sense if we consolidated the tasks a bit
better, but when I last looked into that, I didn't see a ton of great
opportunities that would help anything except for upgrades from really old
versions. Even then, I'm not sure if pipelining is worth it.

--
nathan
On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>> I'll admit I hadn't really considered pipelining, but I'm tempted to say
>> that it's probably not worth the complexity. Not only do most of the tasks
>> have only one step, but even tasks like the data types check are unlikely
>> to require more than a few queries for upgrades from supported versions.
>
> Can you point me to a complex multi-step task that you think wouldn't work
> for pipelining? My skimming of the other patches all seemed to be one query
> with one result set to be processed by one callback.

I think it would work fine. I'm just not sure it's worth it, especially
for tasks that run exactly one query in each connection.

>> Furthermore, most of the callbacks should do almost nothing for a given
>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>> time should be pretty low.
>
> To my mind, that makes pipelining make more sense: you throw out N queries,
> most of which are trivial, and by the time you cycle back around and start
> digesting result sets via callbacks, more of the queries have finished
> because they were waiting on the query ahead of them in the pipeline, not
> waiting on a callback to finish consuming its assigned result set and then
> launching the next task query.

My assumption is that the "waiting for a callback before launching the next
query" time will typically be pretty short in practice. I could try
measuring it...

>> Perhaps pipelining would make more sense if we consolidated the tasks a bit
>> better, but when I last looked into that, I didn't see a ton of great
>> opportunities that would help anything except for upgrades from really old
>> versions. Even then, I'm not sure if pipelining is worth it.
>
> I think you'd want to do the opposite of consolidating the tasks. If
> anything, you'd want to break them down into known single-query operations,
> and if the callback function for one of them happens to queue up a
> subsequent query (with subsequent callback) then so be it.

By "consolidating," I mean combining tasks into fewer tasks with additional
steps. This would allow us to reuse connections instead of creating N
connections for every single query. If we used a task per query, I'd
expect pipelining to provide zero benefit.

--
nathan
On Sat, Aug 10, 2024 at 10:17:27AM -0500, Nathan Bossart wrote:
> On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>>> Furthermore, most of the callbacks should do almost nothing for a given
>>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>>> time should be pretty low.
>>
>> To my mind, that makes pipelining make more sense, you throw out N queries,
>> most of which are trivial, and by the time you cycle back around and start
>> digesting result sets via callbacks, more of the queries have finished
>> because they were waiting on the query ahead of them in the pipeline, not
>> waiting on a callback to finish consuming its assigned result set and then
>> launching the next task query.
>
> My assumption is that the "waiting for a callback before launching the next
> query" time will typically be pretty short in practice. I could try
> measuring it...

Another option might be to combine all the queries for a task into a single
string and then send that in one PQsendQuery() call. That may be a simpler
way to eliminate the time between queries.

--
nathan
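[This works because a single PQsendQuery() call may carry multiple
semicolon-separated statements, and libpq then hands back one PGresult per
statement. A rough sketch of the idea, with placeholder queries and a
hypothetical function name rather than code from the patch set:]

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Send all of a task's queries in one round trip; the queries here are
 * placeholders for a task's real per-database steps. */
static void
run_task_queries(PGconn *conn)
{
	PGresult   *res;

	/* One command string containing every step of the task. */
	if (!PQsendQuery(conn, "SELECT 1; SELECT 2; SELECT 3;"))
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return;
	}

	/* Each statement produces its own PGresult, in order; PQgetResult()
	 * returns NULL once all of them have been consumed, so per-step
	 * callbacks could be invoked in the same order the steps were queued. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_TUPLES_OK)
			printf("step result: %s\n", PQgetvalue(res, 0, 0));
		PQclear(res);
	}
}
```

In effect this pipelines a task's steps on the server side without entering
libpq's pipeline mode at all.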
> On 28 Aug 2024, at 22:46, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I spent some time cleaning up names, comments, etc. Barring additional
> feedback, I'm planning to commit this stuff in the September commitfest so
> that it has plenty of time to bake in the buildfarm.
>
> --
> nathan
>
> <v10-0001-Introduce-framework-for-parallelizing-various-pg.patch>
> <v10-0002-Use-pg_upgrade-s-new-parallel-framework-for-subs.patch>
> <v10-0003-Use-pg_upgrade-s-new-parallel-framework-to-get-r.patch>
> <v10-0004-Use-pg_upgrade-s-new-parallel-framework-to-get-l.patch>
> <v10-0005-Use-pg_upgrade-s-new-parallel-framework-for-exte.patch>
> <v10-0006-Use-pg_upgrade-s-new-parallel-framework-for-data.patch>
> <v10-0007-Use-pg_upgrade-s-new-parallel-framework-for-isn-.patch>
> <v10-0008-Use-pg_upgrade-s-new-parallel-framework-for-post.patch>
> <v10-0009-Use-pg_upgrade-s-new-parallel-framework-for-poly.patch>
> <v10-0010-Use-pg_upgrade-s-new-parallel-framework-for-WITH.patch>
> <v10-0011-Use-pg_upgrade-s-parallel-framework-for-encoding.patch>
Good catch, I didn't look so closely at this. I think we can actually just
use PQstatus() here. But furthermore, I think the way I was initiating
connections was completely bogus. IIUC before calling PQconnectPoll() the
first time, we should wait for a write indicator from select(), and then we
should only call PQconnectPoll() after subsequent indicators from select().
After spending quite a bit of time staring at the PQconnectPoll() code, I'm
quite surprised I haven't seen any problems thus far. If I had to guess,
I'd say that either PQconnectPoll() is more resilient than I think it is,
or I've just gotten lucky because pg_upgrade establishes connections
quickly.
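[For reference, the protocol described here is the documented contract for
libpq's nonblocking connection functions. A single-connection sketch of
that loop follows; it is illustrative only (the patch's multi-slot state
machine interleaves this across many connections), and "conninfo" is a
placeholder:]

```c
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

/* Establish one connection using the documented PQconnectStart() /
 * PQconnectPoll() protocol: wait for the socket to become ready in the
 * direction indicated by the previous poll status (write, initially), and
 * only then call PQconnectPoll() again. */
static PGconn *
connect_nonblocking(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		exit(1);

	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
	{
		fd_set		set;
		int			sock = PQsocket(conn);

		FD_ZERO(&set);
		FD_SET(sock, &set);

		/* Block until the socket is readable or writable as requested. */
		if (status == PGRES_POLLING_READING)
			(void) select(sock + 1, &set, NULL, NULL, NULL);
		else
			(void) select(sock + 1, NULL, &set, NULL, NULL);

		status = PQconnectPoll(conn);
	}

	if (status == PGRES_POLLING_FAILED)
		exit(1);

	return conn;
}
```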
> Anyway, to fix this, I've added some more fields to the slot struct to
> keep track of the information we need to properly establish connections,
> and we now pay careful attention to the return value of select() so that
> we know which slots are ready for processing. This seemed like a nice
> little optimization independent of fixing connection establishment. I was
> worried this was going to require a lot more code, but I think it only
> added ~50 lines or so.

The fix looks right to me, but I got confused by the skip_wait and this `if`:
+ if (PQstatus(slot->conn) != CONNECTION_OK)
+ return;
This branch checks a connection status that hasn't been refreshed after the
select(). When we go back to wait_on_slots() after this, PQconnectPoll()
will refresh the connection status and run select() with skip_wait=true. I
believe we could simplify this by moving the PQconnectPoll() call back into
process_slots(), so that we can process the connection right after polling
if it's ready. Something like this:
diff --git a/src/bin/pg_upgrade/task.c b/src/bin/pg_upgrade/task.c
index 3618dc08ff..73e987febf 100644
--- a/src/bin/pg_upgrade/task.c
+++ b/src/bin/pg_upgrade/task.c
@@ -237,6 +237,8 @@ process_query_result(const ClusterInfo *cluster, UpgradeTaskSlot *slot,
static void
process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTask *task)
{
+ PostgresPollingStatusType status;
+
if (!slot->ready)
return;
@@ -260,26 +262,26 @@ process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTas
start_conn(cluster, slot);
return;
-
case CONNECTING:
- /* Check for connection failure. */
- if (PQstatus(slot->conn) == CONNECTION_BAD)
- pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
-
- /* Check whether the connection is still establishing. */
- if (PQstatus(slot->conn) != CONNECTION_OK)
- return;
-
- /*
- * Move on to running/processing the queries in the task.
- */
- slot->state = RUNNING_QUERIES;
- if (!PQsendQuery(slot->conn, task->queries->data))
+ status = PQconnectPoll(slot->conn);
+ if (status == PGRES_POLLING_READING)
+ slot->select_mode = true;
+ else if (status == PGRES_POLLING_WRITING)
+ slot->select_mode = false;
+ else if (status == PGRES_POLLING_FAILED)
pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+ else
+ {
+ /*
+ * Move on to running/processing the queries in the task.
+ */
+ slot->state = RUNNING_QUERIES;
+ if (!PQsendQuery(slot->conn, task->queries->data))
+ pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+ }
return;
-
case RUNNING_QUERIES:
/*
@@ -370,8 +372,6 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
for (int i = 0; i < numslots; i++)
{
- PostgresPollingStatusType status;
-
switch (slots[i].state)
{
case FREE:
@@ -386,33 +386,7 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
continue;
case CONNECTING:
-
- /*
- * Don't call PQconnectPoll() again for this slot until
- * select() tells us something is ready. Be sure to use the
- * previous poll mode in this case.
- */
- if (!slots[i].ready)
- break;
-
- /*
- * If we are waiting for the connection to establish, choose
- * whether to wait for reading or for writing on the socket as
- * appropriate. If neither apply, mark the slot as ready and
- * skip waiting so that it is handled ASAP (we assume this
- * means the connection is either bad or fully ready).
- */
- status = PQconnectPoll(slots[i].conn);
- if (status == PGRES_POLLING_READING)
- slots[i].select_mode = true;
- else if (status == PGRES_POLLING_WRITING)
- slots[i].select_mode = false;
- else
- {
- slots[i].ready = true;
- skip_wait = true;
- continue;
- }
+ /* All the slot metadata was already setup in process_slots() */
break;
--
2.43.0
skip_wait can be removed in this case as well.
This is up to you; I think the v12 is good and committable in any case.
I've read and tested through the latest version of this patchset and I
think it's ready to go in.

The one concern I have is that tasks can exit(1) on libpq errors, tearing
down perfectly functional connections without graceful shutdown. Longer
term I think it would make sense to add exit handler callbacks similar to
the ones in pg_dump for graceful cleanup of connections. However, in order
to keep the goalposts in clear view, I don't think this patch needs to have
it, but it would be good to consider once it's in.

Spotted a small typo in the comments:

+ * nothing to process. This is primarily intended for the inital step in

s/inital/initial/

--
Daniel Gustafsson
On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote:
> I've read and tested through the latest version of this patchset and I think
> it's ready to go in.

Thanks for reviewing. I'm aiming to commit it later this week.

> The one concern I have is that tasks can exit(1) on libpq
> errors tearing down perfectly functional connections without graceful shutdown.
> Longer term I think it would make sense to add similar exit handler callbacks
> to the ones in pg_dump for graceful cleanup of connections. However, in order
> to keep goalposts in clear view I don't think this patch need to have it, but
> it would be good to consider once in.

This did cross my mind. I haven't noticed any problems in my testing, and
it looks like there are several existing places in pg_upgrade that call
pg_fatal() with open connections, so I'm inclined to agree that this is a
nice follow-up task that needn't hold up this patch set.

> Spotted a small typo in the comments:
>
> + * nothing to process. This is primarily intended for the inital step in
>
> s/inital/initial/

Will fix.

--
nathan
> On 9 Sep 2024, at 21:17, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote:
>> I've read and tested through the latest version of this patchset and I think
>> it's ready to go in.
>
> Thanks for reviewing. I'm aiming to commit it later this week.

+1. Looking forward to seeing what all the pg_dump/pg_upgrade changes amount
to in speed improvement when combined.

>> The one concern I have is that tasks can exit(1) on libpq
>> errors tearing down perfectly functional connections without graceful shutdown.
>> Longer term I think it would make sense to add similar exit handler callbacks
>> to the ones in pg_dump for graceful cleanup of connections. However, in order
>> to keep goalposts in clear view I don't think this patch need to have it, but
>> it would be good to consider once in.
>
> This did cross my mind. I haven't noticed any problems in my testing, and
> it looks like there are several existing places in pg_upgrade that call
> pg_fatal() with open connections, so I'm inclined to agree that this is a
> nice follow-up task that needn't hold up this patch set.

It could perhaps be a good introductory task for a new contributor who wants
a fairly confined project to hack on.

--
Daniel Gustafsson
On Mon, Sep 09, 2024 at 11:20:28PM +0200, Daniel Gustafsson wrote:
>> On 9 Sep 2024, at 21:17, Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote:
>>> I've read and tested through the latest version of this patchset and I think
>>> it's ready to go in.
>>
>> Thanks for reviewing. I'm aiming to commit it later this week.
>
> +1. Looking forward to seeing what all the pg_dump/pg_upgrade changes amount
> to in speed improvement when combined.

Committed.

--
nathan