Thread: optimizing pg_upgrade's once-in-each-database steps
A number of pg_upgrade steps require connecting to each database and running
a query.  When there are many databases, these steps are particularly
time-consuming, especially since this is done sequentially in a single
process.  At a quick glance, I see the following such steps:

* create_logical_replication_slots
* check_for_data_types_usage
* check_for_isn_and_int8_passing_mismatch
* check_for_user_defined_postfix_ops
* check_for_incompatible_polymorphics
* check_for_tables_with_oids
* check_for_user_defined_encoding_conversions
* check_old_cluster_subscription_state
* get_loadable_libraries
* get_db_rel_and_slot_infos
* old_9_6_invalidate_hash_indexes
* report_extension_updates

I set out to parallelize these kinds of steps via multiple threads or
processes, but I ended up realizing that we could likely achieve much of the
same gain with libpq's asynchronous APIs.  Specifically, both establishing
the connections and running the queries can be done without blocking, so we
can just loop over a handful of slots and advance a simple state machine for
each.

The attached is a proof-of-concept grade patch for doing this for
get_db_rel_and_slot_infos(), which yielded the following results on my
laptop for "pg_upgrade --link --sync-method=syncfs --jobs 8" for a cluster
with 10K empty databases.

total pg_upgrade time:
* HEAD:  14m 8s
* patch: 10m 58s

get_db_rel_and_slot_infos() on old cluster:
* HEAD:  2m 45s
* patch: 36s

get_db_rel_and_slot_infos() on new cluster:
* HEAD:  1m 46s
* patch: 29s

I am posting this early to get thoughts on the general approach.  If we
proceeded with this strategy, I'd probably create some generic tooling that
each relevant step would provide a set of callback functions.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, 2024-05-16 at 16:16 -0500, Nathan Bossart wrote:
> I am posting this early to get thoughts on the general approach.  If we
> proceeded with this strategy, I'd probably create some generic tooling that
> each relevant step would provide a set of callback functions.

The documentation states:

"pg_dump -j uses multiple database connections; it connects to the database
once with the leader process and once again for each worker job."

That might need to be adjusted.

How much complexity do you avoid by using async instead of multiple
processes?

Also, did you consider connecting once to each database and running many
queries?  Most of those seem like just checks.

Regards,
	Jeff Davis
On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
> How much complexity do you avoid by using async instead of multiple
> processes?

If we didn't want to use async, my guess is we'd want to use threads to
avoid complicated IPC.  And if we followed pgbench's example for using
threads, it might end up at a comparable level of complexity, although I'd
bet that threading would be the more complex of the two.  It's hard to say
definitively without coding it up both ways, which might be worth doing.

> Also, did you consider connecting once to each database and running
> many queries?  Most of those seem like just checks.

This was the idea behind 347758b.  It may be possible to do more along
these lines.  IMO parallelizing will still be useful even if we do combine
more of the steps.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I figured I'd post what I have so far since this thread hasn't been updated
in a while.  The attached patches are still "proof-of-concept grade," but
they are at least moving in the right direction (IMHO).  The variable
naming is still not great, and they are woefully undercommented, among
other things.

0001 introduces a new API for registering callbacks and running them in
parallel on all databases in the cluster.  This new system manages a set of
"slots" that follow a simple state machine to asynchronously establish a
connection and run the queries.  It uses system() to wait for these
asynchronous tasks to complete.  Users of this API only need to provide two
callbacks: one to return the query that should be run on each database and
another to process the results of that query.  If multiple queries are
required for each database, users can provide multiple sets of callbacks.

The other patches change several of the existing tasks to use this new API.
With these patches applied, I see the following differences in the output
of 'pg_upgrade | ts -i' for a cluster with 1k empty databases:

WITHOUT PATCH
00:00:19 Checking database user is the install user                   ok
00:00:02 Checking for subscription state                              ok
00:00:06 Adding ".old" suffix to old global/pg_control                ok
00:00:04 Checking for extension updates                               ok

WITH PATCHES (--jobs 1)
00:00:10 Checking database user is the install user                   ok
00:00:02 Checking for subscription state                              ok
00:00:07 Adding ".old" suffix to old global/pg_control                ok
00:00:05 Checking for extension updates                               ok

WITH PATCHES (--jobs 4)
00:00:06 Checking database user is the install user                   ok
00:00:00 Checking for subscription state                              ok
00:00:02 Adding ".old" suffix to old global/pg_control                ok
00:00:01 Checking for extension updates                               ok

Note that the "Checking database user is the install user" time also
includes the call to get_db_rel_and_slot_infos() on the old cluster as well
as the call to get_loadable_libraries() on the old cluster.
I believe the improvement with the patches with just one job is due to the
consolidation of the queries into one database connection (presently,
get_db_rel_and_slot_infos() creates 3 connections per database for some
upgrades).  Similarly, the "Adding \".old\" suffix to old global/pg_control"
time includes the call to get_db_rel_and_slot_infos() on the new cluster.

There are several remaining places where we could use this new API to speed
up upgrades.  For example, I haven't attempted to use it for the data type
checks yet, and that tends to eat up a sizable chunk of time when there are
many databases.

On Thu, May 16, 2024 at 08:24:08PM -0500, Nathan Bossart wrote:
> On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
>> Also, did you consider connecting once to each database and running
>> many queries?  Most of those seem like just checks.
>
> This was the idea behind 347758b.  It may be possible to do more along
> these lines.  IMO parallelizing will still be useful even if we do combine
> more of the steps.

My current thinking is that any possible further consolidation should
happen as part of a follow-up effort to parallelization.  I'm cautiously
optimistic that the parallelization work will make the consolidation easier
since it moves things to rigidly-defined callback functions.

A separate piece of off-list feedback from Michael Paquier is that this new
parallel system might be something we can teach the ParallelSlot code used
by bin/scripts/ to do.  I've yet to look too deeply into this, but I
suspect that it will be difficult to combine the two.  For example, the
ParallelSlot system doesn't seem well-suited for the kind of
run-once-in-each-database tasks required by pg_upgrade, and the error
handling is probably a little different, too.  However, it's still worth a
closer look, and I'm interested in folks' opinions on the subject.

-- 
nathan
Attachment
- v2-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v2-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v2-0003-move-live_check-variable-to-user_opts.patch
- v2-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v2-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v2-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> 0001 introduces a new API for registering callbacks and running them in
> parallel on all databases in the cluster.  This new system manages a set of
> "slots" that follow a simple state machine to asynchronously establish a
> connection and run the queries.  It uses system() to wait for these
> asynchronous tasks to complete.  Users of this API only need to provide two
> callbacks: one to return the query that should be run on each database and
> another to process the results of that query.  If multiple queries are
> required for each database, users can provide multiple sets of callbacks.

I do really like the idea of using asynchronous communication here.  It
should be significantly cheaper than using multiple processes or threads,
and maybe less code, too.  But I'm confused about why you would need or
want to use system() to wait for asynchronous tasks to complete.  Wouldn't
it be something like select()?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jul 01, 2024 at 03:58:16PM -0400, Robert Haas wrote:
> On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> 0001 introduces a new API for registering callbacks and running them in
>> parallel on all databases in the cluster.  This new system manages a set of
>> "slots" that follow a simple state machine to asynchronously establish a
>> connection and run the queries.  It uses system() to wait for these
>> asynchronous tasks to complete.  Users of this API only need to provide two
>> callbacks: one to return the query that should be run on each database and
>> another to process the results of that query.  If multiple queries are
>> required for each database, users can provide multiple sets of callbacks.
>
> I do really like the idea of using asynchronous communication here.  It
> should be significantly cheaper than using multiple processes or
> threads, and maybe less code, too.  But I'm confused about why you
> would need or want to use system() to wait for asynchronous tasks to
> complete.  Wouldn't it be something like select()?

Whoops, I meant to type "select()" there.  Sorry for the confusion.

-- 
nathan
As I mentioned elsewhere [0], here's a first attempt at parallelizing the
data type checks.  I was worried that I might have to refactor Daniel's
work in commit 347758b quite significantly, but I was able to avoid that by
using a set of generic callbacks and providing each task step an index into
the data_types_usage_checks array.

[0] https://postgr.es/m/Zov5kHbEyDMuHJI_%40nathan

-- 
nathan
Attachment
- v3-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v3-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v3-0003-move-live_check-variable-to-user_opts.patch
- v3-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v3-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v3-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v3-0007-parallelize-data-type-checks-in-pg_upgrade.patch
I finished parallelizing everything in pg_upgrade I could easily
parallelize with the proposed async task API.  There are a few remaining
places where I could not easily convert to the new API for whatever reason.
AFAICT those remaining places are either not showing up prominently in my
tests, or they only affect versions that are unsupported or will soon be
unsupported when v18 is released.

Similarly, I noticed many of the checks follow a very similar coding
pattern, but most (if not all) only affect older versions, so I'm unsure
whether it's worth the trouble to try consolidating them into one scan
through all the databases.

The code is still very rough and nowhere near committable, but this at
least gets the patch set into the editing phase.

-- 
nathan
Attachment
- v4-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v4-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v4-0003-move-live_check-variable-to-user_opts.patch
- v4-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v4-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v4-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v4-0007-parallelize-data-type-checks-in-pg_upgrade.patch
- v4-0008-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v4-0009-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v4-0010-parallelize-incompatible-polymorphics-check-in-pg.patch
- v4-0011-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v4-0012-parallelize-user-defined-encoding-conversions-che.patch
> On 9 Jul 2024, at 05:33, Nathan Bossart <nathandbossart@gmail.com> wrote:
> The code is still very rough and nowhere near committable, but this at
> least gets the patch set into the editing phase.

First reaction after a read-through is that this seems really cool; can't
wait to see how much faster v18 pg_upgrade will be over v17.  I will do
more testing and review once back from vacation.  Below are some comments
from reading, which is all I had time for at this point:

+static void
+conn_failure(PGconn *conn)
+{
+	pg_log(PG_REPORT, "%s", PQerrorMessage(conn));
+	printf(_("Failure, exiting\n"));
+	exit(1);
+}

Any particular reason this isn't using pg_fatal()?

+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
....
+	pg_free(query);
+}

A minor point, perhaps fueled by me not having played around much with this
patchset.  It seems a bit odd that dispatch_query is responsible for
freeing the query from the get_query callback.  I would have expected the
output from AsyncTaskGetQueryCB to be stored in AsyncTask and released by
async_task_free.

+static void
+sub_process(DbInfo *dbinfo, PGresult *res, void *arg)
+{
....
+			fprintf(state->script, "The table sync state \"%s\" is not allowed for database:\"%s\" subscription:\"%s\" schema:\"%s\"relation:\"%s\"\n",
+					PQgetvalue(res, i, 0),
+					dbinfo->db_name,
+					PQgetvalue(res, i, 1),

With the query being in a separate place in the code from the processing,
it takes a bit of jumping around to resolve the columns in PQgetvalue calls
like this.  Using PQfnumber() calls and descriptive names would make this
easier.  I know this case is copying old behavior, but the function splits
make it less useful than before.

+	char	   *query = pg_malloc(QUERY_ALLOC);

Should we convert this to a PQExpBuffer?

-- 
Daniel Gustafsson
On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
> First reaction after a read-through is that this seems really cool; can't
> wait to see how much faster v18 pg_upgrade will be over v17.  I will do
> more testing and review once back from vacation.  Below are some comments
> from reading, which is all I had time for at this point:

Thanks for taking a look!

> +static void
> +conn_failure(PGconn *conn)
> +{
> +	pg_log(PG_REPORT, "%s", PQerrorMessage(conn));
> +	printf(_("Failure, exiting\n"));
> +	exit(1);
> +}
>
> Any particular reason this isn't using pg_fatal()?

IIRC this was to match the error in connectToServer().  We could probably
move to pg_fatal().

> +static void
> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
> ....
> +	pg_free(query);
> +}
>
> A minor point, perhaps fueled by me not having played around much with this
> patchset.  It seems a bit odd that dispatch_query is responsible for freeing
> the query from the get_query callback.  I would have expected the output from
> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.

I don't see any problem with doing it the way you suggest.

Tangentially related, I waffled a bit on whether to require the query
callbacks to put the result in allocated memory.  Some queries are the same
no matter what, and some require customization at runtime.  As you can see,
I ended up just requiring allocated memory.  That makes the code a tad
simpler, and I doubt the extra work is noticeable.

> +static void
> +sub_process(DbInfo *dbinfo, PGresult *res, void *arg)
> +{
> ....
> +			fprintf(state->script, "The table sync state \"%s\" is not allowed for database:\"%s\" subscription:\"%s\" schema:\"%s\"relation:\"%s\"\n",
> +					PQgetvalue(res, i, 0),
> +					dbinfo->db_name,
> +					PQgetvalue(res, i, 1),
>
> With the query being in a separate place in the code from the processing it
> takes a bit of jumping around to resolve the columns in PQgetvalue calls like
> this.
> Using PQfnumber() calls and descriptive names would make this easier.  I
> know this case is copying old behavior, but the function splits make it less
> useful than before.

Good point.

> +	char	   *query = pg_malloc(QUERY_ALLOC);
>
> Should we convert this to a PQExpBuffer?

Seems like a good idea.  I think I was trying to change as few lines as
possible for my proof-of-concept. :)

-- 
nathan
> On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
>> +static void
>> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
>> ....
>> +	pg_free(query);
>> +}
>>
>> A minor point, perhaps fueled by me not having played around much with this
>> patchset.  It seems a bit odd that dispatch_query is responsible for freeing
>> the query from the get_query callback.  I would have expected the output from
>> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
>
> I don't see any problem with doing it the way you suggest.
>
> Tangentially related, I waffled a bit on whether to require the query
> callbacks to put the result in allocated memory.  Some queries are the same
> no matter what, and some require customization at runtime.  As you can see,
> I ended up just requiring allocated memory.  That makes the code a tad
> simpler, and I doubt the extra work is noticeable.

I absolutely agree with this.

>> +	char	   *query = pg_malloc(QUERY_ALLOC);
>>
>> Should we convert this to a PQExpBuffer?
>
> Seems like a good idea.  I think I was trying to change as few lines as
> possible for my proof-of-concept. :)

Yeah, that's a good approach; I just noticed it while reading the hunks.
We can do that separately from this patchset.

In order to trim down the size of the patchset, I think going ahead with
0003 independently of this makes sense; it has value by itself.

-- 
Daniel Gustafsson
On Thu, Jul 18, 2024 at 09:57:23AM +0200, Daniel Gustafsson wrote:
>> On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
>>> +static void
>>> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
>>> ....
>>> +	pg_free(query);
>>> +}
>>>
>>> A minor point, perhaps fueled by me not having played around much with this
>>> patchset.  It seems a bit odd that dispatch_query is responsible for freeing
>>> the query from the get_query callback.  I would have expected the output from
>>> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
>>
>> I don't see any problem with doing it the way you suggest.

Actually, I do see a problem.  If we do it this way, we'll have to store a
string per database somewhere, which seems unnecessary.

However, while looking into this, I noticed that only one get_query
callback (get_db_subscription_count()) actually customizes the generated
query using information in the provided DbInfo.  AFAICT we can do this
particular step without running a query in each database, as I mentioned
elsewhere [0].  That should speed things up a bit and allow us to simplify
the AsyncTask code.

With that, if we are willing to assume that a given get_query callback will
generate the same string for all databases (and I think we should), we can
run the callback once and save the string in the step for dispatch_query()
to use.  This would look more like what you suggested in the quoted text.

[0] https://postgr.es/m/ZprQJv_TxccN3tkr%40nathan

-- 
nathan
On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
> However, while looking into this, I noticed that only one get_query
> callback (get_db_subscription_count()) actually customizes the generated
> query using information in the provided DbInfo.  AFAICT we can do this
> particular step without running a query in each database, as I mentioned
> elsewhere [0].  That should speed things up a bit and allow us to simplify
> the AsyncTask code.
>
> With that, if we are willing to assume that a given get_query callback will
> generate the same string for all databases (and I think we should), we can
> run the callback once and save the string in the step for dispatch_query()
> to use.  This would look more like what you suggested in the quoted text.

Here is a new patch set.  I've included the latest revision of the patch to
fix get_db_subscription_count() from the other thread [0] as 0001 since I
expect that to be committed soon.  I've also moved the patch that moves the
"live_check" variable to "user_opts" to 0002 since I plan on committing
that sooner than later, too.  Otherwise, I've tried to address all feedback
provided thus far.

[0] https://commitfest.postgresql.org/49/5135/

-- 
nathan
Attachment
- v5-0001-pg_upgrade-retrieve-subscription-count-more-effic.patch
- v5-0002-move-live_check-variable-to-user_opts.patch
- v5-0003-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v5-0004-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v5-0005-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v5-0006-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v5-0007-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v5-0008-parallelize-data-type-checks-in-pg_upgrade.patch
- v5-0009-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v5-0010-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v5-0011-parallelize-incompatible-polymorphics-check-in-pg.patch
- v5-0012-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v5-0013-parallelize-user-defined-encoding-conversions-che.patch
On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.

Here is just the live_check patch.  I rebased it, gave it a commit message,
and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
to commit this within the next couple of days.

-- 
nathan
Attachment
> On 25 Jul 2024, at 04:58, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
>> Here is a new patch set.  I've included the latest revision of the patch to
>> fix get_db_subscription_count() from the other thread [0] as 0001 since I
>> expect that to be committed soon.  I've also moved the patch that moves the
>> "live_check" variable to "user_opts" to 0002 since I plan on committing
>> that sooner than later, too.  Otherwise, I've tried to address all feedback
>> provided thus far.
>
> Here is just the live_check patch.  I rebased it, gave it a commit message,
> and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
> to commit this within the next couple of days.

LGTM

-- 
Daniel Gustafsson
On Thu, Jul 25, 2024 at 11:42:55AM +0200, Daniel Gustafsson wrote:
>> On 25 Jul 2024, at 04:58, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Here is just the live_check patch.  I rebased it, gave it a commit message,
>> and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
>> to commit this within the next couple of days.
>
> LGTM

Thanks, committed.

-- 
nathan
On 22.07.2024 21:07, Nathan Bossart wrote:
> On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
>> However, while looking into this, I noticed that only one get_query
>> callback (get_db_subscription_count()) actually customizes the generated
>> query using information in the provided DbInfo.  AFAICT we can do this
>> particular step without running a query in each database, as I mentioned
>> elsewhere [0].  That should speed things up a bit and allow us to simplify
>> the AsyncTask code.
>>
>> With that, if we are willing to assume that a given get_query callback will
>> generate the same string for all databases (and I think we should), we can
>> run the callback once and save the string in the step for dispatch_query()
>> to use.  This would look more like what you suggested in the quoted text.
>
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.
>
> [0] https://commitfest.postgresql.org/49/5135/

Hi,

I like your idea of parallelizing these checks with the async libpq API;
thanks for working on it.  The patch doesn't apply cleanly on master
anymore, but I've rebased locally and taken it for a quick spin with a pg16
instance of 1000 empty databases.  I didn't see any regressions with -j 1,
and there's some speedup with -j 8 (33 sec vs 8 sec for these checks).

One thing that could be improved is that we could start a new connection
right away after having run all query callbacks for the current connection
in process_slot, instead of just returning and establishing the new
connection only on the next iteration of the loop in async_task_run, after
potentially sleeping on select.
+1 to Jeff's suggestion that perhaps we could reuse connections, but
perhaps that's a separate story.

Regards,
Ilya
On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
> I like your idea of parallelizing these checks with async libpq API, thanks
> for working on it.  The patch doesn't apply cleanly on master anymore, but
> I've rebased locally and taken it for a quick spin with a pg16 instance of
> 1000 empty databases.  Didn't see any regressions with -j 1, there's some
> speedup with -j 8 (33 sec vs 8 sec for these checks).

Thanks for taking a look.  I'm hoping to do a round of polishing before
posting a rebased patch set soon.

> One thing that I noticed that could be improved is we could start a new
> connection right away after having run all query callbacks for the current
> connection in process_slot, instead of just returning and establishing the
> new connection only on the next iteration of the loop in async_task_run
> after potentially sleeping on select.

Yeah, we could just recursively call process_slot() right after freeing the
slot.  That'd at least allow us to avoid the spinning behavior as we run
out of databases to process, if nothing else.

> +1 to Jeff's suggestion that perhaps we could reuse connections, but perhaps
> that's a separate story.

When I skimmed through the various tasks, I didn't see a ton of
opportunities for further consolidation, or at least opportunities that
would help for upgrades from supported versions.  The data type checks are
already consolidated, for example.

-- 
nathan
On Thu, Aug 01, 2024 at 12:44:35PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
>> I like your idea of parallelizing these checks with async libpq API, thanks
>> for working on it.  The patch doesn't apply cleanly on master anymore, but
>> I've rebased locally and taken it for a quick spin with a pg16 instance of
>> 1000 empty databases.  Didn't see any regressions with -j 1, there's some
>> speedup with -j 8 (33 sec vs 8 sec for these checks).
>
> Thanks for taking a look.  I'm hoping to do a round of polishing before
> posting a rebased patch set soon.
>
>> One thing that I noticed that could be improved is we could start a new
>> connection right away after having run all query callbacks for the current
>> connection in process_slot, instead of just returning and establishing the
>> new connection only on the next iteration of the loop in async_task_run
>> after potentially sleeping on select.
>
> Yeah, we could just recursively call process_slot() right after freeing the
> slot.  That'd at least allow us to avoid the spinning behavior as we run
> out of databases to process, if nothing else.

Here is a new patch set.  Besides rebasing, I've added the recursive call
to process_slot() mentioned in the quoted text, and I've added quite a bit
of commentary to async.c.

-- 
nathan
Attachment
- v7-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v7-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v7-0003-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v7-0004-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v7-0005-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v7-0006-parallelize-data-type-checks-in-pg_upgrade.patch
- v7-0007-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v7-0008-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v7-0009-parallelize-incompatible-polymorphics-check-in-pg.patch
- v7-0010-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v7-0011-parallelize-user-defined-encoding-conversions-che.patch
On 01.08.2024 22:41, Nathan Bossart wrote:
> Here is a new patch set.  Besides rebasing, I've added the recursive call
> to process_slot() mentioned in the quoted text, and I've added quite a bit
> of commentary to async.c.

That's much better now, thanks!  Here's my code review; note that I haven't
tested the patches yet:

+void
+async_task_add_step(AsyncTask *task,
+					AsyncTaskGetQueryCB query_cb,
+					AsyncTaskProcessCB process_cb, bool free_result,
+					void *arg)

Is there any reason to have the query as a callback function instead of a
char *?  From what I see right now, it doesn't give any extra flexibility,
as the query has to be static anyway (it can't be customized on a
per-database basis) and will be created once before all the callbacks are
run.  Passing in a char * makes the API simpler, excludes any potential
error of making the query dependent on the current database, and removes
the unnecessary malloc/free of the static strings.

+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
+			   const AsyncTask *task)
+{
+	...
+	if (!PQsendQuery(slot->conn, cbs->query))
+		conn_failure(slot->conn);
+}

This will print "connection failure: connection pointer is NULL", which I
don't think makes a lot of sense to the end user.  I'd prefer something
like pg_fatal("failed to allocate a new connection").

 	if (found)
-		pg_fatal("Data type checks failed: %s", report.data);
+	{
+		pg_fatal("Data type checks failed: %s", data_type_check_report.data);
+		termPQExpBuffer(&data_type_check_report);
+	}

`found` should be removed and replaced with `data_type_check_failed`, as
it's not set anymore.  Also, the termPQExpBuffer after pg_fatal looks
unnecessary.

+static bool *data_type_check_results;
+static bool data_type_check_failed;
+static PQExpBufferData data_type_check_report;

IMO it would be nicer to have these as local state that's passed in as an
arg to the AsyncTaskProcessCB, which aligns with how the other checks do
it.
-- End of review --

Regarding keeping the connections, the way I envisioned it entailed passing
a list of connections from one check to the next one (or keeping a global
state with connections?).  I didn't concretely look at the code to verify
this, so it's just an abstract idea.
On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote:
> -- End of review --

Thanks for the review.  I've attempted to address all your feedback in v8
of the patch set.  I think the names could still use some work, but I
wanted to get the main structure in place before trying to fix them.

> Regarding keeping the connections, the way I envisioned it entailed passing
> a list of connections from one check to the next one (or keeping a global
> state with connections?).  I didn't concretely look at the code to verify
> this, so it's just an abstract idea.

My main concern with doing something like that is it could require
maintaining a huge number of connections when there are many databases.
GUCs like max_connections would need to be set accordingly.  I'm a little
dubious that the gains would be enough to justify it.

-- 
nathan
Attachment
- v8-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
- v8-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
- v8-0003-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
- v8-0004-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
- v8-0005-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
- v8-0006-parallelize-data-type-checks-in-pg_upgrade.patch
- v8-0007-parallelize-isn-and-int8-passing-mismatch-check-i.patch
- v8-0008-parallelize-user-defined-postfix-ops-check-in-pg_.patch
- v8-0009-parallelize-incompatible-polymorphics-check-in-pg.patch
- v8-0010-parallelize-tables-with-oids-check-in-pg_upgrade.patch
- v8-0011-parallelize-user-defined-encoding-conversions-che.patch
> On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote:
>> -- End of review --
> Thanks for the review. I've attempted to address all your feedback in v8
> of the patch set. I think the names could still use some work, but I
> wanted to get the main structure in place before trying to fix them.
I think the underlying mechanism is basically solid, but I have one question: isn't this the ideal case for using libpq pipelining? That would allow subsequent tasks to launch while the main loop slowly gets around to clearing off completed tasks on some other connection.
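[For readers unfamiliar with it, the pipelining Corey describes is libpq's
pipeline mode, available since PostgreSQL 14. A minimal sketch of the flow
follows; it is illustrative only, with placeholder queries, and is not code
from the patch set:]

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Queue several queries on one connection before consuming any results.
 * The queries here are placeholders for the per-database check queries. */
static void
run_pipelined_checks(PGconn *conn)
{
	const char *queries[] = {"SELECT 1", "SELECT 2"};

	if (!PQenterPipelineMode(conn))
		exit(1);

	/* Dispatch everything up front; plain PQsendQuery() is restricted in
	 * pipeline mode, so use PQsendQueryParams(). */
	for (int i = 0; i < 2; i++)
		if (!PQsendQueryParams(conn, queries[i],
							   0, NULL, NULL, NULL, NULL, 0))
			exit(1);

	/* Establish a synchronization point and flush the queue. */
	if (!PQpipelineSync(conn))
		exit(1);

	/* Results arrive in order; PQgetResult() returns NULL between queries,
	 * and the sync point yields a PGRES_PIPELINE_SYNC result. */
	for (;;)
	{
		PGresult   *res = PQgetResult(conn);

		if (res == NULL)
			continue;			/* boundary between queries */
		if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
		{
			PQclear(res);
			break;				/* all queued queries are done */
		}
		if (PQresultStatus(res) == PGRES_TUPLES_OK)
			printf("%s\n", PQgetvalue(res, 0, 0));
		PQclear(res);
	}

	PQexitPipelineMode(conn);
}
```

The appeal is that all queries are queued before any results are consumed,
so the server can already be working on query N+1 while the client digests
the results of query N.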
On Thu, Aug 08, 2024 at 06:18:38PM -0400, Corey Huinker wrote:
> I think the underlying mechanism is basically solid, but I have one
> question: isn't this the ideal case for using libpq pipelining? That would
> allow subsequent tasks to launch while the main loop slowly gets around to
> clearing off completed tasks on some other connection.

I'll admit I hadn't really considered pipelining, but I'm tempted to say
that it's probably not worth the complexity. Not only do most of the tasks
have only one step, but even tasks like the data types check are unlikely
to require more than a few queries for upgrades from supported versions.
Furthermore, most of the callbacks should do almost nothing for a given
upgrade, and since pg_upgrade runs on the server, client/server round-trip
time should be pretty low.

Perhaps pipelining would make more sense if we consolidated the tasks a bit
better, but when I last looked into that, I didn't see a ton of great
opportunities that would help anything except for upgrades from really old
versions. Even then, I'm not sure if pipelining is worth it.

--
nathan
On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>> I'll admit I hadn't really considered pipelining, but I'm tempted to say
>> that it's probably not worth the complexity. Not only do most of the tasks
>> have only one step, but even tasks like the data types check are unlikely
>> to require more than a few queries for upgrades from supported versions.
>
> Can you point me to a complex multi-step task that you think wouldn't work
> for pipelining? My skimming of the other patches all seemed to be one query
> with one result set to be processed by one callback.

I think it would work fine. I'm just not sure it's worth it, especially
for tasks that run exactly one query in each connection.

>> Furthermore, most of the callbacks should do almost nothing for a given
>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>> time should be pretty low.
>
> To my mind, that makes pipelining make more sense: you throw out N queries,
> most of which are trivial, and by the time you cycle back around and start
> digesting result sets via callbacks, more of the queries have finished
> because they were waiting on the query ahead of them in the pipeline, not
> waiting on a callback to finish consuming its assigned result set and then
> launching the next task query.

My assumption is that the "waiting for a callback before launching the next
query" time will typically be pretty short in practice. I could try
measuring it...

>> Perhaps pipelining would make more sense if we consolidated the tasks a bit
>> better, but when I last looked into that, I didn't see a ton of great
>> opportunities that would help anything except for upgrades from really old
>> versions. Even then, I'm not sure if pipelining is worth it.
>
> I think you'd want to do the opposite of consolidating the tasks. If
> anything, you'd want to break them down into known single-query operations,
> and if the callback function for one of them happens to queue up a
> subsequent query (with subsequent callback) then so be it.

By "consolidating," I mean combining tasks into fewer tasks with additional
steps. This would allow us to reuse connections instead of creating N
connections for every single query. If we used a task per query, I'd
expect pipelining to provide zero benefit.

--
nathan
On Sat, Aug 10, 2024 at 10:17:27AM -0500, Nathan Bossart wrote:
> On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>>> Furthermore, most of the callbacks should do almost nothing for a given
>>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>>> time should be pretty low.
>>
>> To my mind, that makes pipelining make more sense, you throw out N queries,
>> most of which are trivial, and by the time you cycle back around and start
>> digesting result sets via callbacks, more of the queries have finished
>> because they were waiting on the query ahead of them in the pipeline, not
>> waiting on a callback to finish consuming its assigned result set and then
>> launching the next task query.
>
> My assumption is that the "waiting for a callback before launching the next
> query" time will typically be pretty short in practice. I could try
> measuring it...

Another option might be to combine all the queries for a task into a single
string and then send that in one PQsendQuery() call. That may be a simpler
way to eliminate the time between queries.

--
nathan
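[This works because a single PQsendQuery() call may carry multiple
semicolon-separated statements, and libpq then hands back one PGresult per
statement. A rough sketch of the idea, with placeholder queries and a
hypothetical function name rather than code from the patch set:]

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Send all of a task's queries in one round trip; the queries here are
 * placeholders for a task's real per-database steps. */
static void
run_task_queries(PGconn *conn)
{
	PGresult   *res;

	/* One command string containing every step of the task. */
	if (!PQsendQuery(conn, "SELECT 1; SELECT 2; SELECT 3;"))
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return;
	}

	/* Each statement produces its own PGresult, in order; PQgetResult()
	 * returns NULL once all of them have been consumed, so per-step
	 * callbacks could be invoked in the same order the steps were queued. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_TUPLES_OK)
			printf("step result: %s\n", PQgetvalue(res, 0, 0));
		PQclear(res);
	}
}
```

In effect this pipelines a task's steps on the server side without entering
libpq's pipeline mode at all.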
> On 28 Aug 2024, at 22:46, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I spent some time cleaning up names, comments, etc. Barring additional
> feedback, I'm planning to commit this stuff in the September commitfest so
> that it has plenty of time to bake in the buildfarm.
>
> --
> nathan
>
> <v10-0001-Introduce-framework-for-parallelizing-various-pg.patch>
> <v10-0002-Use-pg_upgrade-s-new-parallel-framework-for-subs.patch>
> <v10-0003-Use-pg_upgrade-s-new-parallel-framework-to-get-r.patch>
> <v10-0004-Use-pg_upgrade-s-new-parallel-framework-to-get-l.patch>
> <v10-0005-Use-pg_upgrade-s-new-parallel-framework-for-exte.patch>
> <v10-0006-Use-pg_upgrade-s-new-parallel-framework-for-data.patch>
> <v10-0007-Use-pg_upgrade-s-new-parallel-framework-for-isn-.patch>
> <v10-0008-Use-pg_upgrade-s-new-parallel-framework-for-post.patch>
> <v10-0009-Use-pg_upgrade-s-new-parallel-framework-for-poly.patch>
> <v10-0010-Use-pg_upgrade-s-new-parallel-framework-for-WITH.patch>
> <v10-0011-Use-pg_upgrade-s-parallel-framework-for-encoding.patch>
Good catch, I didn't look so closely at this. I think we can actually just
use PQstatus() here. But furthermore, I think the way I was initiating
connections was completely bogus. IIUC before calling PQconnectPoll() the
first time, we should wait for a write indicator from select(), and then we
should only call PQconnectPoll() after subsequent indicators from select().
After spending quite a bit of time staring at the PQconnectPoll() code, I'm
quite surprised I haven't seen any problems thus far. If I had to guess,
I'd say that either PQconnectPoll() is more resilient than I think it is,
or I've just gotten lucky because pg_upgrade establishes connections
quickly.
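[For reference, the protocol described here is the documented contract for
libpq's nonblocking connection functions. A single-connection sketch of
that loop follows; it is illustrative only (the patch's multi-slot state
machine interleaves this across many connections), and "conninfo" is a
placeholder:]

```c
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

/* Establish one connection using the documented PQconnectStart() /
 * PQconnectPoll() protocol: wait for the socket to become ready in the
 * direction indicated by the previous poll status (write, initially), and
 * only then call PQconnectPoll() again. */
static PGconn *
connect_nonblocking(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		exit(1);

	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
	{
		fd_set		set;
		int			sock = PQsocket(conn);

		FD_ZERO(&set);
		FD_SET(sock, &set);

		/* Block until the socket is readable or writable as requested. */
		if (status == PGRES_POLLING_READING)
			(void) select(sock + 1, &set, NULL, NULL, NULL);
		else
			(void) select(sock + 1, NULL, &set, NULL, NULL);

		status = PQconnectPoll(conn);
	}

	if (status == PGRES_POLLING_FAILED)
		exit(1);

	return conn;
}
```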
> Anyway, to fix this, I've added some more fields to the slot struct to
> keep track of the information we need to properly establish connections,
> and we now pay careful attention to the return value of select() so that
> we know which slots are ready for processing. This seemed like a nice
> little optimization independent of fixing connection establishment. I was
> worried this was going to require a lot more code, but I think it only
> added ~50 lines or so.

The fix looks right to me, but I got confused by the skip_wait and this `if`:
+ if (PQstatus(slot->conn) != CONNECTION_OK)
+ return;
This branch checks a connection status that hasn't been refreshed after the
select(). When we go back to wait_on_slots() after this, PQconnectPoll()
will refresh the connection status and run select() with skip_wait=true. I
believe we could simplify this by moving the PQconnectPoll() call back into
process_slots(), so that we can process the connection right after polling
if it's ready. Something like this:
diff --git a/src/bin/pg_upgrade/task.c b/src/bin/pg_upgrade/task.c
index 3618dc08ff..73e987febf 100644
--- a/src/bin/pg_upgrade/task.c
+++ b/src/bin/pg_upgrade/task.c
@@ -237,6 +237,8 @@ process_query_result(const ClusterInfo *cluster, UpgradeTaskSlot *slot,
static void
process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTask *task)
{
+ PostgresPollingStatusType status;
+
if (!slot->ready)
return;
@@ -260,26 +262,26 @@ process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTas
start_conn(cluster, slot);
return;
-
case CONNECTING:
- /* Check for connection failure. */
- if (PQstatus(slot->conn) == CONNECTION_BAD)
- pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
-
- /* Check whether the connection is still establishing. */
- if (PQstatus(slot->conn) != CONNECTION_OK)
- return;
-
- /*
- * Move on to running/processing the queries in the task.
- */
- slot->state = RUNNING_QUERIES;
- if (!PQsendQuery(slot->conn, task->queries->data))
+ status = PQconnectPoll(slot->conn);
+ if (status == PGRES_POLLING_READING)
+ slot->select_mode = true;
+ else if (status == PGRES_POLLING_WRITING)
+ slot->select_mode = false;
+ else if (status == PGRES_POLLING_FAILED)
pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+ else
+ {
+ /*
+ * Move on to running/processing the queries in the task.
+ */
+ slot->state = RUNNING_QUERIES;
+ if (!PQsendQuery(slot->conn, task->queries->data))
+ pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+ }
return;
-
case RUNNING_QUERIES:
/*
@@ -370,8 +372,6 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
for (int i = 0; i < numslots; i++)
{
- PostgresPollingStatusType status;
-
switch (slots[i].state)
{
case FREE:
@@ -386,33 +386,7 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
continue;
case CONNECTING:
-
- /*
- * Don't call PQconnectPoll() again for this slot until
- * select() tells us something is ready. Be sure to use the
- * previous poll mode in this case.
- */
- if (!slots[i].ready)
- break;
-
- /*
- * If we are waiting for the connection to establish, choose
- * whether to wait for reading or for writing on the socket as
- * appropriate. If neither apply, mark the slot as ready and
- * skip waiting so that it is handled ASAP (we assume this
- * means the connection is either bad or fully ready).
- */
- status = PQconnectPoll(slots[i].conn);
- if (status == PGRES_POLLING_READING)
- slots[i].select_mode = true;
- else if (status == PGRES_POLLING_WRITING)
- slots[i].select_mode = false;
- else
- {
- slots[i].ready = true;
- skip_wait = true;
- continue;
- }
+ /* All the slot metadata was already setup in process_slots() */
break;
--
2.43.0
skip_wait can be removed in this case as well.
This is up to you; I think the v12 is good and committable in any case.
I've read and tested through the latest version of this patchset and I
think it's ready to go in.

The one concern I have is that tasks can exit(1) on libpq errors, tearing
down perfectly functional connections without graceful shutdown. Longer
term I think it would make sense to add exit handler callbacks similar to
the ones in pg_dump for graceful cleanup of connections. However, in order
to keep the goalposts in clear view, I don't think this patch needs to have
it, but it would be good to consider once it's in.

Spotted a small typo in the comments:

+ * nothing to process. This is primarily intended for the inital step in

s/inital/initial/

--
Daniel Gustafsson
On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote:
> I've read and tested through the latest version of this patchset and I think
> it's ready to go in.

Thanks for reviewing. I'm aiming to commit it later this week.

> The one concern I have is that tasks can exit(1) on libpq
> errors tearing down perfectly functional connections without graceful shutdown.
> Longer term I think it would make sense to add similar exit handler callbacks
> to the ones in pg_dump for graceful cleanup of connections. However, in order
> to keep goalposts in clear view I don't think this patch need to have it, but
> it would be good to consider once in.

This did cross my mind. I haven't noticed any problems in my testing, and
it looks like there are several existing places in pg_upgrade that call
pg_fatal() with open connections, so I'm inclined to agree that this is a
nice follow-up task that needn't hold up this patch set.

> Spotted a small typo in the comments:
>
> + * nothing to process. This is primarily intended for the inital step in
>
> s/inital/initial/

Will fix.

--
nathan
> On 9 Sep 2024, at 21:17, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote:
>> I've read and tested through the latest version of this patchset and I think
>> it's ready to go in.
>
> Thanks for reviewing. I'm aiming to commit it later this week.

+1. Looking forward to seeing what all the pg_dump/pg_upgrade changes amount
to in speed improvement when combined.

>> The one concern I have is that tasks can exit(1) on libpq
>> errors tearing down perfectly functional connections without graceful shutdown.
>> Longer term I think it would make sense to add similar exit handler callbacks
>> to the ones in pg_dump for graceful cleanup of connections. However, in order
>> to keep goalposts in clear view I don't think this patch need to have it, but
>> it would be good to consider once in.
>
> This did cross my mind. I haven't noticed any problems in my testing, and
> it looks like there are several existing places in pg_upgrade that call
> pg_fatal() with open connections, so I'm inclined to agree that this is a
> nice follow-up task that needn't hold up this patch set.

It could perhaps be a good introductory task for a new contributor who wants
a fairly confined project to hack on.

--
Daniel Gustafsson
On Mon, Sep 09, 2024 at 11:20:28PM +0200, Daniel Gustafsson wrote:
>> On 9 Sep 2024, at 21:17, Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote:
>>> I've read and tested through the latest version of this patchset and I think
>>> it's ready to go in.
>>
>> Thanks for reviewing. I'm aiming to commit it later this week.
>
> +1. Looking forward to seeing what all the pg_dump/pg_upgrade changes amount
> to in speed improvement when combined.

Committed.

--
nathan