Thread: Add parallelism and glibc dependent only options to reindexdb
Hi,

With glibc 2.28 coming, all users will have to reindex almost every index after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it processes everything using a single connection and isn't able to discard indexes that don't depend on a glibc collation.

PFA a patchset to add parallelism to reindexdb (reusing the infrastructure in vacuumdb with some additions) and an option to discard indexes that don't depend on glibc (without any specific collation filtering or glibc version detection), with updated regression tests. Note that this should be applied on top of the existing reindexdb cleanup & refactoring patch (https://commitfest.postgresql.org/23/2115/).

This was sponsored by VMware, and has been discussed internally with Kevin and Michael, in Cc.
Attachment
On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote:
> With glibc 2.28 coming, all users will have to reindex almost every index after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it processes everything using a single connection and isn't able to discard indexes that don't depend on a glibc collation.

We have seen that with a database of up to 100GB we ended up cutting the reindex time from 30 minutes to a couple of minutes with a schema we work on. Julien, what were the actual numbers?

> PFA a patchset to add parallelism to reindexdb (reusing the infrastructure in vacuumdb with some additions) and an option to discard indexes that don't depend on glibc (without any specific collation filtering or glibc version detection), with updated regression tests. Note that this should be applied on top of the existing reindexdb cleanup & refactoring patch (https://commitfest.postgresql.org/23/2115/).

Please note that patch 0003 does not seem to apply correctly on HEAD as of c74d49d4. Here is also a small description of each patch:

- 0001 refactors the connection slot facility from vacuumdb.c into a new, separate file called parallel.c in src/bin/scripts/. This is not really fancy as some code only moves around.
- 0002 adds an extra option for simple lists to be able to use pointers, with an interface to append elements to them.
- 0003 begins to be the actual fancy thing with the addition of a --jobs option to reindexdb. The main issue here, which should be discussed, is that when it comes to a reindex of tables you are basically not going to have any conflicts between the objects manipulated. However, if you wish to do a reindex on a set of indexes then things get more tricky, as it is necessary to list items per table so that multiple connections do not conflict with each other when attempting to work on multiple indexes of the same table. What this patch does is select the set of indexes which need to be worked on (see the addition of cell in ParallelSlot), and then do a kind of pre-planning of each item into the connection slots, so that each connection knows from the beginning which items it needs to process. This is quite different from vacuumdb, where a new item is distributed only to a free connection from a unique list. I'd personally prefer to keep the facility in parallel.c purely execution-dependent, with no pre-planning. This would instead require keeping within reindexdb.c an array of lists, with one list corresponding to one connection, which feels more natural.
- 0004 is the part where the concurrent additions really matter, as this consists of applying an extra filter to the indexes selected, so that only the glibc-sensitive indexes are chosen for processing.

-- Michael
Attachment
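The per-table pre-planning described above can be modelled in a few lines (a simplified Python sketch, not the patch's C code; all names here are illustrative): indexes are grouped by their parent table, and each table's whole group is handed to a single connection slot, so two connections can never collide on one table's locks.

```python
from collections import defaultdict

def preplan(indexes, nslots):
    """Assign (table, index) pairs to connection slots so that all
    indexes of a given table land in the same slot: two connections
    must never reindex indexes of the same table concurrently."""
    by_table = defaultdict(list)
    for table, index in indexes:
        by_table[table].append(index)

    slots = [[] for _ in range(nslots)]
    # round-robin whole tables over the slots
    for i, (table, idxs) in enumerate(sorted(by_table.items())):
        slots[i % nslots].extend((table, ix) for ix in idxs)
    return slots

indexes = [("t1", "t1_a_idx"), ("t1", "t1_b_idx"),
           ("t2", "t2_a_idx"), ("t3", "t3_a_idx")]
slots = preplan(indexes, 2)

# every table appears in exactly one slot
for table in {t for t, _ in indexes}:
    assert sum(any(t == table for t, _ in slot) for slot in slots) == 1
```

A vacuumdb-style distribution would instead hand each item to whichever connection becomes free, which is exactly what breaks down when two items target the same table.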
On Mon, Jul 1, 2019 at 10:55 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote:
> > With glibc 2.28 coming, all users will have to reindex almost every index after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it processes everything using a single connection and isn't able to discard indexes that don't depend on a glibc collation.
>
> We have seen that with a database of up to 100GB we ended up cutting the reindex time from 30 minutes to a couple of minutes with a schema we work on. Julien, what were the actual numbers?

I did my benchmarking using a quite ideal database, having a large number of tables and various sets of indexes, for a 75 GB total size. This was done on my laptop, which has 6 multithreaded cores (and crappy IO), also keeping the default max_parallel_maintenance_workers = 2. A naive reindexdb took approximately 33 minutes. Filtering the list of indexes took that down to slightly less than 15 min, but of course each database will have a different ratio there. Then, keeping --glibc-dependent and using different levels of parallelism:

-j1: ~ 14:50
-j3: ~ 7:30
-j6: ~ 5:23
-j8: ~ 4:45

That's pretty much the kind of results I was expecting given the hardware I used.

> > PFA a patchset to add parallelism to reindexdb (reusing the infrastructure in vacuumdb with some additions) and an option to discard indexes that don't depend on glibc (without any specific collation filtering or glibc version detection), with updated regression tests. Note that this should be applied on top of the existing reindexdb cleanup & refactoring patch (https://commitfest.postgresql.org/23/2115/).
>
> Please note that patch 0003 does not seem to apply correctly on HEAD as of c74d49d4.

Yes, this is because this patchset has to be applied on top of the reindexdb refactoring patch mentioned. It's sad that we don't have a good way to deal with that kind of dependency, as it's also breaking Thomas' cfbot :(

> - 0003 begins to be the actual fancy thing with the addition of a --jobs option to reindexdb. The main issue here, which should be discussed, is that when it comes to a reindex of tables you are basically not going to have any conflicts between the objects manipulated. However, if you wish to do a reindex on a set of indexes then things get more tricky, as it is necessary to list items per table so that multiple connections do not conflict with each other when attempting to work on multiple indexes of the same table. What this patch does is select the set of indexes which need to be worked on (see the addition of cell in ParallelSlot), and then do a kind of pre-planning of each item into the connection slots, so that each connection knows from the beginning which items it needs to process. This is quite different from vacuumdb, where a new item is distributed only to a free connection from a unique list. I'd personally prefer to keep the facility in parallel.c purely execution-dependent, with no pre-planning. This would instead require keeping within reindexdb.c an array of lists, with one list corresponding to one connection, which feels more natural.

My fear here is that this approach would add some extra complexity, especially the need to deal with free-connection handling both in GetIdleSlot() and the main reindexdb loop. Also, the pre-planning allows us to start processing the biggest tables first, which optimises the overall runtime.
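The scheduling benefit of handling the biggest tables first can be illustrated with a toy model (a hedged sketch with made-up sizes, not the patch's code): greedily assigning tables to the least-loaded connection in descending size order avoids leaving one connection alone with the largest table at the end of the run.

```python
def schedule(sizes, njobs, biggest_first=True):
    """Greedy assignment of per-table reindex costs to worker
    connections; returns the simulated wall-clock time, i.e. the
    load of the busiest worker."""
    loads = [0] * njobs
    order = sorted(sizes, reverse=True) if biggest_first else list(sizes)
    for s in order:
        # hand the next table to the least-loaded worker
        loads[loads.index(min(loads))] += s
    return max(loads)

# ten small tables plus one huge table, two connections
sizes = [10] * 10 + [100]
assert schedule(sizes, 2, biggest_first=True) <= \
       schedule(sizes, 2, biggest_first=False)
```

In arbitrary order the huge table can arrive last and stretch the run to 150 "units" here, while the size-sorted order finishes in 100; the exact gain obviously depends on the size distribution.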
Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to gain more popularity.

Please don't reuse a file name as generic as "parallel.c" -- it's annoying when navigating source. Maybe conn_parallel.c multiconn.c connscripts.c admconnection.c ...?

If your server crashes or is stopped midway through the reindex, you would have to start again from scratch, and it's tedious (if it's possible at all) to determine which indexes were missed. I think it would be useful to have a two-phase mode: in the initial phase reindexdb computes the list of indexes to be reindexed and saves them into a work table somewhere. In the second phase, it reads indexes from that table and processes them, marking them as done in the work table. If the second phase crashes or is stopped, it can be restarted, and it consults the work table. I would keep the work table, as it provides a bit of an audit trail. It may be important to be able to run even if unable to create such a work table (because of the <ironic>numerous</> users that DROP DATABASE postgres). Maybe we'd have two flags in the work table for each index: "reindex requested", "reindex done".

The "glibc filter" thing (which I take to mean "indexes that depend on collations") would apply to the first phase: it just skips adding other indexes to the work table. I suppose ICU collations are not affected, so the filter would be for glibc collations only? The --glibc-dependent switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can add other rules later. (Not "--exclude=foo" because we'll want to add the possibility to ignore specific indexes by name.)

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
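The two-phase mode sketched in this message could be backed by a work table along these lines (an illustrative DDL sketch only -- nothing like this exists in the patchset, and every name here is hypothetical):

```sql
-- Phase 1: record every index that needs rebuilding.
CREATE TABLE reindex_worklist (
    indexrelid        oid PRIMARY KEY,
    reindex_requested boolean NOT NULL DEFAULT true,
    reindex_done      boolean NOT NULL DEFAULT false
);

-- Phase 2: after each successful REINDEX, mark the entry as done,
-- so a crashed or stopped run can simply be restarted and pick up
-- the entries where reindex_done is still false.
UPDATE reindex_worklist
   SET reindex_done = true
 WHERE indexrelid = 'some_index'::regclass;
```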
Michael Paquier <michael@paquier.xyz> writes: > - 0003 begins to be the actual fancy thing with the addition of a > --jobs option into reindexdb. The main issue here which should be > discussed is that when it comes to reindex of tables, you basically > are not going to have any conflicts between the objects manipulated. > However if you wish to do a reindex on a set of indexes then things > get more tricky as it is necessary to list items per-table so as > multiple connections do not conflict with each other if attempting to > work on multiple indexes of the same table. What this patch does is > to select the set of indexes which need to be worked on (see the > addition of cell in ParallelSlot), and then does a kind of > pre-planning of each item into the connection slots so as each > connection knows from the beginning which items it needs to process. > This is quite different from vacuumdb where a new item is distributed > only on a free connection from a unique list. I'd personally prefer > if we keep the facility in parallel.c so as it is only > execution-dependent and that we have no pre-planning. This would > require keeping within reindexdb.c an array of lists, with one list > corresponding to one connection instead which feels more natural. Couldn't we make this enormously simpler and less bug-prone by just dictating that --jobs applies only to reindex-table operations? > - 0004 is the part where the concurrent additions really matter as > this consists in applying an extra filter to the indexes selected so > as only the glibc-sensitive indexes are chosen for the processing. I think you'd be better off to define and document this as "reindex only collation-sensitive indexes", without any particular reference to a reason why somebody might want to do that. regards, tom lane
On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Please don't reuse a file name as generic as "parallel.c" -- it's annoying when navigating source. Maybe conn_parallel.c multiconn.c connscripts.c admconnection.c ...?

I could use scripts_parallel.[ch], as I've already used that in the #define part?

> If your server crashes or is stopped midway through the reindex, you would have to start again from scratch, and it's tedious (if it's possible at all) to determine which indexes were missed. I think it would be useful to have a two-phase mode: in the initial phase reindexdb computes the list of indexes to be reindexed and saves them into a work table somewhere. In the second phase, it reads indexes from that table and processes them, marking them as done in the work table. If the second phase crashes or is stopped, it can be restarted, and it consults the work table. I would keep the work table, as it provides a bit of an audit trail. It may be important to be able to run even if unable to create such a work table (because of the <ironic>numerous</> users that DROP DATABASE postgres).

Or we could create a table locally in each database; that would fix this problem and probably make the code simpler. It also raises some additional concerns about data expiration, though. I guess that someone could launch the tool by mistake, kill reindexdb, and run it again 2 months later while a lot of new objects have been added, for instance.

> The "glibc filter" thing (which I take to mean "indexes that depend on collations") would apply to the first phase: it just skips adding other indexes to the work table. I suppose ICU collations are not affected, so the filter would be for glibc collations only?

Indeed, ICU shouldn't need such a filter. xxx_pattern_ops based indexes are also excluded.

> The --glibc-dependent switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can add other rules later. (Not "--exclude=foo" because we'll want to add the possibility to ignore specific indexes by name.)

That's a good point, I like the --exclude-rule switch.
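For reference, the kind of catalog query such a filter implies could look roughly like this (a hedged sketch against the system catalogs, not the patch's actual query -- among other simplifications it ignores the default collation and the xxx_pattern_ops exclusion just mentioned):

```sql
-- Indexes depending on a libc-provided collation; ICU ones excluded.
SELECT DISTINCT d.objid::regclass AS index_name
FROM pg_depend d
JOIN pg_collation c ON c.oid = d.refobjid
JOIN pg_index i ON i.indexrelid = d.objid
WHERE d.classid = 'pg_class'::regclass
  AND d.refclassid = 'pg_collation'::regclass
  AND c.collprovider = 'c';   -- 'c' = libc provider, 'i' = ICU
```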
On Mon, Jul 1, 2019 at 4:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Michael Paquier <michael@paquier.xyz> writes:
> > - 0003 begins to be the actual fancy thing with the addition of a --jobs option to reindexdb. The main issue here, which should be discussed, is that when it comes to a reindex of tables you are basically not going to have any conflicts between the objects manipulated. However, if you wish to do a reindex on a set of indexes then things get more tricky, as it is necessary to list items per table so that multiple connections do not conflict with each other when attempting to work on multiple indexes of the same table. What this patch does is select the set of indexes which need to be worked on (see the addition of cell in ParallelSlot), and then do a kind of pre-planning of each item into the connection slots, so that each connection knows from the beginning which items it needs to process. This is quite different from vacuumdb, where a new item is distributed only to a free connection from a unique list. I'd personally prefer to keep the facility in parallel.c purely execution-dependent, with no pre-planning. This would instead require keeping within reindexdb.c an array of lists, with one list corresponding to one connection, which feels more natural.
>
> Couldn't we make this enormously simpler and less bug-prone by just dictating that --jobs applies only to reindex-table operations?

That would also mean that we'd have to fall back on doing the reindex at table level, even if we only want to reindex indexes that depend on glibc. I'm afraid that this will often add a huge penalty.

> > - 0004 is the part where the concurrent additions really matter, as this consists of applying an extra filter to the indexes selected, so that only the glibc-sensitive indexes are chosen for processing.
>
> I think you'd be better off to define and document this as "reindex only collation-sensitive indexes", without any particular reference to a reason why somebody might want to do that.

We should still document that indexes based on ICU would be excluded? I also realize that I totally forgot to update reindexdb.sgml. Sorry about that, I'll fix it in the next versions.
Julien Rouhaud wrote:

> > I think you'd be better off to define and document this as "reindex only collation-sensitive indexes", without any particular reference to a reason why somebody might want to do that.
>
> We should still document that indexes based on ICU would be excluded?

But why exclude them? As a data point, in the last 5 years, the en_US collation in ICU had 7 different versions (across 11 major versions of ICU):

 ICU    Unicode   en_US
 54.1     7.0    137.56
 55.1     7.0    153.56
 56.1     8.0    153.64
 57.1     8.0    153.64
 58.2     9.0    153.72
 59.1     9.0    153.72
 60.2    10.0    153.80
 61.1    10.0    153.80
 62.1    11.0    153.88
 63.2    11.0    153.88
 64.2    12.1    153.97

The rightmost column corresponds to pg_collation.collversion in Postgres. Each time there's a new Unicode version, it seems all collation versions are bumped. And there's a new Unicode version pretty much every year these days. Based on this, most ICU upgrades in practice would require reindexing the affected indexes.

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On 2019-Jul-01, Daniel Verite wrote: > But why exclude them? > As a data point, in the last 5 years, the en_US collation in ICU > had 7 different versions (across 11 major versions of ICU): So we need a switch --include-rule=icu-collations? (I mentioned "--exclude-rule=glibc" elsewhere in the thread, but I think it should be --include-rule=glibc-collations instead, no?) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 1, 2019 at 10:13 PM Daniel Verite <daniel@manitou-mail.org> wrote: > > > > I think you'd be better off to define and document this as "reindex > > > only collation-sensitive indexes", without any particular reference > > > to a reason why somebody might want to do that. > > > > We should still document that indexes based on ICU would be exluded? > > But why exclude them? > As a data point, in the last 5 years, the en_US collation in ICU > had 7 different versions (across 11 major versions of ICU): > > ICU Unicode en_US > > 54.1 7.0 137.56 > 55.1 7.0 153.56 > 56.1 8.0 153.64 > 57.1 8.0 153.64 > 58.2 9.0 153.72 > 59.1 9.0 153.72 > 60.2 10.0 153.80 > 61.1 10.0 153.80 > 62.1 11.0 153.88 > 63.2 11.0 153.88 > 64.2 12.1 153.97 > > The rightmost column corresponds to pg_collation.collversion > in Postgres. > Each time there's a new Unicode version, it seems > all collation versions are bumped. And there's a new Unicode > version pretty much every year these days. > Based on this, most ICU upgrades in practice would require reindexing > affected indexes. I have a vague recollection that ICU was providing some backward compatibility so that even if you upgrade your lib you can still get the sort order that was active when you built your indexes, though maybe for a limited number of versions. Even if that's just me being delusional, I'd still prefer Alvaro's approach to have distinct switches for each collation system.
On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > I have a vague recollection that ICU was providing some backward > compatibility so that even if you upgrade your lib you can still get > the sort order that was active when you built your indexes, though > maybe for a limited number of versions. That isn't built in. Another database system that uses ICU handles this by linking to multiple versions of ICU, each with its own UCA version and associated collations. I don't think that we want to go there, so it makes sense to make an upgrade that crosses ICU or glibc versions as painless as possible. Note that ICU does at least provide a standard way to use multiple versions at once; the symbol names have the ICU version baked in. You're actually calling the functions using the versioned symbol names without realizing it, because there is macro trickery involved. -- Peter Geoghegan
On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > Even if that's just me being delusional, I'd still prefer Alvaro's > approach to have distinct switches for each collation system. Hi Julien, Makes sense. But why use the name "glibc" in the code and user interface? The name of the collation provider in PostgreSQL is "libc" (for example in the CREATE COLLATION command), and the problem applies no matter who makes your libc. -- Thomas Munro https://enterprisedb.com
On 2019-Jul-02, Thomas Munro wrote: > On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Even if that's just me being delusional, I'd still prefer Alvaro's > > approach to have distinct switches for each collation system. > > Hi Julien, > > Makes sense. But why use the name "glibc" in the code and user > interface? The name of the collation provider in PostgreSQL is "libc" > (for example in the CREATE COLLATION command), and the problem applies > no matter who makes your libc. Makes sense. "If your libc is glibc and you go across an upgrade over version X, please use --include-rule=libc-collation" -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 01, 2019 at 06:28:13PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 1, 2019 at 4:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Couldn't we make this enormously simpler and less bug-prone by just dictating that --jobs applies only to reindex-table operations?

I had the same argument about the first patch sets actually, but... :)

> That would also mean that we'd have to fall back on doing the reindex at table level, even if we only want to reindex indexes that depend on glibc. I'm afraid that this will often add a huge penalty.

Yes, I would expect that most of the time glibc-sensitive indexes are mixed with other ones which we don't care about here. One advantage of Tom's argument though is that it is possible to introduce --jobs in minimal steps:
1) Refactor the code for connection slots, without the cell addition.
2) Introduce --jobs without INDEX support.
In short, the conflict business between indexes is something which could be tackled afterwards in a separate patch. Parallel reindexing at table level has value in itself, particularly with CONCURRENTLY coming into the picture.
-- Michael
Attachment
On Mon, Jul 01, 2019 at 06:14:20PM +0200, Julien Rouhaud wrote: > On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > > Please don't reuse a file name as generic as "parallel.c" -- it's > > annoying when navigating source. Maybe conn_parallel.c multiconn.c > > connscripts.c admconnection.c ...? > > I could use scripts_parallel.[ch] as I've already used it in the > #define part? multiconn.c sounds rather good, but I have a poor ear for any kind of naming.. >> If your server crashes or is stopped midway during the reindex, you >> would have to start again from scratch, and it's tedious (if it's >> possible at all) to determine which indexes were missed. I think it >> would be useful to have a two-phase mode: in the initial phase reindexdb >> computes the list of indexes to be reindexed and saves them into a work >> table somewhere. In the second phase, it reads indexes from that table >> and processes them, marking them as done in the work table. If the >> second phase crashes or is stopped, it can be restarted and consults the >> work table. I would keep the work table, as it provides a bit of an >> audit trail. It may be important to be able to run even if unable to >> create such a work table (because of the <ironic>numerous</> users that >> DROP DATABASE postgres). > > Or we could create a table locally in each database, that would fix > this problem and probably make the code simpler? > > It also raises some additional concerns about data expiration. I > guess that someone could launch the tool by mistake, kill reindexdb, > and run it again 2 months later while a lot of new objects have been > added for instance. This looks like fancy additions, still that's not the core of the problem, no? 
If you begin to play in this area you would need more control options: basically a "continue" mode to be able to restart a previously failed attempt, a "reinit" mode able to restart the operation completely from scratch, and perhaps even a "reset" mode which cleans up any data already present. Not really complex, but this has to be maintained at the database level.

>> The --glibc-dependent switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can add other rules later. (Not "--exclude=foo" because we'll want to add the possibility to ignore specific indexes by name.)
>
> That's a good point, I like the --exclude-rule switch.

Sounds kind of nice.
-- Michael
Attachment
On 2019-07-01 22:46, Alvaro Herrera wrote: > On 2019-Jul-02, Thomas Munro wrote: >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote: >>> Even if that's just me being delusional, I'd still prefer Alvaro's >>> approach to have distinct switches for each collation system. >> >> Makes sense. But why use the name "glibc" in the code and user >> interface? The name of the collation provider in PostgreSQL is "libc" >> (for example in the CREATE COLLATION command), and the problem applies >> no matter who makes your libc. > > Makes sense. "If your libc is glibc and you go across an upgrade over > version X, please use --include-rule=libc-collation" I think it might be better to put the logic of what indexes are collation affected etc. into the backend REINDEX command. We are likely to enhance the collation version and dependency tracking over time, possibly soon, possibly multiple times, and it would be very cumbersome to have to keep updating reindexdb with this. Moreover, since for performance you likely want to reindex by table, implementing a logic of "reindex all collation-affected indexes on this table" would be much easier to do in the backend. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 1, 2019 at 11:21 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > I have a vague recollection that ICU was providing some backward > > compatibility so that even if you upgrade your lib you can still get > > the sort order that was active when you built your indexes, though > > maybe for a limited number of versions. > > That isn't built in. Another database system that uses ICU handles > this by linking to multiple versions of ICU, each with its own UCA > version and associated collations. I don't think that we want to go > there, so it makes sense to make an upgrade that crosses ICU or glibc > versions as painless as possible. > > Note that ICU does at least provide a standard way to use multiple > versions at once; the symbol names have the ICU version baked in. > You're actually calling the functions using the versioned symbol names > without realizing it, because there is macro trickery involved. Ah, thanks for the clarification!
On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote:
>Hi,
>
>With glibc 2.28 coming, all users will have to reindex almost every index after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it processes everything using a single connection and isn't able to discard indexes that don't depend on a glibc collation.
>
>PFA a patchset to add parallelism to reindexdb (reusing the infrastructure in vacuumdb with some additions) and an option to discard indexes that don't depend on glibc (without any specific collation filtering or glibc version detection), with updated regression tests. Note that this should be applied on top of the existing reindexdb cleanup & refactoring patch (https://commitfest.postgresql.org/23/2115/).
>
>This was sponsored by VMware, and has been discussed internally with Kevin and Michael, in Cc.

I wonder why this is necessary:

pg_log_error("cannot reindex glibc dependent objects and a subset of objects");

What's the reasoning behind that? It seems like a valid use case to me - imagine you have a bug database, but only a couple of tables are used by the application regularly (the rest may be archive tables, for example). Why not allow rebuilding the glibc-dependent indexes on the used tables first, so that the database can be opened for users sooner?

BTW now that we allow rebuilding only some of the indexes, it'd be great to have a dry-run mode, where we just print which indexes would be rebuilt without actually rebuilding them.

regards
-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 2, 2019 at 9:19 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-07-01 22:46, Alvaro Herrera wrote: > > On 2019-Jul-02, Thomas Munro wrote: > >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > >>> Even if that's just me being delusional, I'd still prefer Alvaro's > >>> approach to have distinct switches for each collation system. > >> > >> Makes sense. But why use the name "glibc" in the code and user > >> interface? The name of the collation provider in PostgreSQL is "libc" > >> (for example in the CREATE COLLATION command), and the problem applies > >> no matter who makes your libc. > > > > Makes sense. "If your libc is glibc and you go across an upgrade over > > version X, please use --include-rule=libc-collation" > > I think it might be better to put the logic of what indexes are > collation affected etc. into the backend REINDEX command. We are likely > to enhance the collation version and dependency tracking over time, > possibly soon, possibly multiple times, and it would be very cumbersome > to have to keep updating reindexdb with this. Moreover, since for > performance you likely want to reindex by table, implementing a logic of > "reindex all collation-affected indexes on this table" would be much > easier to do in the backend. That's a great idea, and would make the parallelism in reindexdb much simpler. There's however a downside, as users won't have a way to benefit from index filtering until they upgrade to this version. OTOH glibc 2.28 is already there, and a hypothetical fancy reindexdb is far from being released.
On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I wonder why this is necessary:
>
> pg_log_error("cannot reindex glibc dependent objects and a subset of objects");
>
> What's the reasoning behind that? It seems like a valid use case to me - imagine you have a bug database, but only a couple of tables are used by the application regularly (the rest may be archive tables, for example). Why not allow rebuilding the glibc-dependent indexes on the used tables first, so that the database can be opened for users sooner?

It just seemed wrong to me to allow partial processing for something that's aimed at preventing corruption. I'd think that if users are knowledgeable enough to only reindex a subset of indexes/tables in such cases, they can also discard indexes that don't get affected by a collation lib upgrade. I'm not strongly opposed to supporting it though, as there can indeed be valid use cases.

> BTW now that we allow rebuilding only some of the indexes, it'd be great to have a dry-run mode, where we just print which indexes would be rebuilt without actually rebuilding them.

+1. If we end up doing the filtering in the backend, we'd have to add such an option to the REINDEX command, and actually issue all the commands to retrieve the list.
On Tue, Jul 02, 2019 at 10:45:44AM +0200, Julien Rouhaud wrote: >On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> I wonder why this is necessary: >> >> pg_log_error("cannot reindex glibc dependent objects and a subset of objects"); >> >> What's the reasoning behind that? It seems like a valid use case to me - >> imagine you have a bug database, but only a couple of tables are used by >> the application regularly (the rest may be archive tables, for example). >> Why not to allow rebuilding glibc-dependent indexes on the used tables, so >> that the database can be opened for users sooner. > >It just seemed wrong to me to allow a partial processing for something >that's aimed to prevent corruption. I'd think that if users are >knowledgeable enough to only reindex a subset of indexes/tables in >such cases, they can also discard indexes that don't get affected by a >collation lib upgrade. I'm not strongly opposed to supporting if >though, as there indeed can be valid use cases. > I don't know, it just seems like an unnecessary limitation. >> BTW now that we allow rebuilding only some of the indexes, it'd be great >> to have a dry-run mode, were we just print which indexes will be rebuilt >> without actually rebuilding them. > >+1. If we end up doing the filter in the backend, we'd have to add >such option in the REINDEX command, and actually issue all the orders >to retrieve the list. Hmmm, yeah. FWIW I'm not requesting v0 to have that feature, but it'd be good to design the feature in a way that allows adding it later. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-07-02 10:30, Julien Rouhaud wrote: > That's a great idea, and would make the parallelism in reindexdb much > simpler. There's however a downside, as users won't have a way to > benefit from index filtering until they upgrade to this version. OTOH > glibc 2.28 is already there, and a hypothetical fancy reindexdb is far > from being released. Isn't that also the case for your proposal? We are not going to release a new reindexdb before a new REINDEX. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-07-02 10:45, Julien Rouhaud wrote: > It just seemed wrong to me to allow partial processing for something > that's aimed at preventing corruption. I'd think that if users are > knowledgeable enough to only reindex a subset of indexes/tables in > such cases, they can also discard indexes that don't get affected by a > collation lib upgrade. I'm not strongly opposed to supporting it, > though, as there indeed can be valid use cases. We are moving in this direction. Thomas Munro has proposed an approach for tracking collation versions on a per-object level rather than per-database. So then we'd need a way to reindex not those indexes affected by collation but only those affected by collation and not yet fixed. One could also imagine a behavior where not-yet-fixed indexes are simply ignored by the planner. So the gradual upgrading approach that Tomas described is absolutely a possibility. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-07-02 10:30, Julien Rouhaud wrote: > > That's a great idea, and would make the parallelism in reindexdb much > > simpler. There's however a downside, as users won't have a way to > > benefit from index filtering until they upgrade to this version. OTOH > > glibc 2.28 is already there, and a hypothetical fancy reindexdb is far > > from being released. > > Isn't that also the case for your proposal? We are not going to release > a new reindexdb before a new REINDEX. Sure, but my point was that once the new reindexdb is released (or if you're so desperate, using a nightly build or compiling your own), it can be used against any previous major version. There is probably a large fraction of users who don't perform a postgres upgrade when they upgrade their OS, so that's IMHO also something to consider.
On Fri, Jul 05, 2019 at 07:25:41PM +0200, Julien Rouhaud wrote: > On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> Isn't that also the case for your proposal? We are not going to release >> a new reindexdb before a new REINDEX. > > Sure, but my point was that once the new reindexdb is released (or if > you're so desperate, using a nightly build or compiling your own), it > can be used against any previous major version. There is probably a > large fraction of users who don't perform a postgres upgrade when they > upgrade their OS, so that's IMHO also something to consider. I think that we need to think long-term here and be confident in the fact that we will still see breakages with collations and glibc, using a solution that we think is the right API. Peter's idea to make the backend command aware of the filtering is cool. On top of that, there is no need to add any conflict logic in reindexdb and we can live with restricting --jobs support for non-index objects. -- Michael
On Mon, Jul 8, 2019 at 9:57 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 05, 2019 at 07:25:41PM +0200, Julien Rouhaud wrote: > > On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > >> Isn't that also the case for your proposal? We are not going to release > >> a new reindexdb before a new REINDEX. > > > > Sure, but my point was that once the new reindexdb is released (or if > > you're so desperate, using a nightly build or compiling your own), it > > can be used against any previous major version. There is probably a > > large fraction of users who don't perform a postgres upgrade when they > > upgrade their OS, so that's IMHO also something to consider. > > I think that we need to think long-term here and be confident in the > fact that we will still see breakages with collations and glibc, using a > solution that we think is the right API. Peter's idea to make the > backend command aware of the filtering is cool. On top of that, there > is no need to add any conflict logic in reindexdb and we can live with > restricting --jobs support for non-index objects. Don't get me wrong, I do agree that implementing filtering in the backend is a better design. What's bothering me is that I also agree that there will be more glibc breakage, and if that happens within a few years, a lot of people will still be using pg12 or older versions, and they still won't have an efficient way to rebuild their indexes. Now, it'd be easy to publish an external tool that does a simple parallel-and-glibc-filtering reindex, which will serve that purpose for the few years it'll be needed, so everyone can be happy. For now, I'll resubmit the parallel patch using a per-table-only approach, and will submit the filtering in the backend using a new REINDEX option in a different thread.
On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > I'll resubmit the parallel patch using per-table only > approach Attached.
On Mon, Jul 1, 2019 at 09:51:12AM -0400, Alvaro Herrera wrote: > Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to > gain more popularity. > > Please don't reuse a file name as generic as "parallel.c" -- it's > annoying when navigating source. Maybe conn_parallel.c multiconn.c > connscripts.c admconnection.c ...? > > If your server crashes or is stopped midway during the reindex, you > would have to start again from scratch, and it's tedious (if it's > possible at all) to determine which indexes were missed. I think it > would be useful to have a two-phase mode: in the initial phase reindexdb > computes the list of indexes to be reindexed and saves them into a work > table somewhere. In the second phase, it reads indexes from that table > and processes them, marking them as done in the work table. If the > second phase crashes or is stopped, it can be restarted and consults the > work table. I would keep the work table, as it provides a bit of an > audit trail. It may be important to be able to run even if unable to > create such a work table (because of the <ironic>numerous</> users that > DROP DATABASE postgres). > > Maybe we'd have two flags in the work table for each index: > "reindex requested", "reindex done". I think we have a similar issue with adding checksums, so let's address it with a generic framework and use it for all cases, like vacuumdb too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote: > On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote: >> I'll resubmit the parallel patch using per-table only >> approach > > Attached. I have taken a look at this patch set with a focus on the refactoring part, and the split is a bit confusing. +void +DisconnectDatabase(ParallelSlot *slot) +{ + char errbuf[256]; common.c already has an API to connect to a database. It would be more natural to move the disconnection part also to common.c and have the caller of DisconnectDatabase reset the slot connection by itself? disconnectDatabase() (lower case for the first character) would make the set more consistent. We could also have a wrapper like say DiscardSlot() which does this work, but that seems like overkill for a single slot if one API could do the cleanup of the full set. $ git grep select_loop scripts_parallel.c: /* We must reconstruct the fd_set for each call to select_loop */ scripts_parallel.c: i = select_loop(maxFd, &slotset, &aborting); scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool *aborting) scripts_parallel.h:extern int select_loop(int maxFd, fd_set *workerset, bool *aborting); select_loop is only present in scripts_parallel.c, so it can remain static. + slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * concurrentCons); + init_slot(slots, conn); + if (parallel) + { + for (i = 1; i < concurrentCons; i++) + { + conn = connectDatabase(dbname, host, port, username, prompt_password, + progname, echo, false, true); + init_slot(slots + i, conn); + } + } This comes from 0002 and could be refactored further, as vacuumdb does the same thing. Based on 0001, init_slot() is called now in vacuumdb.c and initializes a set of slots while connecting to a given database. In short, the input is a set of connection parameters and a number of slots to open, and the result is a pg_malloc'd array of slots ready to be used. 
We could just call that ParallelSlotInit() (if you have a better name feel free). + /* + * Get the connection slot to use. If in parallel mode, here we wait + * for one connection to become available if none already is. In + * non-parallel mode we simply use the only slot we have, which we + * know to be free. + */ + if (parallel) This business also is duplicated in both reindexdb.c and vacuumdb.c. +bool +GetQueryResult(PGconn *conn, const char *progname) +{ This also does not stick with the parallel stuff, as that's basically only getting a query result. We could stick that into common.c. Patch 2 has no documentation. The option as well as the restrictions in place need to be documented properly. Here is a small idea about the set of routines we could have for the parallel stuff, with only three of them needed to work on the parallel slots and get free connections: - Initialization of the full slot set. - Cleanup and disconnection of the slots. - Fetch an idle connection and wait for one until available. -- Michael
Thanks for the review. On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote: > > On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > >> I'll resubmit the parallel patch using per-table only > >> approach > > > > Attached. > > I have done a lookup of this patch set with a focus on the refactoring > part, and the split is a bit confusing. Yes, that wasn't a smart split :( > +void > +DisconnectDatabase(ParallelSlot *slot) > +{ > + char errbuf[256]; > common.c has already an API to connect to a database. It would be > more natural to move the disconnection part also to common.c and have > the caller of DisconnectDatabase reset the slot connection by itself? Ok. > $ git grep select_loop > scripts_parallel.c: /* We must reconstruct the fd_set for each > call to select_loop */ > scripts_parallel.c: i = select_loop(maxFd, &slotset, &aborting); > scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool > *aborting) > scripts_parallel.h:extern int select_loop(int maxFd, fd_set > *workerset, bool *aborting); > > select_loop is only present in scripts_parallel.c, so it can remain > static. Good point. > + slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * > concurrentCons); > + init_slot(slots, conn); > + if (parallel) > + { > + for (i = 1; i < concurrentCons; i++) > + { > + conn = connectDatabase(dbname, host, port, > username, prompt_password, > + > progname, echo, false, true); > + init_slot(slots + i, conn); > + } > + } > > This comes from 0002 and could be more refactored as vacuumdb does the > same thing. Based on 0001, init_slot() is called now in vacuumdb.c > and initializes a set of slots while connecting to a given database. > In short, in input we have a set of parameters and the ask to open > connections with N slots, and the return result is an pg_malloc'd > array of slots ready to be used. 
We could just call that > ParallelSlotInit() (if you have a better name feel free). Given how the rest of the functions are named, I'll probably use InitParallelSlots(). > > + /* > + * Get the connection slot to use. If in parallel mode, here we wait > + * for one connection to become available if none already is. In > + * non-parallel mode we simply use the only slot we have, which we > + * know to be free. > + */ > + if (parallel) > This business also is duplicated in both reindexdb.c and vacuumdb.c. > > +bool > +GetQueryResult(PGconn *conn, const char *progname) > +{ > This also does not stick with the parallel stuff, as that's basically > only getting a query result. We could stick that into common.c. This function also has a bad name, as it's discarding the result via ProcessQueryResult. Maybe we should rename them to GetQuerySuccess() and ConsumeAndTrashQueryResult()? > Patch 2 has no documentation. The option as well as the restrictions > in place need to be documented properly. I forgot that I had forgotten to add documentation :( will fix this time.
On 2019-07-08 21:08, Julien Rouhaud wrote: > Don't get me wrong, I do agree that implementing filtering in the > backend is a better design. What's bothering me is that I also agree > that there will be more glibc breakage, and if that happens within a > few years, a lot of people will still be using pg12 or older versions, and they > still won't have an efficient way to rebuild their indexes. Now, it'd > be easy to publish an external tool that does a simple > parallel-and-glibc-filtering reindex, which will serve that purpose > for the few years it'll be needed, so everyone can be happy. You can already do that: Run a query through psql to get a list of affected tables or indexes and feed those to reindexdb using -i or -t options. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 9, 2019 at 9:52 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > I have taken a look at this patch set with a focus on the refactoring > > part, and the split is a bit confusing. > [...] I finished a better refactoring, and ended up with this API in scripts_parallel: extern ParallelSlot *ConsumeIdleSlot(ParallelSlot *slots, int numslots, const char *progname); extern ParallelSlot *SetupParallelSlots(const char *dbname, const char *host, const char *port, const char *username, bool prompt_password, const char *progname, bool echo, PGconn *conn, int numslots); extern bool WaitForSlotsCompletion(ParallelSlot *slots, int numslots, const char *progname); ConsumeIdleSlot() being a wrapper on top of the (now static) GetIdleSlot, which handles parallelism and possible failure. Attached v3, including updated documentation for the new -j option.
On Tue, Jul 09, 2019 at 01:09:38PM +0200, Peter Eisentraut wrote: > You can already do that: Run a query through psql to get a list of > affected tables or indexes and feed those to reindexdb using -i or -t > options. Sure, but that approach is limited if one can only afford a short downtime window for an upgrade, and you still need to properly handle the index-level conflicts when doing the processing in parallel. -- Michael
On 2019-Jul-09, Julien Rouhaud wrote: > I finished a better refactoring, and ended up with this API in > scripts_parallel: Looking good! I'm not sure about the "Consume" word in ConsumeIdleSlot; maybe "Reserve"? "Obtain"? "Get"? Code commentary: I think the comment that sits atop the function should describe what the function does without getting too much into how it does it. For example in ConsumeIdleSlot you have "If there are multiple slots, here we wait for one connection to become available if none already is, returning NULL if an error occurred. Otherwise, we simply use the only slot we have, which we know to be free." which seems like it should be in another comment *inside* the function; make the external one something like "Reserve and return a connection that is currently idle, waiting until one becomes idle if none is". Maybe you can put the part I first quoted as a second paragraph in the comment at top of function and keep the second part I quoted as first paragraph; we seem to use that style too. Placement: I think it's good if related functions stay together, or there is some other rationale for placement within the file. I have two favorite approaches: one is to put all externally callable functions at top of file, followed by all the static helpers in the lower half of the file. The other is to put each externally accessible function immediately followed by its specific static helpers. If you choose one of those, that means that SetupParallelSlots should either move upwards, or move downwards. The current ordering seems a dartboard kind of thing where the thrower is not Green Arrow. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro, Thanks a lot for the review. On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Jul-09, Julien Rouhaud wrote: > > > I finished a better refactoring, and ended up with this API in > > scripts_parallel: > > Looking good! Thanks! > I'm not sure about the "Consume" word in ConsumeIdleSlot; > maybe "Reserve"? "Obtain"? "Get"? Yes, Consume is maybe a little bit weird. I wanted to make it clear that this function is actually removing a slot from the free list, especially since there's a (historical) get_idle_slot(). I like Reserve, but Obtain and Get are probably too ambiguous. > Code commentary: I think the comment that sits atop the function should > describe what the function does without getting too much into how it does > it. For example in ConsumeIdleSlot you have "If there are multiple > slots, here we wait for one connection to become available if none > already is, returning NULL if an error occurred. Otherwise, we simply > use the only slot we have, which we know to be free." which seems like > it should be in another comment *inside* the function; make the external > one something like "Reserve and return a connection that is currently > idle, waiting until one becomes idle if none is". Maybe you can put the > part I first quoted as a second paragraph in the comment at top of > function and keep the second part I quoted as first paragraph; we > seem to use that style too. Good point, I'll fix as you say. > Placement: I think it's good if related functions stay together, or > there is some other rationale for placement within the file. I have two > favorite approaches: one is to put all externally callable functions at > top of file, followed by all the static helpers in the lower half of the > file. The other is to put each externally accessible function immediately > followed by its specific static helpers. 
If you choose one of those, > that means that SetupParallelSlots should either move upwards, or move > downwards. The current ordering seems a dartboard kind of thing where > the thrower is not Green Arrow. :) I tried to put everything in alphabetic order as it was previously being done, but I apparently failed again at sorting more than 3 characters. I usually prefer to group exported functions together and static ones together, as I find it more maintainable in the long term, so upwards it'll be.
On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote: > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Looking good! > > Thanks! Confirmed. The last set is much easier to go through. >> I'm not sure about the "Consume" word in ConsumeIdleSlot; >> maybe "Reserve"? "Obtain"? "Get"? > > Yes, Consume is maybe a little bit weird. I wanted to make it clear > that this function is actually removing a slot from the > free list, especially since there's a (historical) get_idle_slot(). I > like Reserve, but Obtain and Get are probably too ambiguous. The refactoring patch is getting in shape. Now reindex_one_database() is the only place setting and manipulating the slots. I am wondering if we should have a wrapper which disconnects all the slots (doing conn = NULL after the disconnectDatabase() call does not matter). Get* would be my choice, because we look at the set of parallel slots, and get an idle one. It would be nice to have more consistency in the names for the routines, say: - ParallelSlotInit() instead of SetupParallelSlots (still my suggestion is not perfect either, as that sounds like one single slot, but we have a set of these). - ParallelSlotGetIdle() instead of ConsumeIdleSlot(). Still that's more a wait-then-get behavior. - ParallelSlotWaitCompletion() instead of WaitForSlotsCompletion() - ParallelSlotDisconnect, as a wrapper for the calls to DisconnectDatabase(). >> Placement: I think it's good if related functions stay together, or >> there is some other rationale for placement within the file. I have two >> favorite approaches: one is to put all externally callable functions at >> top of file, followed by all the static helpers in the lower half of the >> file. The other is to put each externally accessible function immediately >> followed by its specific static helpers. If you choose one of those, >> that means that SetupParallelSlots should either move upwards, or move >> downwards. 
The current ordering seems a dartboard kind of thing where >> the thrower is not Green Arrow. > > I usually prefer to group exported functions together and static ones > together, as I find it more maintainable in the long term, so upwards > it'll be. That's mainly a matter of taste. Depending on the code path in the tree, sometimes the two approaches from above are used. We have some other files where the static routines are listed first at the top, followed by the exported ones at the bottom, as it saves some declarations of the functions at the top of the file. Keeping the footprint of the author is not that bad either, and that depends also on the context. For this one, as the static functions are linked to the exported ones in a close manner, I would keep each set grouped though. + /* + * Database-wide parallel reindex requires special processing. If + * multiple jobs were asked, we have to reindex system catalogs first, + * as they can't be processed in parallel. + */ + if (process_type == REINDEX_DATABASE) In patch 0002, a parallel database REINDEX first processes the catalog relations in a serializable fashion, and then all the other relations in parallel, which is right. Could we have schema-level reindexes also process things in parallel with all the relations from all the schemas listed? This would be profitable in particular for callers listing multiple schemas with an unbalanced number of tables in each, and we'd need to be careful of the case where pg_catalog is listed. Actually, your patch breaks if we do a parallel run with pg_catalog and another schema, no? -- Michael
On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote: > > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> Looking good! > > > > Thanks! > > Confirmed. The last set is much easier to go through. > > >> I'm not sure about the "Consume" word in ConsumeIdleSlot; > >> maybe "Reserve"? "Obtain"? "Get"? > > > > Yes, Consume is maybe a little bit weird. I wanted to make it clear > > that this function is actually removing a slot from the > > free list, especially since there's a (historical) get_idle_slot(). I > > like Reserve, but Obtain and Get are probably too ambiguous. > > The refactoring patch is getting in shape. Now reindex_one_database() > is the only place setting and manipulating the slots. I am wondering > if we should have a wrapper which disconnects all the slots (doing > conn = NULL after the disconnectDatabase() call does not matter). You already mentioned that in a previous mail. I was afraid it'd be overkill, but it'll make caller code easier, so let's do it. > Get* would be my choice, because we look at the set of parallel slots, > and get an idle one. That's what the former GetIdleSlot (which I renamed to get_idle_slot as it's now static) is doing. ConsumeIdleSlot() actually gets an idle slot and marks it as being used. That's probably some leakage of internal implementation, but having a GetIdleParallelSlot (or ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and I don't have a better idea on how to rename get_idle_slot. If you really prefer Get* and you're fine with a static get_idle_slot, that's fine by me. > It would be nice to have more consistency in the > names for the routines, say: > - ParallelSlotInit() instead of SetupParallelSlots (still my > suggestion is not perfect either as that sounds like one single slot, > but we have a set of these). 
> - ParallelSlotGetIdle() instead of ConsumeIdleSlot(). Still that's > more a wait-then-get behavior. > - ParallelSlotWaitCompletion() instead of WaitForSlotsCompletion() > - ParallelSlotDisconnect, as a wrapper for the calls to > DisconnectDatabase(). I don't have an opinion on whether to use parallel slot as prefix or postfix, so I'm fine with postfixing. > + /* > + * Database-wide parallel reindex requires special processing. If > + * multiple jobs were asked, we have to reindex system catalogs first, > + * as they can't be processed in parallel. > + */ > + if (process_type == REINDEX_DATABASE) > > In patch 0002, a parallel database REINDEX first processes the catalog > relations in a serializable fashion, and then all the other relations > in parallel, which is right. Could we have schema-level reindexes also > process things in parallel with all the relations from all the schemas > listed? This would be profitable in particular for callers listing > multiple schemas with an unbalanced number of tables in each It would also be beneficial for a parallel reindex of a single schema. > and we'd > need to be careful of the case where pg_catalog is listed. Actually, > your patch breaks if we do a parallel run with pg_catalog and another > schema, no? It can definitely cause problems if you ask for pg_catalog and another schema, same as if you ask for a parallel reindex of some catalog tables (possibly with other tables). There's a --system switch for that need, so I don't know whether documenting the limitation, to avoid some extra code to deal with it, is a good enough solution?
On Thu, Jul 11, 2019 at 11:48:20AM +0200, Julien Rouhaud wrote: > On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael@paquier.xyz> wrote: >> Get* would be my choice, because we look at the set of parallel slots, >> and get an idle one. > > That's what the former GetIdleSlot (which I renamed to get_idle_slot as > it's now static) is doing. ConsumeIdleSlot() actually gets an idle > slot and marks it as being used. That's probably some leakage of > internal implementation, but having a GetIdleParallelSlot (or > ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and > I don't have a better idea on how to rename get_idle_slot. If you > really prefer Get* and you're fine with a static get_idle_slot, that's > fine by me. Do we actually need get_idle_slot? ConsumeIdleSlot is its only caller. >> and we'd >> need to be careful of the case where pg_catalog is listed. Actually, >> your patch breaks if we do a parallel run with pg_catalog and another >> schema, no? > > It can definitely cause problems if you ask for pg_catalog and another > schema, same as if you ask for a parallel reindex of some catalog tables > (possibly with other tables). There's a --system switch for that > need, so I don't know whether documenting the limitation to avoid some > extra code to deal with it is a good enough solution? vacuumdb --full still has limitations in this area and we had some reports on the matter about this behavior being annoying. Its documentation also mentions that mixing catalog relations with --full can cause deadlocks. Documenting it may be fine in the end, but my take is that it would be nice to make sure that we don't have deadlocks if we can avoid them easily. It is also a matter of balance. If for example the patch gets 3 times bigger in size because of that, we may have an argument for not doing it and keeping the code simple. What do people think about that? It would be nice to get more opinions here. And while scanning the code... + * getQuerySucess Typo here. 
- * Pump the conn till it's dry of results; return false if any are errors. This comment could be improved on the way, like "Go through all the connections and make sure to consume any remaining results. If any error is found, false is returned after processing all the parallel slots." -- Michael
On Thu, Jul 11, 2019 at 3:34 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jul 11, 2019 at 11:48:20AM +0200, Julien Rouhaud wrote: > > On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael@paquier.xyz> wrote: > >> Get* would be my choice, because we look at the set of parallel slots, > >> and get an idle one. > > > > That's what the former GetIdleSlot (which I renamed to get_idle_slot as > > it's now static) is doing. ConsumeIdleSlot() actually gets an idle > > slot and marks it as being used. That's probably some leakage of > > internal implementation, but having a GetIdleParallelSlot (or > > ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and > > I don't have a better idea on how to rename get_idle_slot. If you > > really prefer Get* and you're fine with a static get_idle_slot, that's > > fine by me. > > Do we actually need get_idle_slot? ConsumeIdleSlot is its only > caller. I think that it makes the code quite a bit cleaner to have the selection outside ConsumeIdleSlot. > And while scanning the code... > > + * getQuerySucess > Typo here. Argh, I thought I caught all of them, thanks! > - * Pump the conn till it's dry of results; return false if any are errors. > This comment could be improved on the way, like "Go through all the > connections and make sure to consume any remaining results. If any > error is found, false is returned after processing all the parallel > slots." You're talking about getQuerySuccess, right? That was actually the original comment of a function I renamed. +1 to improve it, but this function is in common.c and doesn't deal with parallel slots at all, so I'll just drop the slang parts.
On Thu, Jul 11, 2019 at 06:22:25PM +0200, Julien Rouhaud wrote: > I think that it makes the code quite a bit cleaner to have the selection > outside ConsumeIdleSlot. Actually, you have an issue with ConsumeIdleSlot() if there is only one parallel slot, no? In this case the current patch immediately returns the available slot without waiting. I think that we should wait until the slot becomes free in that case as well, and switch isFree to false. If you want to keep things split, that's fine by me, I would still use "Get" within the name for the routine, and rename the other to get_idle_slot_internal() or get_idle_slot_guts() to point out that it has an internal role. > You're talking about getQuerySuccess, right? That was actually the > original comment of a function I renamed. +1 to improve it, but this > function is in common.c and doesn't deal with parallel slots at all, so > I'll just drop the slang parts. If we can design a clean interface with better comments, we can use this occasion to browse the whole thing and make it better. -- Michael
On Fri, Jul 12, 2019 at 3:20 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jul 11, 2019 at 06:22:25PM +0200, Julien Rouhaud wrote: > > I think that it makes the code quite a bit cleaner to have the selection > > outside ConsumeIdleSlot. > > Actually, you have an issue with ConsumeIdleSlot() if there is only > one parallel slot, no? In this case the current patch immediately > returns the available slot without waiting. I think that we > should wait until the slot becomes free in that case as well, and > switch isFree to false. It shouldn't be a problem: I reused the same infrastructure as for vacuumdb, so run_reindex_command has a new "async" parameter, and when there's no parallelism it uses executeMaintenanceCommand (instead of PQsendQuery), which blocks until query completion. That's why there's no isFree usage at all in this case. > If you want to keep things split, that's > fine by me, I would still use "Get" within the name for the routine, > and rename the other to get_idle_slot_internal() or > get_idle_slot_guts() to point out that it has an internal role. Ok, I'll change to get_idle_slot_internal then.
On Fri, Jul 12, 2019 at 07:49:13AM +0200, Julien Rouhaud wrote: > It shouldn't be a problem: I reused the same infrastructure as for > vacuumdb, so run_reindex_command has a new "async" parameter, and when > there's no parallelism it uses executeMaintenanceCommand (instead > of PQsendQuery), which will block until query completion. That's why > there's no isFree usage at all in this case. My point is more about consistency and simplification with the case where n > 1, and that we could actually move the async/sync code paths under the same banner, as the async mode waits as well until a slot is free, or in short until the query completes. -- Michael
On Fri, Jul 12, 2019 at 7:57 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 12, 2019 at 07:49:13AM +0200, Julien Rouhaud wrote: > > It shouldn't be a problem: I reused the same infrastructure as for > > vacuumdb, so run_reindex_command has a new "async" parameter, and when > > there's no parallelism it uses executeMaintenanceCommand (instead > > of PQsendQuery), which will block until query completion. That's why > > there's no isFree usage at all in this case. > > My point is more about consistency and simplification with the case > where n > 1, and that we could actually move the async/sync code paths > under the same banner, as the async mode waits as well until a slot is > free, or in short until the query completes. I attach v4 with all previous comments addressed. I also changed the code to handle the parallel and non-parallel cases the same way. I kept the possibility of synchronous behavior in reindexdb, as there's an early need to run some queries in the case of a parallel database-wide reindex. It avoids opening all the connections in case anything fails during this preliminary work, and it also avoids another call to the async wait function. If we add parallelism to clusterdb (I'll probably work on that next time I have spare time), reindexdb would be the only caller left of executeMaintenanceCommand(), so that's something we may want to change. I didn't change the behavior wrt. the possible deadlock if the user specifies catalog objects using --index or --table and asks for multiple connections, as I'm afraid that it'll add too much code for little benefit. Please shout if you think otherwise.
On Fri, Jul 12, 2019 at 11:47 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > I didn't change the behavior wrt. the possible deadlock if the user specifies > catalog objects using --index or --table and asks for multiple > connections, as I'm afraid that it'll add too much code for little > benefit. Please shout if you think otherwise. Sorry, I meant schemas, not indexes. After more thinking about schemas and multiple jobs, I think that erroring out is quite user-unfriendly, as it's entirely ok to ask for multiple indexes and multiple objects that do support parallelism in a single call. So I think it's better to remove the error, ignore the given --jobs option for indexes, and document this behavior.
On Tue, Jul 16, 2019 at 02:03:16PM +0200, Julien Rouhaud wrote: > After more thinking about schemas and multiple jobs, I think that > erroring out is quite user-unfriendly, as it's entirely ok to ask > for > multiple indexes and multiple objects that do support parallelism in > a > single call. So I think it's better to remove the error, ignore the > given --jobs option for indexes, and document this behavior. No objections to that. I still need to study 0002 a bit more though to come to a clear conclusion. Actually, from patch 0002: + free_slot = ParallelSlotsGetIdle(slots, concurrentCons, progname); + if (!free_slot) + { + failed = true; + goto finish; + } + + run_reindex_command(conn, process_type, objname, progname, echo, + verbose, concurrently, true); The same connection gets reused; shouldn't the connection come from the free slot? On top of that quick lookup, I have done an in-depth review of 0001 to bring it to a committable state, fixing a couple of typos and incorrect comments (the description of ParallelSlotsGetIdle was for example incorrect) on the way. Other things include that connectDatabase should have an assertion for a non-NULL connection, and calling pg_free() on the slots in the terminate routine is more consistent as pg_malloc() is used first. A comment at the top of processQueryResult still referred to vacuuming of a missing relation. Most of the patch was in a clean state, with a clear interface for parallel slots, and the place of the new routines also makes sense, so I did not have much to do :) Another thing I have noticed is that we don't really need to pass down progname across all those layers, as we finish by using pg_log_error() when processing results, so more simplifications can be done.
This part is not really your fault but as I am looking at it, it does not hurt to clean up what we can. Attached is an updated version of 0001 that I am comfortable with. I'd like to commit that with the cleanups included and then let's move to the real deal with 0002. -- Michael
On Wed, Jul 17, 2019 at 9:59 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jul 16, 2019 at 02:03:16PM +0200, Julien Rouhaud wrote: > > After more thinking about schema and multiple jobs, I think that > > erroring out is quite user unfriendly, as it's entirely ok to ask > > for > > multiple indexes and multiple object that do support parallelism in > > a > > single call. So I think it's better to remove the error, ignore the > > given --jobs options for indexes and document this behavior. > > No objections to that. I still need to study a bit more 0002 though > to come to a clear conclusion. > > Actually, from patch 0002: > + free_slot = ParallelSlotsGetIdle(slots, concurrentCons, progname); > + if (!free_slot) > + { > + failed = true; > + goto finish; > + } > + > + run_reindex_command(conn, process_type, objname, progname, echo, > + verbose, concurrently, true); > The same connection gets reused, shouldn't the connection come from > the free slot? Ouch indeed. > On top of that quick lookup, I have done an in-depth review on 0001 to > bring it to a committable state, fixing a couple of typos, incorrect > comments (description of ParallelSlotsGetIdle was for example > incorrect) on the way. Other things include that connectDatabase > should have an assertion for a non-NULL connection, disconnectDatabase you mean? Fine by me. > calling pg_free() > on the slots terminate is more consistent as pg_malloc is used first. > A comment at the top of processQueryResult still referred to > vacuuming of a missing relation. Most of the patch was in a clean > state, with a clear interface for parallel slots, the place of the new > routines also makes sense, so I did not have much to do :) Thanks :) > Another thing I have noticed is that we don't really need to pass down > progname across all those layers as we finish by using pg_log_error() > when processing results, so more simplifications can be done. 
> Let's handle all that in the same patch as we are messing with the area. > connectDatabase() and connectMaintenanceDatabase() still need it > though as this is used in the connection string, so > ParallelSlotsSetup() also needs it. This part is not really your > fault but as I am looking at it, it does not hurt to clean up what we > can. Attached is an updated version of 0001 that I am comfortable > with. I'd like to commit that with the cleanups included and then > let's move to the real deal with 0002. Good catch, I totally missed this progname change. I read the patch you attached; I have a few comments: +/* + * Disconnect the given connection, canceling any statement if one is active. + */ +void +disconnectDatabase(PGconn *conn) Nitpicking, but this comment doesn't follow the style of other functions' comments (it's also the case for the existing comment on executeQuery at least). While reading the comments you added on ParallelSlotsSetup(), I wondered if we couldn't also add an Assert(conn) at the beginning? +void +ParallelSlotsTerminate(ParallelSlot *slots, int numslots) +{ + int i; + + for (i = 0; i < numslots; i++) + { + PGconn *conn = slots[i].connection; + + if (conn == NULL) + continue; + + disconnectDatabase(conn); + } + + pg_free(slots); +} Is it ok to call pg_free(slots) and let the caller have a pointer pointing to freed memory?
On Wed, Jul 17, 2019 at 07:46:10PM +0200, Julien Rouhaud wrote: > On Wed, Jul 17, 2019 at 9:59 AM Michael Paquier <michael@paquier.xyz> wrote: >> On top of that quick lookup, I have done an in-depth review on 0001 to >> bring it to a committable state, fixing a couple of typos, incorrect >> comments (description of ParallelSlotsGetIdle was for example >> incorrect) on the way. Other things include that connectDatabase >> should have an assertion for a non-NULL connection, > > disconnectDatabase you mean? Fine by me. Oops, yes. I meant disconnectDatabase() here. The patch does so, not my words. > +/* > + * Disconnect the given connection, canceling any statement if one is active. > + */ > +void > +disconnectDatabase(PGconn *conn) > > Nitpicking, but this comment doesn't follow the style of other > functions' comments (it's also the case for existing comment on > executeQuery at least). connectDatabase, connectMaintenanceDatabase, executeQuery and most of the others follow that style, so I am just going to simplify consumeQueryResult and processQueryResult to keep a consistent style. > While reading the comments you added on ParallelSlotsSetup(), I > wondered if we couldn't also add an Assert(conn) at the beginning? That makes sense as this gets associated to the first slot. There could be an argument for making a set of slots extensible to simplify this interface, but that complicates the logic for the scripts. > Is it ok to call pg_free(slots) and let caller have a pointer pointing > to freed memory? The interface has a Setup call which initializes the whole thing, and Terminate is the logical end point, so having the free logic within the termination looks more consistent to me. We could now have actual Init() and Free() but I am not sure that this justifies the move as this complicates the scripts using it. -- Michael
On Thu, Jul 18, 2019 at 09:45:14AM +0900, Michael Paquier wrote: > On Wed, Jul 17, 2019 at 07:46:10PM +0200, Julien Rouhaud wrote: >> Is it ok to call pg_free(slots) and let caller have a pointer pointing > to freed memory? > > The interface has a Setup call which initializes the whole thing, and > Terminate is the logical end point, so having the free logic within > the termination looks more consistent to me. We could now have actual > Init() and Free() but I am not sure that this justifies the move as > this complicates the scripts using it. I have reconsidered this point, moved the pg_free() call out of the termination logic, and committed the first patch after an extra lookup and more polishing. For the second patch, could you send a rebase with a fix for the connection slot when processing the reindex commands? -- Michael
On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jul 18, 2019 at 09:45:14AM +0900, Michael Paquier wrote: > > On Wed, Jul 17, 2019 at 07:46:10PM +0200, Julien Rouhaud wrote: > >> Is it ok to call pg_free(slots) and let caller have a pointer pointing > > to freed memory? > > > > The interface has a Setup call which initializes the whole thing, and > > Terminate is the logical end point, so having the free logic within > > the termination looks more consistent to me. We could now have actual > > Init() and Free() but I am not sure that this justifies the move as > > this complicates the scripts using it. > > I have reconsidered this point, moved the pg_free() call out of the > termination logic, and committed the first patch after an extra lookup > and more polishing. Thanks! > For the second patch, could you send a rebase with a fix for the > connection slot when processing the reindex commands? Attached, I also hopefully removed all the now unneeded progname usage.
On Fri, Jul 19, 2019 at 08:29:27AM +0200, Julien Rouhaud wrote: > On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier <michael@paquier.xyz> wrote: >> For the second patch, could you send a rebase with a fix for the >> connection slot when processing the reindex commands? > > Attached, I also hopefully removed all the now unneeded progname usage. + Note that this mode is not compatible the <option>-i / --index</option> + or the <option>-s / --system</option> options. Nits: this is not a style consistent with the documentation. When referring to both the long and short options the formulation "-i or --index" gets used. Here we could just use the long option. This sentence is missing a "with". simple_string_list_append(&tables, optarg); + tbl_count++; break; The number of items in a simple list is not counted, and vacuumdb does the same thing to count objects. What do you think about extending simple lists to track the number of items stored? +$node->issues_sql_like([qw(reindexdb -j2)], + qr/statement: REINDEX TABLE public.test1/, + 'Global and parallel reindex will issue per-table REINDEX'); Would it make sense to have some tests for schemas here? One of my comments in [1] has not been answered. What about the decomposition of a list of schemas into a list of tables when using the parallel mode? [1]: https://www.postgresql.org/message-id/20190711040433.GG4500@paquier.xyz -- Michael
On Mon, Jul 22, 2019 at 6:11 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 19, 2019 at 08:29:27AM +0200, Julien Rouhaud wrote: > > On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier <michael@paquier.xyz> wrote: > >> For the second patch, could you send a rebase with a fix for the > >> connection slot when processing the reindex commands? > > > > Attached, I also hopefully removed all the now unneeded progname usage. > > + Note that this mode is not compatible the <option>-i / --index</option> > + or the <option>-s / --system</option> options. > Nits: this is not a style consistent with the documentation. When > referring to both the long and short options the formulation "-i or > --index" gets used. Here we could just use the long option. This > sentence is missing a "with". Right, so I kept the long option. Also this comment was outdated, as the --jobs is now just ignored with a list of indexes, so I fixed that too. > > simple_string_list_append(&tables, optarg); > + tbl_count++; > break; > The number of items in a simple list is not counted, and vacuumdb does > the same thing to count objects. What do you think about extending > simple lists to track the number of items stored? I considered this, but it would require to adapt all code that declare SimpleStringList stack variable (vacuumdb.c, clusterdb.c, createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't have a strong opinion here, so I can go for it if you prefer. > > +$node->issues_sql_like([qw(reindexdb -j2)], > + qr/statement: REINDEX TABLE public.test1/, > + 'Global and parallel reindex will issue per-table REINDEX'); > Would it make sense to have some tests for schemas here? > > One of my comments in [1] has not been answered. What about > the decomposition of a list of schemas into a list of tables when > using the parallel mode? I did that in attached v6, and added suitable regression tests.
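[Editor's note] For illustration, the rejected alternative — teaching the simple list itself to track its length — could look like the following self-contained sketch. The `count` field and this exact layout are hypothetical; the real SimpleStringList in src/include/fe_utils stores the string inline in the cell and has no counter.

```c
#include <stdlib.h>
#include <string.h>

typedef struct SimpleStringListCell
{
    struct SimpleStringListCell *next;
    char *val;
} SimpleStringListCell;

typedef struct SimpleStringList
{
    SimpleStringListCell *head;
    SimpleStringListCell *tail;
    int count;                  /* hypothetical: number of cells */
} SimpleStringList;

/* Append a copy of val, keeping the count up to date. */
static void
simple_string_list_append(SimpleStringList *list, const char *val)
{
    SimpleStringListCell *cell = malloc(sizeof(SimpleStringListCell));

    cell->next = NULL;
    cell->val = malloc(strlen(val) + 1);
    strcpy(cell->val, val);

    if (list->tail)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
    list->count++;              /* the extra bookkeeping under discussion */
}
```

The cost Julien mentions is visible here: every translation unit declaring a stack-allocated SimpleStringList would need its initializer audited for the new field, which is why the patch keeps plain local counters such as tbl_count instead.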
On 2019-Jul-22, Julien Rouhaud wrote: > On Mon, Jul 22, 2019 at 6:11 AM Michael Paquier <michael@paquier.xyz> wrote: > > simple_string_list_append(&tables, optarg); > > + tbl_count++; > > break; > > The number of items in a simple list is not counted, and vacuumdb does > > the same thing to count objects. What do you think about extending > > simple lists to track the number of items stored? > > I considered this, but it would require to adapt all code that declare > SimpleStringList stack variable (vacuumdb.c, clusterdb.c, > createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much > trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't > have a strong opinion here, so I can go for it if you prefer. Can we use List for this instead? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Jul-19, Julien Rouhaud wrote: > > For the second patch, could you send a rebase with a fix for the > > connection slot when processing the reindex commands? > > Attached, I also hopefully removed all the now unneeded progname usage. BTW "progname" is a global variable in logging.c, and it's initialized by pg_logging_init(), so there's no point in having a local variable in main() that's called the same and initialized the same way. You could just remove it from the signature of all those functions (connectDatabase and callers), and there would be no visible change. Also: [see attached] -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Jul-22, Julien Rouhaud wrote: > > > On Mon, Jul 22, 2019 at 6:11 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > simple_string_list_append(&tables, optarg); > > > + tbl_count++; > > > break; > > > The number of items in a simple list is not counted, and vacuumdb does > > > the same thing to count objects. What do you think about extending > > > simple lists to track the number of items stored? > > > > I considered this, but it would require to adapt all code that declare > > SimpleStringList stack variable (vacuumdb.c, clusterdb.c, > > createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much > > trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't > > have a strong opinion here, so I can go for it if you prefer. > > Can we use List for this instead? Isn't that for backend code only?
On 2019-Jul-22, Julien Rouhaud wrote: > On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > > > I considered this, but it would require to adapt all code that declare > > > SimpleStringList stack variable (vacuumdb.c, clusterdb.c, > > > createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much > > > trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't > > > have a strong opinion here, so I can go for it if you prefer. > > > > Can we use List for this instead? > > Isn't that for backend code only? Well, we already have palloc() on the frontend side, and list.c doesn't have any elog()/ereport(), so it should be possible to use it ... I do see that it uses MemoryContextAlloc() in a few places. Maybe we can just #define that to palloc()? (Maybe we can use the impulse to get rid of these "simple lists" altogether?) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2019-Jul-22, Julien Rouhaud wrote: >> On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> Can we use List for this instead? >> Isn't that for backend code only? > Well, we already have palloc() on the frontend side, and list.c doesn't > have any elog()/ereport(), so it should be possible to use it ... I do > see that it uses MemoryContextAlloc() in a few places. Maybe we can > just #define that to palloc()? I'm not happy about either the idea of pulling all of list.c into frontend programs, or restricting it to be frontend-safe. That's very fundamental infrastructure and I don't want it laboring under such a restriction. Furthermore, List usage generally leaks memory like mad (cf nearby list_concat discussion) which doesn't seem like something we want for frontend code. regards, tom lane
On 2019-Jul-22, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2019-Jul-22, Julien Rouhaud wrote: > >> On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >>> Can we use List for this instead? > > >> Isn't that for backend code only? > > > Well, we already have palloc() on the frontend side, and list.c doesn't > > have any elog()/ereport(), so it should be possible to use it ... I do > > see that it uses MemoryContextAlloc() in a few places. Maybe we can > > just #define that to palloc()? > > I'm not happy about either the idea of pulling all of list.c into > frontend programs, or restricting it to be frontend-safe. That's > very fundamental infrastructure and I don't want it laboring under > such a restriction. Furthermore, List usage generally leaks memory > like mad (cf nearby list_concat discussion) which doesn't seem like > something we want for frontend code. Fair enough. List has gotten quite sophisticated now, so I understand the concern. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 22, 2019 at 11:18:06AM -0400, Alvaro Herrera wrote: > BTW "progname" is a global variable in logging.c, and it's initialized > by pg_logging_init(), so there's no point in having a local variable in > main() that's called the same and initialized the same way. You could > just remove it from the signature of all those functions > (connectDatabase and callers), and there would be no visible change. Sure, and I was really tempted to do that until I noticed that we pass down progname for fallback_application_name in the connection string and that we would basically need to externalize progname in logging.h, as well as switch all the callers of pg_logging_init to now include their own definition of progname, which was much more invasive than the initial refactoring intended. I am also under the impression that we had better keep get_progname() and pg_logging_init() as rather independent things. > Also: [see attached] Missed those in the initial cleanup. Applied, thanks! -- Michael
On Mon, Jul 22, 2019 at 02:40:19PM +0200, Julien Rouhaud wrote: > Right, so I kept the long option. Also this comment was outdated, as > the --jobs is now just ignored with a list of indexes, so I fixed that > too. + if (!parallel) + { + if (user_list == NULL) + { + /* + * Create a dummy list with an empty string, as user requires an + * element. + */ + process_list = pg_malloc0(sizeof(SimpleStringList)); + simple_string_list_append(process_list, ""); + } + } This part looks a bit hacky and this is needed because we don't have a list of objects when doing a non-parallel system or database reindex. The deal is that we just want a list with one element: the database of the connection. Wouldn't it be more natural to assign the database name here using PQdb(conn)? Then add an assertion at the top of run_reindex_command() checking for a non-NULL name? > I considered this, but it would require to adapt all code that declare > SimpleStringList stack variable (vacuumdb.c, clusterdb.c, > createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much > trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't > have a strong opinion here, so I can go for it if you prefer. Okay. This is a tad wider than the original patch proposal, and this adds two lines. So let's discard that for now and keep it simple. >> +$node->issues_sql_like([qw(reindexdb -j2)], >> + qr/statement: REINDEX TABLE public.test1/, >> + 'Global and parallel reindex will issue per-table REINDEX'); >> Would it make sense to have some tests for schemas here? >> >> One of my comments in [1] has not been answered. What about >> the decomposition of a list of schemas into a list of tables when >> using the parallel mode? > > I did that in attached v6, and added suitable regression tests. The two tests for "reindexdb -j2" can be combined into a single call, checking for both commands to be generated in the same output. 
The second command triggering a reindex on two schemas cannot be used to check for the generation of both s1.t1 and s2.t2 as the ordering may not be guaranteed. The commands arrays also looked inconsistent with the rest. Attached is an updated patch with some format modifications and the fixes for the tests. -- Michael
Sorry for the late answer, On Tue, Jul 23, 2019 at 9:38 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Jul 22, 2019 at 02:40:19PM +0200, Julien Rouhaud wrote: > > Right, so I kept the long option. Also this comment was outdated, as > > the --jobs is now just ignored with a list of indexes, so I fixed that > > too. > > + if (!parallel) > + { > + if (user_list == NULL) > + { > + /* > + * Create a dummy list with an empty string, as user requires an > + * element. > + */ > + process_list = pg_malloc0(sizeof(SimpleStringList)); > + simple_string_list_append(process_list, ""); > + } > + } > This part looks a bit hacky and this is needed because we don't have a > list of objects when doing a non-parallel system or database reindex. > The deal is that we just want a list with one element: the database of > the connection. Wouldn't it be more natural to assign the database > name here using PQdb(conn)? Then add an assertion at the top of > run_reindex_command() checking for a non-NULL name? Good point, fixed it that way. > > > I considered this, but it would require to adapt all code that declare > > SimpleStringList stack variable (vacuumdb.c, clusterdb.c, > > createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much > > trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't > > have a strong opinion here, so I can go for it if you prefer. > > Okay. This is a tad wider than the original patch proposal, and this > adds two lines. So let's discard that for now and keep it simple. Ok! > >> +$node->issues_sql_like([qw(reindexdb -j2)], > >> + qr/statement: REINDEX TABLE public.test1/, > >> + 'Global and parallel reindex will issue per-table REINDEX'); > >> Would it make sense to have some tests for schemas here? > >> > >> One of my comments in [1] has not been answered. What about > >> the decomposition of a list of schemas into a list of tables when > >> using the parallel mode? 
> > > > I did that in attached v6, and added suitable regression tests. > > The two tests for "reindexdb -j2" can be combined into a single call, > checking for both commands to be generated in the same output. The > second command triggering a reindex on two schemas cannot be used to > check for the generation of both s1.t1 and s2.t2 as the ordering may > not be guaranteed. The commands arrays also looked inconsistent with > the rest. Attached is an updated patch with some format modifications > and the fixes for the tests. Attached v8 based on your v7 + the fix for the dummy part.
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, failed Spec compliant: not tested Documentation: tested, passed Hi I did some review and have a few notes about the behavior. A database reindex does not work with the concurrently option: > ./inst/bin/reindexdb --echo -d postgres -j 8 --concurrently > SELECT pg_catalog.set_config('search_path', '', false); > REINDEX SYSTEM CONCURRENTLY postgres; > reindexdb: error: reindexing of system catalogs on database "postgres" failed: ERROR: cannot reindex system catalogs concurrently I think we need to print a message and skip system catalogs for a concurrent reindex. Or we can disallow a concurrent database reindex with multiple jobs. I prefer the first option. > + reindex_one_database(dbname, REINDEX_SCHEMA, &schemas, host, > + port, username, prompt_password, progname, > + echo, verbose, concurrently, > + Min(concurrentCons, nsp_count)); Should be just concurrentCons instead of Min(concurrentCons, nsp_count). reindex_one_database for REINDEX_SCHEMA will build a list of tables and then process it with the available workers. So: -j 8 -S public -S public -S public -S public -S public -S public - will work with 6 jobs (and without processing the same table multiple times) -j 8 -S public - will have only one worker regardless of the table count > if (concurrentCons > FD_SETSIZE - 1) Wouldn't "if (concurrentCons >= FD_SETSIZE)" be cleaner? Well, pgbench uses the >= condition, vacuumdb uses > FD_SETSIZE - 1. No more FD_SETSIZE in conditions =) regards, Sergei
Thanks for the review! On Thu, Jul 25, 2019 at 10:17 AM Sergei Kornilov <sk@zsrv.org> wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, failed > Spec compliant: not tested > Documentation: tested, passed > > Hi > > I did some review and have a few notes about the behavior. > > A database reindex does not work with the concurrently option: > > > ./inst/bin/reindexdb --echo -d postgres -j 8 --concurrently > > SELECT pg_catalog.set_config('search_path', '', false); > > REINDEX SYSTEM CONCURRENTLY postgres; > > reindexdb: error: reindexing of system catalogs on database "postgres" failed: ERROR: cannot reindex system catalogs concurrently > > I think we need to print a message and skip system catalogs for a concurrent reindex. > Or we can disallow a concurrent database reindex with multiple jobs. I prefer the first option. Good point. I agree with the first option, as that's already what would happen without the --jobs switch: $ reindexdb -d postgres --concurrently WARNING: cannot reindex system catalogs concurrently, skipping all (although this is emitted by the backend) I modified the client code to behave the same and added a regression test. > > + reindex_one_database(dbname, REINDEX_SCHEMA, &schemas, host, > > + port, username, prompt_password, progname, > > + echo, verbose, concurrently, > > + Min(concurrentCons, nsp_count)); > > Should be just concurrentCons instead of Min(concurrentCons, nsp_count). Indeed, that changed with v8 and I forgot to update it; fixed. > reindex_one_database for REINDEX_SCHEMA will build a list of tables and then process it with the available workers. So: > -j 8 -S public -S public -S public -S public -S public -S public - will work with 6 jobs (and without processing the same table multiple times) > -j 8 -S public - will have only one worker regardless of the table count > > > if (concurrentCons > FD_SETSIZE - 1) > > Wouldn't "if (concurrentCons >= FD_SETSIZE)" be cleaner?
> Well, pgbench uses the >= condition, vacuumdb uses > FD_SETSIZE - 1. No more FD_SETSIZE in conditions =) I don't have a strong opinion here. If we change to >=, it'd be better to also adapt vacuumdb for consistency. I didn't change it for now, to stay consistent with vacuumdb.
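[Editor's note] For integer operands the two spellings guard the same boundary, so the choice really is purely stylistic. A throwaway check, using a stand-in constant since the real FD_SETSIZE value is platform-dependent and comes from the system headers:

```c
#include <stdbool.h>

/* Stand-in value; the real FD_SETSIZE comes from the platform headers. */
#define MY_FD_SETSIZE 1024

/* vacuumdb style */
static bool
too_many_conns_v1(int concurrentCons)
{
    return concurrentCons > MY_FD_SETSIZE - 1;
}

/* pgbench style */
static bool
too_many_conns_v2(int concurrentCons)
{
    return concurrentCons >= MY_FD_SETSIZE;
}
```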
Hi Thank you, the v9 code and behavior look good to me. Builds cleanly, works with older releases. I'll mark it Ready for Committer. > I don't have a strong opinion here. If we change to >=, it'd be > better to also adapt vacuumdb for consistency. I didn't change it for > now, to stay consistent with vacuumdb. Yep, no strong opinion from me either. regards, Sergei
On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote: > Thank you, the v9 code and behavior look good to me. Builds cleanly, > works with older releases. I'll mark it Ready for Committer. The restriction between --jobs and --system is not documented, and that would be good to have from the start. This actually makes the parallel job handling with --index inconsistent, as we mask parallelism in this case by enforcing the use of one connection. I think that we should actually revisit the interactions between --index and --jobs, because, assuming that users never read the documentation, they may be surprised to see that something like --index idx1 .. --index idxN --jobs=N does not lead to any improvement at all, until they find out the reason why. It is also much easier to have an error as a starting point because it can be lifted later on. There is an argument that we may not need this restriction on --index at all, as long as the user knows what they are doing and does not list indexes from the same relation, but I would still keep an error. Thinking deeper about it, there is also a point in first gathering all the relations if one associates --schemas and --tables in the same call of reindexdb, and then passing down a list of decomposed relations which are processed in parallel. The code as currently presented is rather straightforward, and I don't think that this is worth the extra complication, but this was not mentioned until now on this thread :) For the non-parallel case in reindex_one_database(), I would add an Assert on REINDEX_DATABASE and REINDEX_SYSTEM, with a comment to mention that a NULL list of objects can be passed down only in those two cases, where the single-object list with the database name is built. >> I don't have a strong opinion here. If we change to >=, it'd be >> better to also adapt vacuumdb for consistency. I didn't change it for >> now, to stay consistent with vacuumdb. > > Yep, no strong opinion from me either.
My opinion tends towards consistency. Consistency sounds good. -- Michael
Attachment
Hi >>> I don't have a strong opinion here. If we change for >=, it'd be >>> better to also adapt vacuumdb for consistency. I didn't change it for >>> now, to stay consistent with vacuumdb. >> >> Yep, no strong opinion from me too. > > My opinion tends towards consistency. Consistency sounds good. Which consistency do you prefer? Currently we have just one >= FD_SETSIZE in pgbench and one > FD_SETSIZE - 1 in vacuumdb. That's all. regards, Sergei
On Thu, Jul 25, 2019 at 12:03 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote: > > Thank you, v9 code and behavior looks good for me. Builds cleanly, > > works with older releases. I'll mark Ready to Commiter. > > The restriction with --jobs and --system is not documented, and that's > good to have from the start. That's documented in the patch: + Note that this option is ignored with the <option>--index</option> + option to prevent deadlocks when processing multiple indexes from + the same relation, and incompatible with the <option>--system</option> + option. The restriction with --jobs and --concurrently is indeed not specifically documented in reindexdb.sgml; there's only a mention in reindex.sgml: <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY</command> since system catalogs cannot be reindexed The behavior doesn't really change with this patch, though we could enhance the documentation. > This actually makes the parallel job > handling with --index inconsistent as we mask parallelism in this case > by enforcing the use of one connection. I think that we should > revisit the interactions with --index and --jobs actually, because, > assuming that users never read the documentation, they may actually be > surprised to see that something like --index idx1 .. --index idxN > --jobs=N does not lead to any improvements at all, until they find out > the reason why. The problem is that a user doing something like: reindexdb -j48 -i some_index -S s1 -S s2 -S s3.... will probably be disappointed to learn that they have to run a specific command for the index(es) that should be reindexed. Maybe we can issue a warning that parallelism isn't used when an index list is processed and the user asked for multiple jobs? 
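Julien's suggestion of warning instead of erroring out could look roughly like the sketch below. This is purely illustrative: the function name and structure are hypothetical, not the actual reindexdb code, and the message wording is invented.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: reconcile --index with --jobs by warning and
 * falling back to a single connection, instead of raising an error.
 * Illustrative names only; not the real reindexdb option handling.
 */
static int
resolve_concurrency(int requested_jobs, int index_count)
{
	if (index_count > 0 && requested_jobs > 1)
	{
		fprintf(stderr,
				"warning: multiple jobs requested with an index list; "
				"falling back to a single connection\n");
		return 1;			/* enforce serialization for index lists */
	}
	return requested_jobs;
}
```

With this shape, "reindexdb -j48 -i some_index" would still run, just serialized, while schema and table reindexes keep the requested job count.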
> Thinking deeper about it, there is also a point of gathering first all > the relations if one associates --schemas and --tables in the same > call of reindexdb and then pass down a list of decomposed relations > which are processed in parallel. The code as currently presented is > rather straight-forward, and I don't think that this is worth the > extra complication, but this was not mentioned until now on this > thread :) +1 > For the non-parallel case in reindex_one_database(), I would add an > Assert on REINDEX_DATABASE and REINDEX_SYSTEM with a comment to > mention that a NULL list of objects can just be passed down only in > those two cases when the single-object list with the database name is > built. Something like that?

 	if (!parallel)
 	{
-		if (user_list == NULL)
+		/*
+		 * Database-wide and system catalogs processing should omit the list
+		 * of objects to process.
+		 */
+		if (process_type == REINDEX_DATABASE || process_type == REINDEX_SYSTEM)
 		{
+			Assert(user_list == NULL);
+
 			process_list = pg_malloc0(sizeof(SimpleStringList));
 			simple_string_list_append(process_list, PQdb(conn));
 		}

There's another assert after the else-parallel branch that checks that a list is present, so there's no need to also check it here. I don't send a new patch since the wanted behavior for --index is not clear yet.
On Thu, Jul 25, 2019 at 01:00:34PM +0200, Julien Rouhaud wrote: > The problem is that a user doing something like: > > reindexdb -j48 -i some_index -S s1 -S s2 -S s3.... > > will probably be disappointed to learn that they have to run a specific > command for the index(es) that should be reindexed. Maybe we can > issue a warning that parallelism isn't used when an index list is > processed and the user asked for multiple jobs? Arguments go in both directions, as some other users may be surprised by the performance on indexes, as serialization is enforced. > I don't send a new patch since the --index wanted behavior is not > clear yet. So I am sending one patch (actually two) after a closer review that I have spent time shaping into a committable state. And for this part I have another suggestion, which is to use a switch/case without a default, so that any newly-added object type would force somebody to think about those code paths, as this would generate compiler warnings. While reviewing I have found an extra bug in the logic: when using a list of tables, the number of parallel slots is the minimum between concurrentCons and tbl_count, but this does not get applied after building a list of tables for a schema or database reindex, so we had better recompute the number of items in reindex_one_database() before allocating the number of parallel slots. There was also a small gem in the TAP tests, with one of the commands using "-j2" in one of its arguments. So here we go: - 0001 is your original thing, with --jobs enforced to 1 for the index part. - 0002 is my addition to forbid --index with --jobs. I am fine to be outvoted regarding 0002, and it is the case based on the state of this thread with 2:1. We could always revisit this decision in this development cycle anyway. -- Michael
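The switch-without-default idiom Michael suggests relies on the compiler's enum-coverage check (-Wswitch with GCC/Clang): with no default label, adding a new value to the enum makes every such switch emit a warning until it is handled. A minimal sketch, with illustrative names rather than the actual reindexdb enum:

```c
#include <stddef.h>

/* Illustrative object-type enum; names mirror the thread's discussion. */
typedef enum
{
	REINDEX_DATABASE,
	REINDEX_INDEX,
	REINDEX_SCHEMA,
	REINDEX_SYSTEM,
	REINDEX_TABLE
} ReindexType;

/*
 * No "default" label on purpose: if a new ReindexType member is added,
 * compilers warn about the unhandled case here, forcing the author to
 * think about this code path.
 */
static const char *
reindex_keyword(ReindexType type)
{
	switch (type)
	{
		case REINDEX_DATABASE:
			return "DATABASE";
		case REINDEX_INDEX:
			return "INDEX";
		case REINDEX_SCHEMA:
			return "SCHEMA";
		case REINDEX_SYSTEM:
			return "SYSTEM";
		case REINDEX_TABLE:
			return "TABLE";
	}
	return NULL;				/* unreachable; silences some compilers */
}
```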
On Fri, Jul 26, 2019 at 5:27 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jul 25, 2019 at 01:00:34PM +0200, Julien Rouhaud wrote: > > The problem is that a user doing something like: > > > > reindexdb -j48 -i some_index -S s1 -S s2 -S s3.... > > > > will probably be disappointed to learn that he has to run a specific > > command for the index(es) that should be reindexed. Maybe we can > > issue a warning that parallelism isn't used when an index list is > > processed and user asked for multiple jobs? > > Arguments go in both directions as some other users may be surprised > by the performance of indexes as serialization is enforced. Sure, but there is no easy solution in that case, as you'd have to do all the work of spawning multiple reindexdb calls, grouping the indexes by their underlying table, so probably what will happen here is that there'll just be two simple calls to reindexdb: one for the indexes, serialized anyway, and one for everything else. My vote is still to allow it, possibly emitting a notice or a warning. > > I don't send a new patch since the --index wanted behavior is not > > clear yet. > > So I am sending one patch (actually two) after a closer review that I > have spent time shaping into a committable state. And for this part I > have another suggestion that is to use a switch/case without a default > so as any newly-added object types would allow somebody to think about > those code paths as this would generate compiler warnings. Thanks for that! I'm fine with using a switch to avoid future bad surprises. > While reviewing I have found an extra bug in the logic: when using a > list of tables, the number of parallel slots is the minimum between > concurrentCons and tbl_count, but this does not get applied after > building a list of tables for a schema or database reindex, so we had > better recompute the number of items in reindex_one_database() before > allocating the number of parallel slots. 
I see that you iterate over the SimpleStringList after it's generated. Why not compute that while building it in get_parallel_object_list (and keep the provided table list count) instead?
On Fri, Jul 26, 2019 at 09:36:32AM +0200, Julien Rouhaud wrote: > I see that you iterate over the SimpleStringList after it's generated. > Why not computing that while building it in get_parallel_object_list > (and keep the provided table list count) instead? Yeah. I was hesitating to do that, or just break out of the counting loop if there are more objects than concurrent jobs, but that's less intuitive. -- Michael
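Michael's alternative, counting the list items but breaking out of the loop once the number of jobs is reached, can be sketched as follows. The cell type here is a stand-in for the real SimpleStringList, since only min(count, jobs) parallel slots are ever needed:

```c
#include <stddef.h>

/* Stand-in for a SimpleStringList cell; illustrative only. */
typedef struct StringCell
{
	struct StringCell *next;
	const char *val;
} StringCell;

/*
 * Count list items, but stop as soon as the count reaches max_jobs:
 * the caller only uses Min(count, max_jobs) slots, so counting past
 * that point is wasted work on long lists.
 */
static int
count_capped(const StringCell *head, int max_jobs)
{
	int			n = 0;
	const StringCell *cell;

	for (cell = head; cell != NULL; cell = cell->next)
	{
		n++;
		if (n >= max_jobs)
			break;				/* enough items to fill every slot */
	}
	return n;
}
```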
Hi > So here we go: > - 0001 is your original thing, with --jobs enforced to 1 for the index > part. > - 0002 is my addition to forbid --index with --jobs. > > I am fine to be outvoted regarding 0002, and it is the case based on > the state of this thread with 2:1. We could always revisit this > decision in this development cycle anyway. Explicit is better than implicit, so I am +1 to commit both patches. regards, Sergei
On Fri, Jul 26, 2019 at 10:53:03AM +0300, Sergei Kornilov wrote: > Explicit is better than implicit, so I am +1 to commit both patches. Hence my count is incorrect: - Forbid --jobs and --index: Michael P, Sergei K. - Enforce --jobs=1 with --index: Julien R. - Have no restrictions: 0. -- Michael
On Fri, Jul 26, 2019 at 9:41 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 26, 2019 at 09:36:32AM +0200, Julien Rouhaud wrote: > > I see that you iterate over the SimpleStringList after it's generated. > > Why not computing that while building it in get_parallel_object_list > > (and keep the provided table list count) instead? > > Yeah. I was hesitating to do that, or just break out of the counting > loop if there are more objects than concurrent jobs, but that's less > intuitive. That's probably still more intuitive than having the count coming from either main() or from get_parallel_object_list() depending on the process type, so I'm fine with that alternative. Maybe we could bite the bullet and add a count member to Simple*List, also providing a macro to initialize a new list, so that next time a field is added there won't be a massive boilerplate code change?
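Julien's idea of a count member plus an initializer macro could look like the hypothetical sketch below. The type and macro names are invented for illustration; the real Simple*List lives in fe_utils and this was deferred to a separate thread:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted variant of a SimpleStringList. */
typedef struct CountedCell
{
	struct CountedCell *next;
	char		val[64];
} CountedCell;

typedef struct CountedStringList
{
	CountedCell *head;
	CountedCell *tail;
	int			count;			/* maintained on every append */
} CountedStringList;

/*
 * Initializer macro: call sites use this instead of zeroing fields by
 * hand, so adding a member later only changes the macro, not callers.
 */
#define COUNTED_STRING_LIST_INIT	{NULL, NULL, 0}

static void
counted_list_append(CountedStringList *list, const char *val)
{
	CountedCell *cell = calloc(1, sizeof(CountedCell));

	strncpy(cell->val, val, sizeof(cell->val) - 1);
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
	list->count++;				/* the count is always up to date */
}
```

With this, reindex_one_database() could read list->count directly instead of re-walking the list.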
On Sat, Jul 27, 2019 at 11:44:47AM +0200, Julien Rouhaud wrote: > That's probably still more intuitive than having the count coming from > either main() or from get_parallel_object_list() depending on the > process type, so I'm fine with that alternative. Maybe we could bite > the bullet and add a count meber to Simple*List, also providing a > macro to initialize a new list so that next time a field is added > there won't be a massive boilerplate code change? Perhaps; we could discuss that on a separate thread. For now I have gone with the simplest approach of counting the items, and stopping the count if there are more items than jobs. While reviewing I have found a double-free in your patch when building a list of relations for schemas or databases. If the list finishes empty, PQfinish() was called twice on the connection, leading to a crash. I have added a test for that, done an extra pass on the patch adjusting a couple of things, then committed the patch with the restriction on --index and --jobs. This entry is now marked as committed in the CF app. -- Michael
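The double-free Michael describes is the classic pattern where an early-exit path and the common cleanup path both close the same connection. A common cure is to clear the handle at the first close so later cleanup becomes a no-op; sketched below with a stand-in type instead of a real PGconn, since this is an illustration of the bug class rather than the committed fix:

```c
#include <stddef.h>

/* Stand-in for PGconn, so the pattern can be shown without libpq. */
typedef struct FakeConn
{
	int			open;
} FakeConn;

static int	finish_calls = 0;	/* tracks how often the conn is closed */

static void
fake_finish(FakeConn *conn)
{
	conn->open = 0;
	finish_calls++;
}

/*
 * Close the connection exactly once: NULL the caller's pointer so a
 * second cleanup pass (e.g. after an empty object list) skips it
 * instead of closing twice and crashing.
 */
static void
safe_finish(FakeConn **conn)
{
	if (*conn != NULL)
	{
		fake_finish(*conn);
		*conn = NULL;
	}
}
```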
On Sat, Jul 27, 2019 at 3:27 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jul 27, 2019 at 11:44:47AM +0200, Julien Rouhaud wrote: > > That's probably still more intuitive than having the count coming from > > either main() or from get_parallel_object_list() depending on the > > process type, so I'm fine with that alternative. Maybe we could bite > > the bullet and add a count meber to Simple*List, also providing a > > macro to initialize a new list so that next time a field is added > > there won't be a massive boilerplate code change? > > Perhaps, we could discuss about that on a separate thread. Agreed. > For now I > have gone with the simplest approach of counting the items, and > stopping the count if there are more items than jobs. While reviewing > I have found a double-free in your patch when building a list of > relations for schemas or databases. If the list finishes empty, > PQfinish() was called twice on the connection, leading to a crash. I > have added a test for that Oops, thanks for spotting and fixing. > , done an extra pass on the patch adjusting > a couple of things then committed the patch with the restriction on > --index and --jobs. This entry is now marked as committed in the CF > app. Thanks!
On Mon, Jul 22, 2019 at 01:05:32PM -0400, Alvaro Herrera wrote: > Fair enough. List has gotten quite sophisticated now, so I understand > the concern. Just wondering something... List cells include one pointer, one signed integer and an OID. The last two entries are basically 4 bytes each, hence could we reduce the bloat a bit by unifying the two? I understand that the distinction exists because both may not be of the same size.. /me runs and hides -- Michael
Michael Paquier <michael@paquier.xyz> writes: > Just wondering something... List cells include one pointer, one > signed integer and an OID. The two last entries are basically 4-byte > each, hence could we reduce a bit the bloat by unifying both of them? We couldn't really simplify the API that way; for example, lfirst_int and lfirst_oid *must* be different because they must return different types. I think it'd be a bad idea to have some parts of the API that distinguish the two types while others pretend they're the same, so there's not much room for shortening that. You could imagine unifying the implementations of many of the _int and _oid functions, but I can't get excited about that. It would add confusion for not a lot of code savings. > I understand that the distinction exists because both may not be of > the same size.. Well, even more to the point, one's signed and one isn't. In the long run, might we ever switch to 64-bit OIDs? I dunno. Now that we kicked them out of user tables, it might be feasible, but by the same token there's not much pressure to do it. regards, tom lane
Hi, On 2019-07-28 10:07:27 -0400, Tom Lane wrote: > In the long run, might we ever switch to 64-bit OIDs? I dunno. > Now that we kicked them out of user tables, it might be feasible, > but by the same token there's not much pressure to do it. Depends on the table, I'd say. Having toast tables use 64-bit IDs, and not advance the OID counter, would be quite the advantage over the current situation. Toasting performance craters once the OID counter has wrapped. But obviously there are upgrade problems there - presumably we'd need 'narrow' and 'wide' toast tables, or such. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-07-28 10:07:27 -0400, Tom Lane wrote: >> In the long run, might we ever switch to 64-bit OIDs? I dunno. > Depends on the the table, I'd say. Having toast tables have 64bit ids, > and not advance the oid counter, would be quite the advantage over the > current situation. Toasting performance craters once the oid counter has > wrapped. But obviously there are upgrade problems there - presumably > we'd need 'narrow" and 'wide' toast tables, or such. Yeah, but I'd be inclined to fix toast tables as a special case, rather than widening OIDs in general. We could define the chunk number as being int8 not OID for the "wide" style. regards, tom lane