Thread: missing indexes in indexlist with partitioned tables
Hi,
I did a bit of testing today and noticed that we don't set indexlist properly at the right time in some cases when using partitioned tables.
I attached a simple case where the indexlist doesn't seems to be set at the right time. get_relation_info in plancat.c seems to process it only after analyzejoins.c checked for it.
Can someone explain to me why it is deferred at all?
Regards
Arne
Attachment
Hi,
I stumbled across a few places that depend on the inheritance appends being applied at a later date, so I quickly abandoned that idea. I thought a bit about the indexlist, in particular the inhparent, and I am not sure what depends on get_relation_info working in that way.
Therefore I propose a new attribute partIndexlist of RelOptInfo to include information about uniqueness, in the case the executor can't use the structure that causes the uniqueness to begin with. Said attribute can be used by relation_has_unique_index_for and rel_supports_distinctness.
The attached patch takes that route. I'd appreciate feedback!
Regards
Arne
Attachment
Hi, On Thu, Oct 28, 2021 at 01:44:31PM +0000, Arne Roland wrote: > > The attached patch takes that route. I'd appreciate feedback! The cfbot reports that the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_3452.log === Applying patches on top of PostgreSQL commit ID 025b920a3d45fed441a0a58fdcdf05b321b1eead === === applying patch ./partIndexlistClean.patch patching file src/backend/access/heap/vacuumlazy.c Hunk #1 FAILED at 2375. 1 out of 1 hunk FAILED -- saving rejects to file src/backend/access/heap/vacuumlazy.c.rej patching file src/backend/access/transam/xlog.c Hunk #1 succeeded at 911 with fuzz 1 (offset 5 lines). Hunk #2 FAILED at 5753. [...] 1 out of 6 hunks FAILED -- saving rejects to file src/backend/access/transam/xlog.c.rej [...] patching file src/backend/commands/publicationcmds.c Hunk #1 FAILED at 813. 1 out of 1 hunk FAILED -- saving rejects to file src/backend/commands/publicationcmds.c.rej patching file src/include/nodes/pathnodes.h Hunk #9 FAILED at 1516. [...] 1 out of 17 hunks FAILED -- saving rejects to file src/include/nodes/pathnodes.h.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author.
Hi,
thank you for the heads up! Those files ended up accidentally in the dump from me running pg_indent. The file count was truly excessive, I should have noticed this sooner.
I attached the patch without the excessive modifications. That should be way easier to read.
Regards
Arne
Attachment
Using some valuable feedback from Zhihong Yu, I fixed a flipped negation error and updated the comments.
Regards
Arne
Sent: Monday, January 17, 2022 12:25
To: Julien Rouhaud
Cc: pgsql-hackers
Subject: Re: missing indexes in indexlist with partitioned tables
Hi,
thank you for the heads up! Those files ended up accidentally in the dump from me running pg_indent. The file count was truly excessive, I should have noticed this sooner.
I attached the patch without the excessive modifications. That should be way easier to read.
Regards
Arne
Attachment
Hmm, can you show cases of queries for which having this new partIndexlist changes plans? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Saca el libro que tu religión considere como el indicado para encontrar la oración que traiga paz a tu alma. Luego rebootea el computador y ve si funciona" (Carlos Duclós)
Hi!
Afaiac the join pruning where the outer table is a partitioned table is the relevant case.
I am not sure whether there are other cases.
The join pruning, which works great for plain relations since 9.0, falls short for partitioned tables, since the optimizer fails to prove uniqueness there.
In practical cases inner and outer tables are almost surely different ones, but I reattached a simpler example. It's the one, I came up with back in September.
I've seen this can be a reason to avoid partitioning for the time being, if the application relies on join pruning. I think generic views make it almost necessary to have it. If you had a different answer in mind, please don't hesitate to ask again.
Regards
Arne
Sent: Monday, January 17, 2022 7:16:08 PM
To: Arne Roland
Cc: Julien Rouhaud; pgsql-hackers
Subject: Re: missing indexes in indexlist with partitioned tables
partIndexlist changes plans?
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)
Attachment
Hi, On Mon, Jan 17, 2022 at 08:32:40PM +0000, Arne Roland wrote: > > Afaiac the join pruning where the outer table is a partitioned table is the relevant case. The last version of the patch now fails on all platform, with plan changes. For instance: https://cirrus-ci.com/task/4825629131538432 https://api.cirrus-ci.com/v1/artifact/task/4825629131538432/regress_diffs/src/test/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/partition_join.out /tmp/cirrus-ci-build/src/test/regress/results/partition_join.out --- /tmp/cirrus-ci-build/src/test/regress/expected/partition_join.out 2022-01-17 23:08:47.158198249 +0000 +++ /tmp/cirrus-ci-build/src/test/regress/results/partition_join.out 2022-01-17 23:12:34.163488567 +0000 @@ -4887,37 +4887,23 @@ SET enable_partitionwise_join = on; EXPLAIN (COSTS OFF) SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; - QUERY PLAN ------------------------------------------------------------------------ + QUERY PLAN +----------------------------------------------------------------- Limit - -> Merge Append - Sort Key: x.id - -> Merge Left Join - Merge Cond: (x_1.id = y_1.id) - -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 - -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 - -> Merge Left Join - Merge Cond: (x_2.id = y_2.id) - -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 - -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 -(11 rows) + -> Append + -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 + -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 +(4 rows) EXPLAIN (COSTS OFF) SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10; - QUERY PLAN --------------------------------------------------------------------------------- + QUERY PLAN +-------------------------------------------------------------------------- Limit - -> Merge Append - Sort Key: x.id DESC - -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1 - -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 - Index Cond: (id = x_1.id) - -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2 - -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 - Index Cond: (id = x_2.id) -(11 rows) + -> Append + -> Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2 + -> Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1 +(4 rows)
On 2022-Jan-18, Julien Rouhaud wrote: > SET enable_partitionwise_join = on; > EXPLAIN (COSTS OFF) > SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; > - QUERY PLAN > ------------------------------------------------------------------------ > + QUERY PLAN > +----------------------------------------------------------------- > Limit > - -> Merge Append > - Sort Key: x.id > - -> Merge Left Join > - Merge Cond: (x_1.id = y_1.id) > - -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 > - -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 > - -> Merge Left Join > - Merge Cond: (x_2.id = y_2.id) > - -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 > - -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 > -(11 rows) > + -> Append > + -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 > + -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 > +(4 rows) Hmm, these plan changes look valid to me. A left self-join using the primary key column? That looks optimizable all right. I suspect that the author of partition-wise joins would want to change these queries so that whatever was being tested by these self-joins is tested by some other means (maybe just create an identical partitioned table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM fract_t). But at the same time, the author of this patch should a) make sure that the submitted patch updates these test results so that the test pass, and also b) add some test cases to verify that his desired behavior is tested somewhere, not just in a test case that's incidentally broken by his patch. What I still don't know is whether this patch is actually desirable or not. If the only cases it affects is self-joins, is there much actual value? -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Hi!
> [...]
> Hmm, these plan changes look valid to me. A left self-join using the
> primary key column? That looks optimizable all right.
> [...]
> What I still don't know is whether this patch is actually desirable or
> not. If the only cases it affects is self-joins, is there much actual
> value?
This is not really about self joins. That was just the most simple example, because otherwise we need a second table.
It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning.
> these queries so that whatever was being tested by these self-joins is
> tested by some other means (maybe just create an identical partitioned
> table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM
> fract_t). But at the same time, the author of this patch should
Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way.
> b) add some test cases to verify that his desired
> behavior is tested somewhere, not just in a test case that's
> incidentally broken by his patch.
My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql
Since the selfjoin part was confusing to you, it might be worthwhile to do test that with two different tables. While I see no need to test in that way, I will adjust the patch so. Just to make it more clear for people looking at those tests in the future.
a) make
> test pass [...]
Just for the record: I did run the tests, but I did miss that the commit of Tomas fix for fractional optimization is already on the master. Please note that this is a very new test case from a patch committed less than one week ago.
I'm glad Julien Rouhaud pointed out, that Tomas patch is committed it by now. That was very helpful to me, as I can now integrate the two tests.
If you want to help me, please don't put forward an abstract list of responsibilities. If anything I obviously need practical help, on how I can catch on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running, that tries to apply my patch and run regression tests for my patch on a daily basis.
Is there a simple way for me to check for that?
I will probably integrate those two tests, since they can work of similar structures without need to recreate the tables again and again. I have clear understanding how that new test works. I have to attend a few calls now, but I should be able to update the tests later.
Regards
Arne
Hi,
I came up with a slightly more intuitive way to force the different outcome for the fractional test, that does hardly change anything.
I'm not sure, whether the differentiation between fract_x and fract_t is worth it, since there shouldn't be any difference, but as mentioned before I added it for potential future clarity.
Thanks for your feedback again!
Regards
Arne
Sent: Wednesday, January 19, 2022 10:13:55 PM
To: Alvaro Herrera; Julien Rouhaud
Cc: pgsql-hackers
Subject: Re: missing indexes in indexlist with partitioned tables
Hi!
> [...]
> Hmm, these plan changes look valid to me. A left self-join using the
> primary key column? That looks optimizable all right.
> [...]
> What I still don't know is whether this patch is actually desirable or
> not. If the only cases it affects is self-joins, is there much actual
> value?
This is not really about self joins. That was just the most simple example, because otherwise we need a second table.
It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning.
> these queries so that whatever was being tested by these self-joins is
> tested by some other means (maybe just create an identical partitioned
> table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM
> fract_t). But at the same time, the author of this patch should
Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way.
> b) add some test cases to verify that his desired
> behavior is tested somewhere, not just in a test case that's
> incidentally broken by his patch.
My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql
Since the selfjoin part was confusing to you, it might be worthwhile to do test that with two different tables. While I see no need to test in that way, I will adjust the patch so. Just to make it more clear for people looking at those tests in the future.
a) make
> test pass [...]
Just for the record: I did run the tests, but I did miss that the commit of Tomas fix for fractional optimization is already on the master. Please note that this is a very new test case from a patch committed less than one week ago.
I'm glad Julien Rouhaud pointed out, that Tomas patch is committed it by now. That was very helpful to me, as I can now integrate the two tests.
If you want to help me, please don't put forward an abstract list of responsibilities. If anything I obviously need practical help, on how I can catch on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running, that tries to apply my patch and run regression tests for my patch on a daily basis.
Is there a simple way for me to check for that?
I will probably integrate those two tests, since they can work of similar structures without need to recreate the tables again and again. I have clear understanding how that new test works. I have to attend a few calls now, but I should be able to update the tests later.
Regards
Arne
Attachment
Hi,
I came up with a slightly more intuitive way to force the different outcome for the fractional test, that does hardly change anything.
I'm not sure, whether the differentiation between fract_x and fract_t is worth it, since there shouldn't be any difference, but as mentioned before I added it for potential future clarity.
Thanks for your feedback again!
Regards
Arne
From: Arne Roland
Sent: Wednesday, January 19, 2022 10:13:55 PM
To: Alvaro Herrera; Julien Rouhaud
Cc: pgsql-hackers
Subject: Re: missing indexes in indexlist with partitioned tablesHi!
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> [...]
> Hmm, these plan changes look valid to me. A left self-join using the
> primary key column? That looks optimizable all right.
> [...]
> What I still don't know is whether this patch is actually desirable or
> not. If the only cases it affects is self-joins, is there much actual
> value?
This is not really about self joins. That was just the most simple example, because otherwise we need a second table.
It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning.> I suspect that the author of partition-wise joins would want to change
> these queries so that whatever was being tested by these self-joins is
> tested by some other means (maybe just create an identical partitioned
> table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM
> fract_t). But at the same time, the author of this patch should
Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way.
> b) add some test cases to verify that his desired
> behavior is tested somewhere, not just in a test case that's
> incidentally broken by his patch.
My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql
Since the selfjoin part was confusing to you, it might be worthwhile to do test that with two different tables. While I see no need to test in that way, I will adjust the patch so. Just to make it more clear for people looking at those tests in the future.
a) make> sure that the submitted patch updates these test results so that the
> test pass [...]
Just for the record: I did run the tests, but I did miss that the commit of Tomas fix for fractional optimization is already on the master. Please note that this is a very new test case from a patch committed less than one week ago.
I'm glad Julien Rouhaud pointed out, that Tomas patch is committed it by now. That was very helpful to me, as I can now integrate the two tests.@Álvaro Herrera:
If you want to help me, please don't put forward an abstract list of responsibilities. If anything I obviously need practical help, on how I can catch on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running, that tries to apply my patch and run regression tests for my patch on a daily basis.
Is there a simple way for me to check for that?
I will probably integrate those two tests, since they can work of similar structures without need to recreate the tables again and again. I have clear understanding how that new test works. I have to attend a few calls now, but I should be able to update the tests later.
Regards
Arne
On 2022-Jan-19, Arne Roland wrote: > > a) make sure that the submitted patch updates these test results so > > that the test pass [...] > > Just for the record: I did run the tests, but I did miss that the > commit of Tomas fix for fractional optimization is already on the > master. Please note that this is a very new test case from a patch > committed less than one week ago. Ah, apologies, I didn't realize that that test was so new. > If anything I obviously need practical help, on how > I can catch on recent changes quicker and without manual intervention. > I don't have a modified buildfarm animal running, that tries to apply > my patch and run regression tests for my patch on a daily basis. See src/tools/ci/README (for multi-platform testing of patches on several platforms) and http://commitfest.cputube.org/ -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
> Subject: Re: missing indexes in indexlist with partitioned tables
>
> Hi,
>
> - if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
> + if (inhparent && (!index->indisunique || indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX))
>
> The check on RELKIND_PARTITIONED_INDEX seems to negate what the comment above says:
>
> + * Don't add partitioned indexes to the indexlist
>
> Cheers
/*
* Don't add partitioned indexes to the indexlist, since they are
* not usable by the executor. If they are unique add them to the
* partindexlist instead, to use for further pruning. If they
* aren't that either, simply skip them.
*/
Sent: Wednesday, January 19, 2022 23:26
> Ah, apologies, I didn't realize that that test was so new.
No offense taken. Unless one was involved in the creation of the corresponding patch, it's unreasonable to know that. I like the second part of your message very much:
> See src/tools/ci/README (for multi-platform testing of patches on
> several platforms) and http://commitfest.cputube.org/
Thanks, I didn't know of cputube. Neat! That's pretty much what I was looking for!
Is there a way to get an email notification if some machine fails (turns bright red)? For the threads I'm explicitly subscribed to, that would seem helpful to me.
Regards
Arne
Hi, On Mon, Jan 24, 2022 at 9:30 PM Arne Roland <A.Roland@index.de> wrote: > The comment at my end goes on: > > /* > * Don't add partitioned indexes to the indexlist, since they are > * not usable by the executor. If they are unique add them to the > * partindexlist instead, to use for further pruning. If they > * aren't that either, simply skip them. > */ "partindexlist" really made me think about a list of "partial indexes" for some reason. I think maybe "partedindexlist" is what you are looking for; "parted" is commonly used as short for "partitioned" when naming variables. The comment only mentions "further pruning" as to what partitioned indexes are to be remembered in RelOptInfo, but it's not clear what that means. It may help to be more specific. Finally, I don't understand why we need a separate field to store indexes found in partitioned base relations. AFAICS, nothing but the sites you are interested in (relation_has_unique_index_for() and rel_supports_distinctness()) would ever bother to look at a partitioned base relation's indexlist. Do you think putting them into in indexlist might break something? > Regarding the semantics: This is sort of what the statement checks for (skip for inhparent, if not unique or not partitionedindex), i.e. it checks for the case, where the index shouldn't be added to either list. > > Side note: I personally think the name inhparent is mildly confusing, since it's not really about inheritance. I don'thave a significantly better idea though. Partitioned tables are "inheritance parent", so share the same code as what traditional inheritance parents have always used for planning. -- Amit Langote EDB: http://www.enterprisedb.com
Sent: Tuesday, January 25, 2022 09:04
Subject: Re: missing indexes in indexlist with partitioned tables
> [...]
> "partindexlist" really made me think about a list of "partial indexes"
> for some reason. I think maybe "partedindexlist" is what you are
> looking for; "parted" is commonly used as short for "partitioned" when
> naming variables.
>
> The comment only mentions "further pruning" as to what partitioned
> indexes are to be remembered in RelOptInfo, but it's not clear what
> that means. It may help to be more specific.
Thanks for the feedback! I've changed that. The current version is attached.
> Finally, I don't understand why we need a separate field to store
> indexes found in partitioned base relations. AFAICS, nothing but the
> sites you are interested in (relation_has_unique_index_for() and
> rel_supports_distinctness()) would ever bother to look at a
> partitioned base relation's indexlist. Do you think putting them into
> in indexlist might break something?
I have thought about that before. AFAICT there is nothing in core, which breaks. However I am not sure, I want to mix those two kinds of index nodes. First of all the structure is different, partedIndexes don't have physical attributes after all. This is technical implementation detail relating to the current promise, that entries of the indexlist are indexes we can use. And by use, I mean use for statistics or the executor.
I'm more concerned about future changes regarding the order and optimization of processing harder here. The order in which we do things in the planner is a bit messy, and I wouldn't mind seeing details about that change. Looking at the current wacky order in the optimizer, I'm not convinced, that nothing will want to have a look at the indexlist, before partitioned tables are unpacked.
Since it would be easy to introduce this new variable later, wouldn't mind adding it to the indexlist directly for now. But changing the underlying promise of what it contains, seems noteworthy and more intrusive to me.
> > Side note: I personally think the name inhparent is mildly confusing, since it's not really about inheritance. I don't have a significantly better idea though.
>
> Partitioned tables are "inheritance parent", so share the same code as
> what traditional inheritance parents have always used for planning.
I recall that manual partitioning via inheritance, that was cumbersome. Though that minor historical detail was not, what I was referring to. There are a lot of other cases, that cause us to set inhparent. IIRC we use this flag in some ddl commands, which have nothing to do with inheritance. It essentially is used as a variant to skip the indexlist creation. If such hacks weren't there, we could simply check for the relkind and indisunique.
Regards
Arne
Attachment
Hi!
Attached a rebased version of the patch.
Regards
Arne
Sent: Monday, January 31, 2022 19:14
To: Amit Langote
Cc: Zhihong Yu; Alvaro Herrera; Julien Rouhaud; pgsql-hackers
Subject: Re: missing indexes in indexlist with partitioned tables
Sent: Tuesday, January 25, 2022 09:04
Subject: Re: missing indexes in indexlist with partitioned tables
> [...]
> "partindexlist" really made me think about a list of "partial indexes"
> for some reason. I think maybe "partedindexlist" is what you are
> looking for; "parted" is commonly used as short for "partitioned" when
> naming variables.
>
> The comment only mentions "further pruning" as to what partitioned
> indexes are to be remembered in RelOptInfo, but it's not clear what
> that means. It may help to be more specific.
Thanks for the feedback! I've changed that. The current version is attached.
> Finally, I don't understand why we need a separate field to store
> indexes found in partitioned base relations. AFAICS, nothing but the
> sites you are interested in (relation_has_unique_index_for() and
> rel_supports_distinctness()) would ever bother to look at a
> partitioned base relation's indexlist. Do you think putting them into
> in indexlist might break something?
I have thought about that before. AFAICT there is nothing in core, which breaks. However I am not sure, I want to mix those two kinds of index nodes. First of all the structure is different, partedIndexes don't have physical attributes after all. This is technical implementation detail relating to the current promise, that entries of the indexlist are indexes we can use. And by use, I mean use for statistics or the executor.
I'm more concerned about future changes regarding the order and optimization of processing harder here. The order in which we do things in the planner is a bit messy, and I wouldn't mind seeing details about that change. Looking at the current wacky order in the optimizer, I'm not convinced, that nothing will want to have a look at the indexlist, before partitioned tables are unpacked.
Since it would be easy to introduce this new variable later, wouldn't mind adding it to the indexlist directly for now. But changing the underlying promise of what it contains, seems noteworthy and more intrusive to me.
> > Side note: I personally think the name inhparent is mildly confusing, since it's not really about inheritance. I don't have a significantly better idea though.
>
> Partitioned tables are "inheritance parent", so share the same code as
> what traditional inheritance parents have always used for planning.
I recall that manual partitioning via inheritance, that was cumbersome. Though that minor historical detail was not, what I was referring to. There are a lot of other cases, that cause us to set inhparent. IIRC we use this flag in some ddl commands, which have nothing to do with inheritance. It essentially is used as a variant to skip the indexlist creation. If such hacks weren't there, we could simply check for the relkind and indisunique.
Regards
Arne
Attachment
On Wed, 3 Aug 2022 at 11:07, Arne Roland <A.Roland@index.de> wrote: > Attached a rebased version of the patch. Firstly, I agree that we should fix the issue of join removals not working with partitioned tables. I had a quick look over this and the first thing that I thought was the same as what Amit mentioned in: On Tue, 25 Jan 2022 at 21:04, Amit Langote <amitlangote09@gmail.com> wrote: > Finally, I don't understand why we need a separate field to store > indexes found in partitioned base relations. AFAICS, nothing but the > sites you are interested in (relation_has_unique_index_for() and > rel_supports_distinctness()) would ever bother to look at a > partitioned base relation's indexlist. Do you think putting them into > in indexlist might break something? I kinda disagree with Alvaro's fix in 05fb5d661. I think indexlist is the place to store these details. That commit added the following comment: /* * Ignore partitioned indexes, since they are not usable for * queries. */ But neither are hypothetical indexes either, yet they're added to RelOptInfo.indexlist. I think the patch should be changed so that the existing list is used and we find another fix for the problems Alvaro fixed in 05fb5d661. Unfortunately, there was no discussion marked on that commit message, so it's not quite clear what the problem was. I'm unsure if there was anything other than CLUSTER that was broken. I see that cfdd03f45 added CLUSTER for partitioned tables in v15. I think the patch would need to go over the usages of RelOptInfo.indexlist to make sure that we don't need to add any further conditions to skip their usage for partitioned tables. I wrote the attached patch as I wanted to see what would break if we did this. The only problem I got from running make check was in get_actual_variable_range(), so I just changed that so it returns false when the given rel is a partitioned table. I only quickly did the changes to get_relation_info() and didn't give much thought to what the can* bool flags should be set to. I just mostly skipped all that code because it was crashing on relation->rd_tableam->scan_bitmap_next_block due to the rd_tableam being NULL. Also, just a friendly tip, Arne; I saw you named your patch 0006 for version 6. You'll see many 000n patches around the list, but those are generally done with git format-patch. That number normally means the patch in the patch series, not the version of the patch. I'm not sure if it'll help any, but my workflow for writing new patches against master tends to be: git checkout master git checkout -b some_feature_branch # write some code git commit -a # maybe more code git commit -a git format-patch -v1 master That'll create v1-0001 and v1-0002 patches. When I'm onto v2, I just change the version number from -v1 to -v2. David
Attachment
On 2022-Sep-16, David Rowley wrote: > I kinda disagree with Alvaro's fix in 05fb5d661. I think indexlist is > the place to store these details. That commit added the following > comment: > > /* > * Ignore partitioned indexes, since they are not usable for > * queries. > */ > > But neither are hypothetical indexes either, yet they're added to > RelOptInfo.indexlist. > > I think the patch should be changed so that the existing list is used > and we find another fix for the problems Alvaro fixed in 05fb5d661. > Unfortunately, there was no discussion marked on that commit message, > so it's not quite clear what the problem was. I'm unsure if there was > anything other than CLUSTER that was broken. After a bit of trawling through the archives, I found it here: https://www.postgresql.org/message-id/20180124162006.pmapfiznhgngwtjf%40alvherre.pgsql I think there was insufficient discussion and you're probably right that it wasn't the best fix. I don't object to finding another fix. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2022-Sep-16, David Rowley wrote: >> I kinda disagree with Alvaro's fix in 05fb5d661. I think indexlist is >> the place to store these details. That commit added the following >> comment: >> >> /* >> * Ignore partitioned indexes, since they are not usable for >> * queries. >> */ >> >> But neither are hypothetical indexes either, yet they're added to >> RelOptInfo.indexlist. >> >> I think the patch should be changed so that the existing list is used >> and we find another fix for the problems Alvaro fixed in 05fb5d661. >> Unfortunately, there was no discussion marked on that commit message, >> so it's not quite clear what the problem was. I'm unsure if there was >> anything other than CLUSTER that was broken. > After a bit of trawling through the archives, I found it here: > https://www.postgresql.org/message-id/20180124162006.pmapfiznhgngwtjf%40alvherre.pgsql > I think there was insufficient discussion and you're probably right that > it wasn't the best fix. I don't object to finding another fix. FWIW, I don't see any big problem with what you did. We'd need to do something more like what David suggests if the planner ever has a reason to consider partitioned indexes. But as long as it does not, why expend the time to build data structures representing them? And we'd have to add code in quite a few places to ignore them, once they're in indexlist. regards, tom lane
On Sun, 18 Sept 2022 at 07:00, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > After a bit of trawling through the archives, I found it here: > > https://www.postgresql.org/message-id/20180124162006.pmapfiznhgngwtjf%40alvherre.pgsql > > I think there was insufficient discussion and you're probably right that > > it wasn't the best fix. I don't object to finding another fix. > > FWIW, I don't see any big problem with what you did. We'd need to > do something more like what David suggests if the planner ever has > a reason to consider partitioned indexes. But as long as it does > not, why expend the time to build data structures representing them? Did you miss the report about left join removals not working with partitioned tables due to lack of unique proofs? That seems like a good enough reason to me. > And we'd have to add code in quite a few places to ignore them, > once they're in indexlist. I think the same is true for "hypothetical" indexes. Maybe that would be a good field to grep on to find the places that need to be addressed. David
On Fri, Sep 16, 2022 at 1:08 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, 3 Aug 2022 at 11:07, Arne Roland <A.Roland@index.de> wrote: > > Attached a rebased version of the patch. > Firstly, I agree that we should fix the issue of join removals not > working with partitioned tables. Agreed, though the patch's changes to tests does not seem to have to do with join removal? I don't really understand what the test changes are all about. I wonder why the patch doesn't instead add the test case that Arne showed in the file he attached with [1]. > I think the patch should be changed so that the existing list is used > and we find another fix for the problems Alvaro fixed in 05fb5d661. > Unfortunately, there was no discussion marked on that commit message, > so it's not quite clear what the problem was. I'm unsure if there was > anything other than CLUSTER that was broken. I see that cfdd03f45 > added CLUSTER for partitioned tables in v15. I think the patch would > need to go over the usages of RelOptInfo.indexlist to make sure that > we don't need to add any further conditions to skip their usage for > partitioned tables. > > I wrote the attached patch as I wanted to see what would break if we > did this. The only problem I got from running make check was in > get_actual_variable_range(), so I just changed that so it returns > false when the given rel is a partitioned table. I only quickly did > the changes to get_relation_info() and didn't give much thought to > what the can* bool flags should be set to. I just mostly skipped all > that code because it was crashing on > relation->rd_tableam->scan_bitmap_next_block due to the rd_tableam > being NULL. Yeah, it makes sense to just skip the portion of the code that reads from rd_indam, as your patch does. + if (indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) { + /* We copy just the fields we need, not all of rd_indam */ + amroutine = indexRelation->rd_indam; Maybe you're intending to add one before committing but there should be a comment mentioning why the am* initializations are being skipped over for partitioned index IndexOptInfos. I'd think that's because we are not going to be building any paths using them for now. The following portion of the top comment of get_relation_info() perhaps needs an update. * If inhparent is true, all we need to do is set up the attr arrays: * the RelOptInfo actually represents the appendrel formed by an inheritance * tree, and so the parent rel's physical size and index information isn't * important for it. */ -- Thanks, Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/2641568c18de40e8b1528fc9d4d80127%40index.de
Thank you for having a look at the patch. On Tue, 20 Sept 2022 at 18:41, Amit Langote <amitlangote09@gmail.com> wrote: > Agreed, though the patch's changes to tests does not seem to have to > do with join removal? I don't really understand what the test changes > are all about. I wonder why the patch doesn't instead add the test > case that Arne showed in the file he attached with [1]. > [1] https://www.postgresql.org/message-id/2641568c18de40e8b1528fc9d4d80127%40index.de I adjusted a test in partition_join.sql to add an additional column to the fract_t table. Before the change that table only had a single column and due to the query's join condition being USING(id), none of the columns from the left joined table were being used. That resulted in the updated code performing a left join removal as it was passing the checks for no columns being used in the left joined table in analyzejoins.c. The test in partition_join.sql claims to be testing "partitionwise join with fractional paths", so I thought we'd better not have a query that the planner removes the join when we're meant to be testing joins. It probably wouldn't hurt to have a new test to ensure left join removals work with a partitioned table. That should go in join.sql along with the other join removal tests. I didn't study Arne's patch to see what test he added. I was only interested in writing enough code so I could check there was no good reason not to add the partitioned index into RelOptInfo.indexlist. Arne sent me an off-list message to say he's planning on working on the patch that uses the existing field instead of the new one he originally added. Let's hold off for that patch. David
On Tue, Sep 20, 2022 at 4:53 PM David Rowley <dgrowleyml@gmail.com> wrote: > Thank you for having a look at the patch. > > On Tue, 20 Sept 2022 at 18:41, Amit Langote <amitlangote09@gmail.com> wrote: > > Agreed, though the patch's changes to tests does not seem to have to > > do with join removal? I don't really understand what the test changes > > are all about. I wonder why the patch doesn't instead add the test > > case that Arne showed in the file he attached with [1]. > > [1] https://www.postgresql.org/message-id/2641568c18de40e8b1528fc9d4d80127%40index.de > > I adjusted a test in partition_join.sql to add an additional column to > the fract_t table. Before the change that table only had a single > column and due to the query's join condition being USING(id), none of > the columns from the left joined table were being used. That resulted > in the updated code performing a left join removal as it was passing > the checks for no columns being used in the left joined table in > analyzejoins.c. The test in partition_join.sql claims to be testing > "partitionwise join with fractional paths", so I thought we'd better > not have a query that the planner removes the join when we're meant to > be testing joins. Ah, got it, thanks for the explanation. > It probably wouldn't hurt to have a new test to ensure left join > removals work with a partitioned table. That should go in join.sql > along with the other join removal tests. Makes sense. > I didn't study Arne's patch > to see what test he added. I was only interested in writing enough > code so I could check there was no good reason not to add the > partitioned index into RelOptInfo.indexlist. Arne sent me an off-list > message to say he's planning on working on the patch that uses the > existing field instead of the new one he originally added. Let's hold > off for that patch. Ok, sure. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
On Tue, Sep 20, 2022 at 4:53 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Arne sent me an off-list
> message to say he's planning on working on the patch that uses the
> existing field instead of the new one he originally added. Let's hold
> off for that patch.
I wouldn't say, I explicitly stated that. But I ended up doing something, that resulted in the attached patch. :)
For my own sanity I greped one last time for the usage of indexlist.
Most of the (untouched) usages have comments that, they are only called for baserels/plain tables.
Namely all but the cluster of partitioned tables. I had to reread that section. There we are just traversing the tree and omitting partitioned tables.
There is now a test section in join.sql for partitioned tables, that tests very similar to the
Regards
Arne
Attachment
On Sun, 2 Oct 2022 at 05:34, Arne Roland <A.Roland@index.de> wrote: > > On Tue, Sep 20, 2022 at 4:53 PM David Rowley <dgrowleyml@gmail.com> wrote: > > Arne sent me an off-list > > message to say he's planning on working on the patch that uses the > > existing field instead of the new one he originally added. Let's hold > > off for that patch. > > I wouldn't say, I explicitly stated that. But I ended up doing something, that resulted in the attached patch. :) I stand corrected. You said you'd think about it, not do it. Anyway, thanks for doing it :) > Further feedback would be appreciated! I had a quick look through the patch. Here are a few things that would be good to adjust. I understand that some of these things were how I left them in the patch I sent. In my defence, I mostly did that very quickly just so I could see if there was some issue with having the partitioned indexes in indexlist. I didn't actually put much effort into addressing the fine details of how that should be done. * In the header comment in get_relation_info(), I don't think we need to mention join removals explicitly. At a stretch, maybe mentioning "unique proofs" might be ok, but "various optimizations" might be better. If you mention "join removal", I fear that will just become outdated too quickly as further optimisations are added. Likewise for the comment about "join pruning" you've added in the function body. FWIW, we call these "join removals" anyway. * I think we should put RelationGetNumberOfBlocks() back to what it was and just ensure we don't call that for partitioned indexes in get_relation_info(). (Yes, I know that was my change) * I can't quite figure out why you're doing "DROP TABLE a CASCADE;" in inherits.sql. You've not changed anything else in that file. Did you mean to do this in join.sql? * The new test in join.sql. I understand that you've mostly copied the test from another place in the file and adjusted it to use a partitioned table. However, I don't really think you need to INSERT any data into those tables. I also think that using the table name of "a" is dangerous as it could conflict with another table by the same name in a parallel run of the tests. The non-partitioned version is a TEMP table. Also, it's slightly painful to look at the inconsistent capitalisation of SQL keywords in those queries you've added, again, I understand those are copied from above, but I see no need to duplicate the inconsistencies. Perhaps it's fine to copy the upper case keywords in the DDL and keep all lowercase in the queries. Also, I think you probably should just add a single simple join removal test for partitioned tables. You're not exercising any code that the non-partitioned case isn't by adding any additional tests. All that I think is worth testing here is ensuring nobody thinks that partitioned tables can get away with an empty indexlist again. * I had a bit of a 2nd thought on the test change in partition_join.sql. I know I added the "c" column so that join removals didn't apply. I'm now thinking it's probably better to just change the queries to use JOIN ON rather than JOIN USING. Also, apply the correct alias to the ORDER BY. This method should save from slowing the test down due to the additional column. We have some pretty slow buildfarm machines that this might actually make a meaningful difference to. Thanks again for working on this. David
David Rowley <dgrowleyml@gmail.com> writes: > * I can't quite figure out why you're doing "DROP TABLE a CASCADE;" in > inherits.sql. You've not changed anything else in that file. Did you > mean to do this in join.sql? Doing that would be a bad idea no matter where it's done. IIRC, those tables are intentionally set up to stress later dump/restore tests (with issues like inheritance children having column order different from their parents). regards, tom lane
> From: David Rowley <dgrowleyml@gmail.com>
> Sent: Monday, October 3, 2022 00:51
>
> * In the header comment in get_relation_info(), I don't think we need
> to mention join removals explicitly. At a stretch, maybe mentioning
> "unique proofs" might be ok, but "various optimizations" might be
> better. If you mention "join removal", I fear that will just become
> outdated too quickly as further optimisations are added. Likewise for
> the comment about "join pruning" you've added in the function body.
> FWIW, we call these "join removals" anyway.
I made them a little bit more vague. I also updated the description of indexlist in general.
> * I think we should put RelationGetNumberOfBlocks() back to what it
> was and just ensure we don't call that for partitioned indexes in
> get_relation_info(). (Yes, I know that was my change)
I don't think it's relevant who did it. I don't see much importance either way. I reverted it to the old state.
> * I can't quite figure out why you're doing "DROP TABLE a CASCADE;" in
> inherits.sql. You've not changed anything else in that file. Did you
> mean to do this in join.sql?
The problem I encountered, was that simple copy of the test wasn't possible, because the tables were named the same way. It seemed intuitive to me to make the tests , such that there are no side-effects. I added a comment to the creation of those tables to make clear, that there are intended side effects by not dropping those tables.
> * The new test in join.sql. I understand that you've mostly copied the
> test from another place in the file and adjusted it to use a
> partitioned table. However, I don't really think you need to INSERT
> any data into those tables. I also think that using the table name of
> "a" is dangerous as it could conflict with another table by the same
> name in a parallel run of the tests. The non-partitioned version is a
> TEMP table. Also, it's slightly painful to look at the inconsistent
> capitalisation of SQL keywords in those queries you've added, again, I
> understand those are copied from above, but I see no need to duplicate
> the inconsistencies. Perhaps it's fine to copy the upper case
> keywords in the DDL and keep all lowercase in the queries. Also, I
> think you probably should just add a single simple join removal test
> for partitioned tables. You're not exercising any code that the
> non-partitioned case isn't by adding any additional tests. All that I
> think is worth testing here is ensuring nobody thinks that partitioned
> tables can get away with an empty indexlist again.
I am not sure, how thorough the tests on partitioned tables need to be. I guess, I will turn up more issues in production, than any test will be able to cover.
As a general sentiment, I wouldn't agree. The empty indexlist isn't the only interesting thing to test. The more we add optimizations, the more non trivial intersections of those start to break things again, we have fixed. A notable part of the complexity of the optimizer stems from the fact, that we apply most transformations in a fixed order. We obviously have to do that for performance reasons. But as long as we have that, we are prone to have cases where the ordering breaks part. Partitioned tables are a prominent cases here, because we always have the appendrel.
> * I had a bit of a 2nd thought on the test change in
> partition_join.sql. I know I added the "c" column so that join
> removals didn't apply. I'm now thinking it's probably better to just
> change the queries to use JOIN ON rather than JOIN USING. Also, apply
> the correct alias to the ORDER BY. This method should save from
> slowing the test down due to the additional column. We have some
> pretty slow buildfarm machines that this might actually make a
> meaningful difference to.
There is no real point in changing this, because we can just access the column that is hidden anyways to make the join removal impossible.
That is sort of the tick my version v6 was going for. Tbh I don't care much either way as long as the test still tests for the fractional merge append. I just switched back.
Afaiac creating and dropping a table is sort of the worst thing we can do, when thinking about tests. There is just so much work involved there. If I am concerned about test runtimes, I'd try to minimize that. This is even more true regarding tests for partitioned tables. To get a parted table with two partitions, we always have to create three tables. I do think there is a lot of potential to speed up the test times with that, but I'd suggest to handle that in a different thread.
> Thanks again for working on this.
Thank you for your input!
Arne
Attachment
Hi, On 2022-11-02 01:50:38 +0000, Arne Roland wrote: > I mainly changed the comments, the Assert and some casing. The tests have been failing for a while https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/40/3452 https://api.cirrus-ci.com/v1/task/6190372803051520/logs/cores.log #2 0x00005645dff192f6 in ExceptionalCondition (conditionName=conditionName@entry=0x5645e014b167 "false", fileName=fileName@entry=0x5645e0196b08"../src/backend/storage/buffer/bufmgr.c", lineNumber=lineNumber@entry=2971) at ../src/backend/utils/error/assert.c:66 No locals. #3 0x00005645dfc13823 in RelationGetNumberOfBlocksInFork (relation=relation@entry=0x7fb54d54e470, forkNum=forkNum@entry=MAIN_FORKNUM)at ../src/backend/storage/buffer/bufmgr.c:2971 No locals. #4 0x00005645dfa9ac5e in get_relation_info (root=root@entry=0x5645e1ed9840, relationObjectId=16660, inhparent=<optimizedout>, rel=rel@entry=0x5645e2086b38) at ../src/backend/optimizer/util/plancat.c:442 indexoid = <optimized out> info = 0x5645e2083b28 i = <optimized out> indexRelation = 0x7fb54d54e470 index = 0x7fb54d548c48 amroutine = <optimized out> ncolumns = 1 nkeycolumns = 1 l__state = {l = <optimized out>, i = <optimized out>} indexoidlist = 0x5645e2088a98 lmode = 1 l = <optimized out> varno = 1 relation = 0x7fb54d54e680 hasindex = <optimized out> indexinfos = 0x0 __func__ = "get_relation_info" #5 0x00005645dfaa5e25 in build_simple_rel (root=0x5645e1ed9840, relid=1, parent=parent@entry=0x0) at ../src/backend/optimizer/util/relnode.c:293 rel = 0x5645e2086b38 rte = 0x5645e1ed8fc8 __func__ = "build_simple_rel" ... Greetings, Andres Freund
Thank you!
Sadly I didn't manage how to reproduce that locally. check-world doesn't seem to fail at my end.
That being said, attached patch should fix the issue reported below.
I'll have another look at the log later this week.
Regards
Arne
Sent: Tuesday, November 22, 2022 2:36:59 AM
To: Arne Roland
Cc: David Rowley; Amit Langote; pgsql-hackers; Zhihong Yu; Alvaro Herrera; Julien Rouhaud
Subject: Re: missing indexes in indexlist with partitioned tables
On 2022-11-02 01:50:38 +0000, Arne Roland wrote:
> I mainly changed the comments, the Assert and some casing.
The tests have been failing for a while
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/40/3452
cirrus-ci.com Cirrus CI makes your development cycle fast, efficient, and secure by leveraging modern cloud technologies. |
https://api.cirrus-ci.com/v1/task/6190372803051520/logs/cores.log
#2 0x00005645dff192f6 in ExceptionalCondition (conditionName=conditionName@entry=0x5645e014b167 "false", fileName=fileName@entry=0x5645e0196b08 "../src/backend/storage/buffer/bufmgr.c", lineNumber=lineNumber@entry=2971) at ../src/backend/utils/error/assert.c:66
No locals.
#3 0x00005645dfc13823 in RelationGetNumberOfBlocksInFork (relation=relation@entry=0x7fb54d54e470, forkNum=forkNum@entry=MAIN_FORKNUM) at ../src/backend/storage/buffer/bufmgr.c:2971
No locals.
#4 0x00005645dfa9ac5e in get_relation_info (root=root@entry=0x5645e1ed9840, relationObjectId=16660, inhparent=<optimized out>, rel=rel@entry=0x5645e2086b38) at ../src/backend/optimizer/util/plancat.c:442
indexoid = <optimized out>
info = 0x5645e2083b28
i = <optimized out>
indexRelation = 0x7fb54d54e470
index = 0x7fb54d548c48
amroutine = <optimized out>
ncolumns = 1
nkeycolumns = 1
l__state = {l = <optimized out>, i = <optimized out>}
indexoidlist = 0x5645e2088a98
lmode = 1
l = <optimized out>
varno = 1
relation = 0x7fb54d54e680
hasindex = <optimized out>
indexinfos = 0x0
__func__ = "get_relation_info"
#5 0x00005645dfaa5e25 in build_simple_rel (root=0x5645e1ed9840, relid=1, parent=parent@entry=0x0) at ../src/backend/optimizer/util/relnode.c:293
rel = 0x5645e2086b38
rte = 0x5645e1ed8fc8
__func__ = "build_simple_rel"
...
Greetings,
Andres Freund
Attachment
On Tue, 6 Dec 2022 at 13:43, Arne Roland <A.Roland@index.de> wrote: > That being said, attached patch should fix the issue reported below. I took a look over the v10 patch and ended up making adjustments to the tests. I didn't quite see the need for the test to be as extensive as you had them in v10. Neither join removals nor unique joins treat partitioned tables any differently from normal tables, so I think it's fine just to have a single test that makes sure join removals work on partitioned tables. I didn't feel inclined to add a test for unique joins. The test I added is mainly just there to make sure something fails if someone decides partitioned tables don't need the indexlist populated at some point in the future. The other changes I made were just cosmetic. I pushed the result. David