Re: missing indexes in indexlist with partitioned tables - Mailing list pgsql-hackers
From | Arne Roland |
---|---|
Subject | Re: missing indexes in indexlist with partitioned tables |
Date | |
Msg-id | 99fad49d2cd341f38bcbf62e495eb510@index.de Whole thread Raw |
In response to | Re: missing indexes in indexlist with partitioned tables (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: missing indexes in indexlist with partitioned tables
|
List | pgsql-hackers |
Hi!
I mainly changed the comments, the Assert and some casing.
> From: David Rowley <dgrowleyml@gmail.com>
> Sent: Monday, October 3, 2022 00:51
>
> * In the header comment in get_relation_info(), I don't think we need
> to mention join removals explicitly. At a stretch, maybe mentioning
> "unique proofs" might be ok, but "various optimizations" might be
> better. If you mention "join removal", I fear that will just become
> outdated too quickly as further optimisations are added. Likewise for
> the comment about "join pruning" you've added in the function body.
> FWIW, we call these "join removals" anyway.
I made them a little bit more vague. I also updated the description of indexlist in general.
> * I think we should put RelationGetNumberOfBlocks() back to what it
> was and just ensure we don't call that for partitioned indexes in
> get_relation_info(). (Yes, I know that was my change)
I don't think it's relevant who did it. I don't see much importance either way. I reverted it to the old state.
> * I can't quite figure out why you're doing "DROP TABLE a CASCADE;" in
> inherits.sql. You've not changed anything else in that file. Did you
> mean to do this in join.sql?
The problem I encountered, was that simple copy of the test wasn't possible, because the tables were named the same way. It seemed intuitive to me to make the tests , such that there are no side-effects. I added a comment to the creation of those tables to make clear, that there are intended side effects by not dropping those tables.
> * The new test in join.sql. I understand that you've mostly copied the
> test from another place in the file and adjusted it to use a
> partitioned table. However, I don't really think you need to INSERT
> any data into those tables. I also think that using the table name of
> "a" is dangerous as it could conflict with another table by the same
> name in a parallel run of the tests. The non-partitioned version is a
> TEMP table. Also, it's slightly painful to look at the inconsistent
> capitalisation of SQL keywords in those queries you've added, again, I
> understand those are copied from above, but I see no need to duplicate
> the inconsistencies. Perhaps it's fine to copy the upper case
> keywords in the DDL and keep all lowercase in the queries. Also, I
> think you probably should just add a single simple join removal test
> for partitioned tables. You're not exercising any code that the
> non-partitioned case isn't by adding any additional tests. All that I
> think is worth testing here is ensuring nobody thinks that partitioned
> tables can get away with an empty indexlist again.
I am not sure, how thorough the tests on partitioned tables need to be. I guess, I will turn up more issues in production, than any test will be able to cover.
As a general sentiment, I wouldn't agree. The empty indexlist isn't the only interesting thing to test. The more we add optimizations, the more non trivial intersections of those start to break things again, we have fixed. A notable part of the complexity of the optimizer stems from the fact, that we apply most transformations in a fixed order. We obviously have to do that for performance reasons. But as long as we have that, we are prone to have cases where the ordering breaks part. Partitioned tables are a prominent cases here, because we always have the appendrel.
> From: David Rowley <dgrowleyml@gmail.com>
> Sent: Monday, October 3, 2022 00:51
>
> * In the header comment in get_relation_info(), I don't think we need
> to mention join removals explicitly. At a stretch, maybe mentioning
> "unique proofs" might be ok, but "various optimizations" might be
> better. If you mention "join removal", I fear that will just become
> outdated too quickly as further optimisations are added. Likewise for
> the comment about "join pruning" you've added in the function body.
> FWIW, we call these "join removals" anyway.
I made them a little bit more vague. I also updated the description of indexlist in general.
> * I think we should put RelationGetNumberOfBlocks() back to what it
> was and just ensure we don't call that for partitioned indexes in
> get_relation_info(). (Yes, I know that was my change)
I don't think it's relevant who did it. I don't see much importance either way. I reverted it to the old state.
> * I can't quite figure out why you're doing "DROP TABLE a CASCADE;" in
> inherits.sql. You've not changed anything else in that file. Did you
> mean to do this in join.sql?
The problem I encountered, was that simple copy of the test wasn't possible, because the tables were named the same way. It seemed intuitive to me to make the tests , such that there are no side-effects. I added a comment to the creation of those tables to make clear, that there are intended side effects by not dropping those tables.
> * The new test in join.sql. I understand that you've mostly copied the
> test from another place in the file and adjusted it to use a
> partitioned table. However, I don't really think you need to INSERT
> any data into those tables. I also think that using the table name of
> "a" is dangerous as it could conflict with another table by the same
> name in a parallel run of the tests. The non-partitioned version is a
> TEMP table. Also, it's slightly painful to look at the inconsistent
> capitalisation of SQL keywords in those queries you've added, again, I
> understand those are copied from above, but I see no need to duplicate
> the inconsistencies. Perhaps it's fine to copy the upper case
> keywords in the DDL and keep all lowercase in the queries. Also, I
> think you probably should just add a single simple join removal test
> for partitioned tables. You're not exercising any code that the
> non-partitioned case isn't by adding any additional tests. All that I
> think is worth testing here is ensuring nobody thinks that partitioned
> tables can get away with an empty indexlist again.
I am not sure, how thorough the tests on partitioned tables need to be. I guess, I will turn up more issues in production, than any test will be able to cover.
As a general sentiment, I wouldn't agree. The empty indexlist isn't the only interesting thing to test. The more we add optimizations, the more non trivial intersections of those start to break things again, we have fixed. A notable part of the complexity of the optimizer stems from the fact, that we apply most transformations in a fixed order. We obviously have to do that for performance reasons. But as long as we have that, we are prone to have cases where the ordering breaks part. Partitioned tables are a prominent cases here, because we always have the appendrel.
I removed some test cases here to half the amount of partitioned tables needed here. I don't see the value in having one simple explain less. But I do not have strong feelings about this. Are there any further opinions?
> * I had a bit of a 2nd thought on the test change in
> partition_join.sql. I know I added the "c" column so that join
> removals didn't apply. I'm now thinking it's probably better to just
> change the queries to use JOIN ON rather than JOIN USING. Also, apply
> the correct alias to the ORDER BY. This method should save from
> slowing the test down due to the additional column. We have some
> pretty slow buildfarm machines that this might actually make a
> meaningful difference to.
There is no real point in changing this, because we can just access the column that is hidden anyways to make the join removal impossible.
That is sort of the tick my version v6 was going for. Tbh I don't care much either way as long as the test still tests for the fractional merge append. I just switched back.
Afaiac creating and dropping a table is sort of the worst thing we can do, when thinking about tests. There is just so much work involved there. If I am concerned about test runtimes, I'd try to minimize that. This is even more true regarding tests for partitioned tables. To get a parted table with two partitions, we always have to create three tables. I do think there is a lot of potential to speed up the test times with that, but I'd suggest to handle that in a different thread.
> Thanks again for working on this.
Thank you for your input!
Arne
> * I had a bit of a 2nd thought on the test change in
> partition_join.sql. I know I added the "c" column so that join
> removals didn't apply. I'm now thinking it's probably better to just
> change the queries to use JOIN ON rather than JOIN USING. Also, apply
> the correct alias to the ORDER BY. This method should save from
> slowing the test down due to the additional column. We have some
> pretty slow buildfarm machines that this might actually make a
> meaningful difference to.
There is no real point in changing this, because we can just access the column that is hidden anyways to make the join removal impossible.
That is sort of the tick my version v6 was going for. Tbh I don't care much either way as long as the test still tests for the fractional merge append. I just switched back.
Afaiac creating and dropping a table is sort of the worst thing we can do, when thinking about tests. There is just so much work involved there. If I am concerned about test runtimes, I'd try to minimize that. This is even more true regarding tests for partitioned tables. To get a parted table with two partitions, we always have to create three tables. I do think there is a lot of potential to speed up the test times with that, but I'd suggest to handle that in a different thread.
> Thanks again for working on this.
Thank you for your input!
Arne
Attachment
pgsql-hackers by date: