Thread: POC: GROUP BY optimization
Hi!

As far as I know, the columns in a GROUP BY can be reordered without loss of correctness, but reordering can give a performance benefit. The suggested patch implements that in several ways:

1) Reorder GROUP BY columns by the number of unique values, in descending order. A simple example shows a two-fold performance difference (see opt_group_by_demostrator.sql; on my notebook the first two queries demonstrate that). The idea is that if the first column is unique, the sort comparator never needs to compare the following columns.

2) Reorder GROUP BY columns to match existing pathkeys which are the result of an index scan or an ORDER BY clause. This avoids adding a Sort node - a clear win, I suppose.

3) The patch suggests using an index for the GROUP BY clause, but unlike the ORDER BY or merge join cases it doesn't pay attention to the actual column order, because of 2).

The patch doesn't change any cost estimation computation, although 1) could take advantage of that.

Some issues/problems/notices:

1) I didn't touch GROUPING SETS at all. As far as I can see, the current coding prevents any optimization around them, and I suppose that should be addressed in another patch.

2) I'm not completely satisfied with how the number of groups per column is counted; it looks ugly without refactoring estimate_num_groups(). At the moment it can be called multiple times for each column. Maybe this number should be added to the PathKey structure?

3) EquivalenceClass->ec_sortref. If the planner first encounters a column in the WHERE clause, it doesn't fill the target reference field, because it is unknown yet. The patch looks for a correspondence between PathKey (group_pathkeys) and SortGroupClause (groupClause) and fails in this case. So get_eclass_for_sort_expr() now updates the ec_sortref field if it isn't initialized yet.

4) The algorithm that reorders columns is proportional to N^2, where N is the number of columns, but I hope that isn't a problem because the number of GROUP BY columns usually isn't very big.

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
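P.S. For anyone who doesn't want to open the attached demonstrator, a minimal sketch of the idea in 1) could look like this (table and column names are made up for this example, not taken from the attachment):

    -- column v2 is unique, column v1 has only a few distinct values
    CREATE TABLE gtest AS
        SELECT i % 10 AS v1, i AS v2
        FROM generate_series(1, 1000000) AS s(i);
    VACUUM ANALYZE gtest;

    SET enable_hashagg = off;   -- force a sort-based GroupAggregate

    -- slower: the low-cardinality column goes first, so the comparator
    -- almost always has to look at the second column as well
    EXPLAIN ANALYZE SELECT count(*) FROM gtest GROUP BY v1, v2;

    -- faster: the unique column goes first, so comparisons on the
    -- second column are almost never needed
    EXPLAIN ANALYZE SELECT count(*) FROM gtest GROUP BY v2, v1;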
Attachment
Cosmetic change: remove the find_sort_group_clause_by_sortref() function added in the v5 patch version, because it duplicates the existing get_sortgroupref_clause().

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment
Hi Teodor,

Thanks for the patch. This (missing) optimization popped up repeatedly recently, and I was planning to look into it for PG12. So now I don't have to, because you've done all the hard work ;-)

I've done only a very basic review so far, but ISTM the general approach of considering additional indexes in create_index_paths is correct. That being said, I do have a couple of comments:

1) add_path() ensures that we only keep the one cheapest sorted path for each pathkeys. This patch somewhat defeats that because it considers additional pathkeys (essentially any permutation of group keys) as interesting. So we may end up with more paths in the list.

I wonder if we should make the add_path() logic smarter to recognize when two paths have different pathkeys but were generated to match the same grouping, to reduce the number of paths added by this optimization. Currently we do that for each pathkeys list independently, but we're considering many more pathkeys orderings ...

That being said, maybe this concern is moot - to generate a large number of new paths, there would have to be a lot of indexes matching the group by clause (as a prefix), and the group by clause would need to be fairly large (to have many permutations). That seems like a fairly rare case. I do know a couple of people who do create large numbers of indexes on tables, but those are usually single-column indexes.

2) sort reordering based on ndistinct estimates

This part of the patch (reordering the columns in an explicit sort node) seems to be largely independent of the "consider other indexes" part. So I do suggest separating those two, and discussing them independently.

But thinking about this optimization, I'm worried it relies on a couple of important assumptions. Until now those decisions could be made by the person writing the SQL query, but this optimization makes that impossible. So we really need to get this right.

For example, it seems to disregard that different data types have different comparison costs. For example comparing bytea will be far more expensive compared to int4, so it may be much more efficient to compare int4 columns first, even if there are far fewer distinct values in them. This costing is probably related to abbreviated keys too, which are a huge improvement for text and other varlena-based types.

Also, simply sorting the columns by their ndistinct estimate is somewhat naive, because it assumes the columns are independent. Imagine for example a table with three columns:

    CREATE TABLE t (a INT, b INT, c INT);

    INSERT INTO t SELECT mod(i,100), mod(i,100)/2, 49*random()
      FROM generate_series(1,1000000) s(i);

Clearly, "a" has the most distinct values (100), while both "b" and "c" have 50 distinct values. But "b" does not actually split any of the "a" groups, while "c" does:

    select count(*) from (select 1 from t group by a,b) foo;
     count
    -------
       100
    (1 row)

    select count(*) from (select 1 from t group by a,c) foo;
     count
    -------
      5000
    (1 row)

So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use "(a,c,b)" than "(a,b,c)", but there is no way to decide this merely using per-column ndistinct values. Luckily, we now have ndistinct multi-column coefficients which could be used to decide this, I believe (but I haven't tried).
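For example, something like this would build the multi-column ndistinct statistics on the table above (just a sketch of what I have in mind, I haven't tried wiring it into the patch):

    CREATE STATISTICS t_nd (ndistinct) ON a, b, c FROM t;
    ANALYZE t;

    -- the estimated group counts for the column combinations end up here
    SELECT stxname, stxndistinct FROM pg_statistic_ext;

The reordering code could then compare the estimate for (a,b) against (a,c) instead of relying on the per-column values alone.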
The real issue however is that this decision can't be made entirely locally. Consider for example this:

    explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;

                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Sort  (cost=156010.16..156260.16 rows=100000 width=20)
       Sort Key: c, b, a
       ->  GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
             Group Key: a, c, b
             ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
                   Sort Key: a, c, b
                   ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
    (7 rows)

while without the patch, this would happen:

    explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;

                                   QUERY PLAN
    ------------------------------------------------------------------------
     GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
       Group Key: c, b, a
       ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
             Sort Key: c, b, a
             ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
    (5 rows)

That is clearly cheaper (at least according to the optimizer) than doing two separate sorts. So the optimization actually does the wrong thing here, and it needs to somehow consider the other ordering requirements (in this case ORDER BY) - either by generating multiple paths with different orderings or by some heuristics.

I'm also wondering how freely we can change the GROUP BY result ordering. Assume you disable hash aggregate and parallel query - currently we are guaranteed to use a group aggregate that produces exactly the ordering specified in GROUP BY, but this patch removes that "guarantee" and we can produce an arbitrary permutation of the ordering. But as I argued in other threads, such implicit guarantees are really fragile, can silently break for arbitrary reasons (say, parallel query will do just that) and can be easily fixed by adding a proper ORDER BY. So I don't think this is an issue.

The other random thought is how this will/could interact with the incremental sort patch. I know Alexander wanted to significantly limit where we actually consider the incremental sort, but I'm not sure if this was one of those places or not (it sure looks like a place where we would greatly benefit from it).

So to summarize this initial review - I do suggest splitting the patch into two parts. One that does the index magic, and one for this reordering optimization. The first part (additional indexes) seems fairly safe, likely to get committable soon. The other part (ndistinct reordering) IMHO requires more thought regarding costing and interaction with other query parts.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> Thanks for the patch. This (missing) optimization popped up repeatedly recently,
> and I was planning to look into it for PG12. So now I don't have to, because
> you've done all the hard work ;-)

You are welcome. Actually, one of our customers faced the problem with GROUP BY column order, and exactly with reordering without any indexes - what you describe as part 2). The index optimization was noticed by me later. But based on your suggested patch order I have split the patch into an index part and a non-index part, and the second part depends on the first one. They touch the same part of the code, so they could not be made independent.

> 1) add_path() ensures that we only keep the one cheapest sorted path for each
> pathkeys. This patch somewhat defeats that because it considers additional
> pathkeys (essentially any permutation of group keys) as interesting. So we may
> end up with more paths in the list.

It seems an index scan could give a benefit here only if:
1) it's an index-only scan, or
2) it's a full index scan (as opposed to index-only), but the physical heap order is close to the logical index order (i.e. the table is clustered).

In other cases the cost of disk seeks will be very high. But at this stage of planning we don't know those facts yet, so we can't make a good decision here and should trust the add_path() logic.

> I wonder if we should make the add_path() logic smarter to recognize when two
> paths have different pathkeys but were generated to match the same grouping,
> to reduce the number of paths added by this optimization. Currently we do that
> for each pathkeys list independently, but we're considering many more pathkeys
> orderings ...

Hm, I tend to say no.

    select .. from t1 group by a, b
    union
    select .. from t2 group by a, b

t1 and t2 could have different sets of indexes and different distributions, so locally it could be cheaper to use one index (for example, one defined as (b, a) and the second as (a,b,c,d) - the second is larger), but globally another choice could win (for example, if the second table doesn't have a (b,a) index).

> 2) sort reordering based on ndistinct estimates
>
> But thinking about this optimization, I'm worried it relies on a couple of
> important assumptions. Until now those decisions could be made by the person
> writing the SQL query, but this optimization makes that impossible. So we
> really need to get this right.

Hm, SQL by design should not be used that way, but of course it is :(

> For example, it seems to disregard that different data types have different
> comparison costs. For example comparing bytea will be far more expensive
> compared to int4, so it may be much more efficient to compare int4 columns
> first, even if there are far fewer distinct values in them.

As far as I can see, cost_sort() doesn't pay attention to these details, and improving that should be a separate patch.

> Also, simply sorting the columns by their ndistinct estimate is somewhat naive,
> because it assumes the columns are independent. Imagine for example a table
> with three columns:
> So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use
> "(a,c,b)" than "(a,b,c)", but there is no way to decide this merely using
> per-column ndistinct values. Luckily, we now have ndistinct multi-column
> coefficients which could be used to decide this, I believe (but I haven't tried).

Agree, but I don't know how to use it here.
Except, maybe:
1) first column - the column with the largest estimated number of groups;
2) second column - the pair (first, second) with the largest estimated number of groups;
3) third column - the triple (first, second, third) with the largest ...

But it seems that even that algorithm, ISTM, could be implemented in a cheaper manner.

> The real issue however is that this decision can't be made entirely locally.
> Consider for example this:
>
> explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
>
> That is clearly cheaper (at least according to the optimizer) than doing two
> separate sorts. So the optimization actually does the wrong thing here, and it
> needs to somehow consider the other ordering requirements (in this case ORDER
> BY) - either by generating multiple paths with different orderings or by some
> heuristics.

Hm, thank you. I consider that a bug in my implementation - the basic idea was that we try to match an already existing or needed order, and only if we fail, or have an unmatched tail of the pathkey list, do we try to find the cheapest column order.

Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), though perhaps in a naive way: if there are no preexisting path pathkeys, we first try to reorder the columns according to the ORDER BY clause. A test for your case is also added.

> I'm also wondering how freely we can change the GROUP BY result ordering. Assume
> you disable hash aggregate and parallel query - currently we are guaranteed to
> use a group aggregate that produces exactly the ordering specified in GROUP BY,
> but this patch removes that "guarantee" and we can produce an arbitrary
> permutation of the ordering. But as I argued in other threads, such implicit
> guarantees are really fragile, can silently break for arbitrary reasons (say,
> parallel query will do just that) and can be easily fixed by adding a proper
> ORDER BY. So I don't think this is an issue.

Agree. SQL by design doesn't guarantee any particular order without an explicit ORDER BY clause.

> The other random thought is how this will/could interact with the incremental
> sort patch. I know Alexander wanted to significantly limit where we actually
> consider the incremental sort, but I'm not sure if this was one of those places
> or not (it sure looks like a place where we would greatly benefit from it).

It seems they should not directly interact. This patch tries to find the cheapest column order, Alexander's patch tries to make the sort itself cheaper - those are different tasks. But I will try it.

> So to summarize this initial review - I do suggest splitting the patch into two
> parts. One that does the index magic, and one for this reordering optimization.
> The first part (additional indexes) seems fairly safe, likely to get committable
> soon. The other part (ndistinct reordering) IMHO requires more thought regarding
> costing and interaction with other query parts.

Thank you for the review!

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment
On 06/05/2018 07:56 PM, Teodor Sigaev wrote:
>> Thanks for the patch. This (missing) optimization popped up repeatedly
>> recently, and I was planning to look into it for PG12. So now I don't
>> have to, because you've done all the hard work ;-)
> You are welcome. Actually, one of our customers faced the problem with
> GROUP BY column order, and exactly with reordering without any indexes -
> what you describe as part 2). The index optimization was noticed by me later.
> But based on your suggested patch order I have split the patch into an index
> part and a non-index part, and the second part depends on the first one.
> They touch the same part of the code, so they could not be made independent.

The way I see it the patch does two different things:

a) consider additional indexes by relaxing the pathkeys check

b) if there are no indexes, i.e. when an explicit Sort is needed, consider reordering the columns by ndistinct

Not sure why those two parts couldn't be separated. I haven't tried splitting the patch, of course, so I may be missing something.

In the worst case, one part will depend on the other, which is OK. It still allows us to commit the first part and continue working on the other one, for example.

>> 1) add_path() ensures that we only keep the one cheapest sorted path for
>> each pathkeys. This patch somewhat defeats that because it considers
>> additional pathkeys (essentially any permutation of group keys) as
>> interesting. So we may end up with more paths in the list.
> It seems an index scan could give a benefit here only if:
> 1) it's an index-only scan, or
> 2) it's a full index scan (as opposed to index-only), but the physical heap
> order is close to the logical index order (i.e. the table is clustered).
>
> In other cases the cost of disk seeks will be very high. But at this stage of
> planning we don't know those facts yet, so we can't make a good decision here
> and should trust the add_path() logic.

Not sure what you mean? Surely we do costing of the paths at this stage, so we can decide which one is cheaper etc. The decision which paths to keep is done by add_path(), and it should stay like this, of course. I wasn't suggesting to move the logic elsewhere.

>> I wonder if we should make the add_path() logic smarter to recognize when
>> two paths have different pathkeys but were generated to match the same
>> grouping, to reduce the number of paths added by this optimization.
>> Currently we do that for each pathkeys list independently, but we're
>> considering many more pathkeys orderings ...
>
> Hm. I tend to say no.
>
>     select .. from t1 group by a, b
>     union
>     select .. from t2 group by a, b
>
> t1 and t2 could have different sets of indexes and different distributions,
> so locally it could be cheaper to use one index (for example, one defined as
> (b, a) and the second as (a,b,c,d) - the second is larger), but globally
> another choice could win (for example, if the second table doesn't have a
> (b,a) index).

But add_path() treats each of the relations independently, so why couldn't we pick a different index for each of the two relations?

>> 2) sort reordering based on ndistinct estimates
>
>> But thinking about this optimization, I'm worried it relies on a couple of
>> important assumptions. Until now those decisions could be made by the person
>> writing the SQL query, but this optimization makes that impossible. So we
>> really need to get this right.
> Hm, SQL by design should not be used that way, but of course it is :(

Well, yes and no.
I'm not worried about people relying on us to give them some ordering - they can (and should) add an ORDER BY clause to fix that. I'm more worried about the other stuff.

>> For example, it seems to disregard that different data types have
>> different comparison costs. For example comparing bytea will be far
>> more expensive compared to int4, so it may be much more efficient to
>> compare int4 columns first, even if there are far fewer distinct
>> values in them.
> As far as I can see, cost_sort() doesn't pay attention to these details,
> and improving that should be a separate patch.

But the sort also does not reorder columns.

Imagine you have a custom data type that is expensive for comparisons. You know that, so you place it at the end of GROUP BY clauses, to reduce the number of comparisons on that field. And then we come along and just reorder the columns, placing it first, because it happens to have a high ndistinct statistic. And there's no way to get the original behavior :-(

>> Also, simply sorting the columns by their ndistinct estimate is
>> somewhat naive, because it assumes the columns are independent.
>> Imagine for example a table with three columns:
>> So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to
>> use "(a,c,b)" than "(a,b,c)", but there is no way to decide this merely
>> using per-column ndistinct values. Luckily, we now have ndistinct
>> multi-column coefficients which could be used to decide this, I believe
>> (but I haven't tried).
> Agree, but I don't know how to use it here. Except, maybe:
> 1) first column - the column with the largest estimated number of groups;
> 2) second column - the pair (first, second) with the largest estimated
> number of groups;
> 3) third column - the triple (first, second, third) with the largest ...
>
> But it seems that even that algorithm, ISTM, could be implemented in a
> cheaper manner.

Maybe. I do have some ideas, although I haven't tried implementing them.

If there's no extended statistic on the columns, you can do the current thing (assuming independence etc.). There's not much we can do here.

If there's an extended statistic, you can do either a greedy search (get the next column with the highest ndistinct coefficient) or an exhaustive search (computing the estimated number of comparisons).

Another challenge is that using only the ndistinct coefficient assumes uniform distribution of the values. But you can have a column with 1M distinct values, where a single value represents 99% of the rows. And another column with 100k distinct values, with an actual uniform distribution. I'm pretty sure it'd be more efficient to place the 100k column first.
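For illustration, something with the same shape as that last example but at a smaller scale (a made-up table, not meant as a benchmark):

    -- "a" has ~10k distinct values, but one value covers 99% of the rows;
    -- "b" has only 1k distinct values, uniformly distributed
    CREATE TABLE skew AS
        SELECT CASE WHEN i % 100 = 0 THEN i ELSE 0 END AS a,
               i % 1000 AS b
        FROM generate_series(1, 1000000) AS s(i);
    ANALYZE skew;

    SET enable_hashagg = off;

    -- per-column ndistinct would put "a" first, but almost every comparison
    -- on "a" is a tie and falls through to "b" ...
    EXPLAIN ANALYZE SELECT count(*) FROM skew GROUP BY a, b;

    -- ... while with "b" first most comparisons are resolved on "b" alone
    EXPLAIN ANALYZE SELECT count(*) FROM skew GROUP BY b, a;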
>> The real issue however is that this decision can't be made entirely
>> locally. Consider for example this:
>>
>> explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
>>
>> That is clearly cheaper (at least according to the optimizer) than
>> doing two separate sorts. So the optimization actually does the wrong
>> thing here, and it needs to somehow consider the other ordering
>> requirements (in this case ORDER BY) - either by generating multiple
>> paths with different orderings or by some heuristics.
> Hm, thank you. I consider that a bug in my implementation - the basic idea
> was that we try to match an already existing or needed order, and only if
> we fail, or have an unmatched tail of the pathkey list, do we try to find
> the cheapest column order.
>
> Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), though perhaps in
> a naive way: if there are no preexisting path pathkeys, we first try to
> reorder the columns according to the ORDER BY clause. A test for your case
> is also added.

OK. I'll give it a try.

>> I'm also wondering how freely we can change the GROUP BY result
>> ordering. Assume you disable hash aggregate and parallel query -
>> currently we are guaranteed to use a group aggregate that produces
>> exactly the ordering specified in GROUP BY, but this patch removes
>> that "guarantee" and we can produce an arbitrary permutation of the
>> ordering. But as I argued in other threads, such implicit guarantees
>> are really fragile, can silently break for arbitrary reasons (say,
>> parallel query will do just that) and can be easily fixed by adding a
>> proper ORDER BY. So I don't think this is an issue.
> Agree. SQL by design doesn't guarantee any particular order without an
> explicit ORDER BY clause.

>> The other random thought is how this will/could interact with the
>> incremental sort patch. I know Alexander wanted to significantly
>> limit where we actually consider the incremental sort, but I'm not
>> sure if this was one of those places or not (it sure looks like a
>> place where we would greatly benefit from it).
> It seems they should not directly interact. This patch tries to find the
> cheapest column order, Alexander's patch tries to make the sort itself
> cheaper - those are different tasks. But I will try it.

Thanks.

>> So to summarize this initial review - I do suggest splitting the patch
>> into two parts. One that does the index magic, and one for this
>> reordering optimization. The first part (additional indexes) seems
>> fairly safe, likely to get committable soon. The other part
>> (ndistinct reordering) IMHO requires more thought regarding costing
>> and interaction with other query parts.
> Thank you for the review!

;-)

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> problem 2). The index optimization was noticed by me later. But based on your
>> suggested patch order I have split the patch into an index part and a
>> non-index part, and the second part depends on the first one. They touch the
>> same part of the code, so they could not be made independent.
> The way I see it the patch does two different things:

Yes.

> a) consider additional indexes by relaxing the pathkeys check

Yes, but not only that. The patch can also reorder the GROUP BY clause to match existing pathkeys, which could come from an index scan (suggested by the patch or not) or from some other way of getting ordered output.

> b) if there are no indexes, i.e. when an explicit Sort is needed, consider
> reordering the columns by ndistinct

Yes. But again, this description is a bit short. First, it runs after the first patch and might get some preordered leading pathkeys. Second, it tries to match the ORDER BY clause order if there are no preordered leading pathkeys from the first patch (this was introduced in v7). And third, if a tail of pathkeys is left unmatched by the previous stages, it will reorder that tail.

> In the worst case, one part will depend on the other, which is OK. It still
> allows us to commit the first part and continue working on the other one, for
> example.

That's exactly our case. Of course, it's possible to split the first patch in two again: just the index suggestion (1.1) and reordering by existing pathkeys (1.2). Then 1.1 would be independent, but 2 would still have to be applied after 1.2. And patch 1.1 on its own is rather useless.

> can decide which one is cheaper etc. The decision which paths to keep is done
> by add_path(), and it should stay like this, of course. I wasn't suggesting to
> move the logic elsewhere.

Cool, I have no intention of modifying it either.

>>> I wonder if we should make the add_path() logic smarter to recognize when two
>>> paths have different pathkeys but were generated to match the same grouping,
>>> to reduce the number of paths added by this optimization. Currently we do
>>> that for each pathkeys list independently, but we're considering many more
>>> pathkeys orderings ...
>>
>> Hm. I tend to say no.
>> select .. from t1 group by a, b
>> union
>> select .. from t2 group by a, b
>>
>> t1 and t2 could have different sets of indexes and different distributions, so
>> locally it could be cheaper to use one index (for example, one defined as
>> (b, a) and the second as (a,b,c,d) - the second is larger), but globally
>> another choice could win (for example, if the second table doesn't have a
>> (b,a) index).
>
> But add_path() treats each of the relations independently, so why couldn't we
> pick a different index for each of the two relations?

Having sorted output from both subselects could be cheaper than sorting one of them, even if the index scan was cheaper locally. But it seems to me that is not the business of the optimization suggested here.

>>> For example, it seems to disregard that different data types have different
>>> comparison costs. For example comparing bytea will be far more expensive
>>> compared to int4, so it may be much more efficient to compare int4 columns
>>> first, even if there are far fewer distinct values in them.
>> As far as I can see, cost_sort() doesn't pay attention to these details, and
>> improving that should be a separate patch.
>
> But the sort also does not reorder columns.

Yes. But the estimates of the comparison function cost and of the number of unique values in a column may not be very accurate, and so the planner could make a wrong choice. I have seen a 2x difference in a real-world application. Again, improving the sort cost estimation is a separate task.
> Imagine you have a custom data type that is expensive for comparisons. You know
> that, so you place it at the end of GROUP BY clauses, to reduce the number of
> comparisons on that field. And then we come along and just reorder the columns,
> placing it first, because it happens to have a high ndistinct statistic. And
> there's no way to get the original behavior :-(

Hm, that could happen, but I don't know how to improve things here. The current cost_sort() returns the same cost for any column order.

Okay, here we know an estimate of nrows; we could somehow find the comparison function's OID and look up its procost field. And then? We would also need to know the tuple width here, and would mostly repeat cost_sort().

Another option is introducing an enable_groupby_reorder GUC variable, although personally I don't like the idea of adding a new GUC.

>> Agree, but I don't know how to use it here. Except, maybe:
>> 1) first column - the column with the largest estimated number of groups;
>> 2) second column - the pair (first, second) with the largest estimated number
>> of groups;
>> 3) third column - the triple (first, second, third) with the largest ...
>>
>> But it seems that even that algorithm, ISTM, could be implemented in a cheaper
>> manner.
>
> Maybe. I do have some ideas, although I haven't tried implementing them.

Implemented, please have a look.

> If there's no extended statistic on the columns, you can do the current thing
> (assuming independence etc.). There's not much we can do here.

Agree.

> If there's an extended statistic, you can do either a greedy search (get the
> next column with the highest ndistinct coefficient) or an exhaustive search
> (computing the estimated number of comparisons).
>
> Another challenge is that using only the ndistinct coefficient assumes uniform
> distribution of the values. But you can have a column with 1M distinct values,
> where a single value represents 99% of the rows. And another column with 100k
> distinct values, with an actual uniform distribution. I'm pretty sure it'd be
> more efficient to place the 100k column first.

Interesting. I will think about it, thank you.

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment
On 06/06/2018 08:04 PM, Teodor Sigaev wrote:
>>> problem 2). The index optimization was noticed by me later. But based on
>>> your suggested patch order I have split the patch into an index part and a
>>> non-index part, and the second part depends on the first one. They touch
>>> the same part of the code, so they could not be made independent.
>> The way I see it the patch does two different things:
> Yes.
>
>> a) consider additional indexes by relaxing the pathkeys check
> Yes, but not only that. The patch can also reorder the GROUP BY clause to
> match existing pathkeys, which could come from an index scan (suggested by
> the patch or not) or from some other way of getting ordered output.
>
>> b) if there are no indexes, i.e. when an explicit Sort is needed,
>> consider reordering the columns by ndistinct
> Yes. But again, this description is a bit short. First, it runs after the
> first patch and might get some preordered leading pathkeys. Second, it tries
> to match the ORDER BY clause order if there are no preordered leading
> pathkeys from the first patch (this was introduced in v7). And third, if a
> tail of pathkeys is left unmatched by the previous stages, it will reorder
> that tail.

OK, I haven't looked at v7 yet, but if I understand correctly it tries to maintain the ordering as much as possible? Does that actually help? I mean, the incremental sort patch allows the sorting to happen by pieces, but here we still need to sort all the data, right? Can you give an example demonstrating the benefit?

>> In the worst case, one part will depend on the other, which is OK. It
>> still allows us to commit the first part and continue working on the
>> other one, for example.
> That's exactly our case. Of course, it's possible to split the first patch in
> two again: just the index suggestion (1.1) and reordering by existing
> pathkeys (1.2). Then 1.1 would be independent, but 2 would still have to be
> applied after 1.2. And patch 1.1 on its own is rather useless.
>
>> can decide which one is cheaper etc. The decision which paths to keep
>> is done by add_path(), and it should stay like this, of course. I
>> wasn't suggesting to move the logic elsewhere.
> Cool, I have no intention of modifying it either.
>
>>>> I wonder if we should make the add_path() logic smarter to recognize
>>>> when two paths have different pathkeys but were generated to match the
>>>> same grouping, to reduce the number of paths added by this optimization.
>>>> Currently we do that for each pathkeys list independently, but we're
>>>> considering many more pathkeys orderings ...
>>>
>>> Hm. I tend to say no.
>>> select .. from t1 group by a, b
>>> union
>>> select .. from t2 group by a, b
>>>
>>> t1 and t2 could have different sets of indexes and different
>>> distributions, so locally it could be cheaper to use one index (for
>>> example, one defined as (b, a) and the second as (a,b,c,d) - the second
>>> is larger), but globally another choice could win (for example, if the
>>> second table doesn't have a (b,a) index).
>>
>> But add_path() treats each of the relations independently, so why
>> couldn't we pick a different index for each of the two relations?
>
> Having sorted output from both subselects could be cheaper than sorting one
> of them, even if the index scan was cheaper locally. But it seems to me that
> is not the business of the optimization suggested here.

OK, I think I understand what you're trying to say. We may have a query like this:

    SELECT a, SUM(x) FROM (
        SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b
        UNION ALL
        SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b
    ) foo GROUP BY a;

and indexes on (a,b) and (b,a) for both relations.
The "deduplication" by pathkeys I suggested would mean we might keep only index (a,b) on t1 and (b,a) on t2, which means the grouping by "a" can't leverage index scans on both relations. But if we keep paths for both indexes on each relation, we can. That's a good point, yes. > >>>> For example, it seems to disregard that different data types have >>>> different comparison costs. For example comparing bytea will be far >>>> more expensive compared to int4, so it may be much more efficient to >>>> compare int4 columns first, even if there are far fewer distinct >>>> values in them. >>> as I can see cost_sort() doesn't pay attention to this details. And >>> it should be a separated patch to improve that. >>> >> >> But sort also does not reorder columns. > Yes. But estimation of cost of comparing function/number of unique > values in column could be not very accurate and so planner could make > a wrong choice. Isn't "estimation of cost of comparing function/number of unique values in column could be not very accurate and so planner could make a wrong choice" is more an argument against relying on it when doing these optimizations? FWIW it's one of the arguments Tom made in the incremental sort patch, which relies on it too when computing cost of the incremental sort. I'm sure it's going to be an obstacle there too. > I saw 2 times difference in real-world application. Again, improving > sort cost estimation is a separate task. Sure. But we also need to ask the other question, i.e. how many people would be negatively affected by the optimization. And I admit I don't know the answer to that, the next example is entirely made up. >> >> Imagine you have a custom data type that is expensive for comparisons. >> You know that, so you place it at the end of GROUP BY clauses, to >> reduce the number of comparisons on that field. And then we come along >> and just reorder the columns, placing it first, because it happens to >> have a high ndistinct statistic. And there's no way to get the >> original behavior :-( > Hm. that it could be, but I don't know how to improve here. Current > cost_sort() will return the same cost for any columns order. > > Okay, here we know an estimation of nrow, we could somehow find a > comparing function oid and find a its procost field. And then? we also > need to know width of tuple here and mostly repeat the cost_sort. > > Another option is an introducing enable_groupby_reorder GUC variable > although personally I don't like an idea to add new GUC variable. > I don't like the idea of yet another GUC either, as it's rather blunt instrument. Which is why I'm suggesting to split the patch into two parts. Then we can apply the index stuff (which seems relatively straightforward) and keep working on this second part. FWIW I think it would be useful to have "development GUC" that would allow us to enable/disable these options during development, because that makes experiments much easier. But then remove them before commit. >>> Agree, but I don't know how to use it here. Except, may be: >>> 1) first column - the column with bigger estimated number of groups >>> 2) second column - the pair of (first, second) with bigger estimated >>> number of groups >>> 3) third column - the triple of (first, second, third) with bigger ... >>> >>> But seems even with that algorithm, ISTM, it could be implemented in >>> cheaper manner. >>> >> >> Maybe. I do have some ideas, although I haven't tried implementing it. > Implemented, pls, have a look. 
>> If there's no extended statistic on the columns, you can do the
>> current thing (assuming independence etc.). There's not much we can do
>> here.
> Agree.
>
>> If there's an extended statistic, you can do either a greedy search
>> (get the next column with the highest ndistinct coefficient) or an
>> exhaustive search (computing the estimated number of comparisons).
>>
>> Another challenge is that using only the ndistinct coefficient assumes
>> uniform distribution of the values. But you can have a column with 1M
>> distinct values, where a single value represents 99% of the rows. And
>> another column with 100k distinct values, with an actual uniform
>> distribution. I'm pretty sure it'd be more efficient to place the 100k
>> column first.
>
> Interesting. I will think about it, thank you.

You're welcome.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >>>> For example, it seems to disregard that different data types have
> >>>> different comparison costs. For example comparing bytea will be far
> >>>> more expensive compared to int4, so it may be much more efficient to
> >>>> compare int4 columns first, even if there are far fewer distinct
> >>>> values in them.
> >>> As far as I can see, cost_sort() doesn't pay attention to these details,
> >>> and improving that should be a separate patch.
> >>
> >> But the sort also does not reorder columns.
> > Yes. But the estimates of the comparison function cost and of the number of
> > unique values in a column may not be very accurate, and so the planner
> > could make a wrong choice.

...

> >> Imagine you have a custom data type that is expensive for comparisons.
> >> You know that, so you place it at the end of GROUP BY clauses, to
> >> reduce the number of comparisons on that field. And then we come along
> >> and just reorder the columns, placing it first, because it happens to
> >> have a high ndistinct statistic. And there's no way to get the
> >> original behavior :-(
> > Hm, that could happen, but I don't know how to improve things here. The
> > current cost_sort() returns the same cost for any column order.
> >
> > Okay, here we know an estimate of nrows; we could somehow find the
> > comparison function's OID and look up its procost field. And then? We
> > would also need to know the tuple width here, and would mostly repeat
> > cost_sort().
> >
> > Another option is introducing an enable_groupby_reorder GUC variable,
> > although personally I don't like the idea of adding a new GUC.
>
> I don't like the idea of yet another GUC either, as it's a rather blunt
> instrument. Which is why I'm suggesting to split the patch into two
> parts. Then we can apply the index stuff (which seems relatively
> straightforward) and keep working on this second part.
>
> FWIW I think it would be useful to have "development GUCs" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.

Correct me if I'm wrong, but doesn't this patch concern itself precisely with sort performance? As such, estimating sort performance correctly in the various plan variants being considered seems to be a very central aspect of it.

This means it's pretty much time to consider the effect of operator cost *and* distinct values in the cost computation.

Assuming cost_sort does its thing, it's just a matter of building the desired variants and picking the one with the smallest cost_sort. One can use heuristics to build a subset of all possible column orderings, with a guiding principle that tries to maximize the likelihood of finding a better order than the one in the query (like sorting by ndistinct), but I'd suggest:

- Always considering the sort order verbatim as given in the SQL (ie: what the user requests)
- Relying on cost_sort to distinguish among them, and pick a winner, and
- When in doubt (if cost_sort doesn't produce a clear winner), use the order given by the user

Comparison cost can be approximated probabilistically as:

    cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))

where cost_op_n is the cost of the comparison function for column N, and ndistinct(col_1_to_n) is an estimate of the number of distinct values for the columns prior to N in the sort order.

You can approximate ndistinct as the product of the ndistinct of the previous columns, or use extended statistics when available.
I think that should give a good enough approximation of ndistinct-enriched sort costs that considers operator cost appropriately. That operator cost is basically an average and can be used as a constant, so it's just a matter of passing the right comparison_cost to cost_sort.

Even if the ndistinct estimates are off, the fact that operator cost is involved should be a good enough tool for the user in case the planner picks the wrong path - it's only a matter of boosting operator cost to let the planner know that sorting that way is expensive.

Prioritization of the user-provided order can be as simple as giving that comparison_cost a small handicap.
On 06/06/2018 11:22 PM, Claudio Freire wrote:
> On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>>> For example, it seems to disregard that different data types have different
>>>>>> comparison costs. For example comparing bytea will be far more expensive
>>>>>> compared to int4, so it may be much more efficient to compare int4 columns
>>>>>> first, even if there are far fewer distinct values in them.
>>>>> As far as I can see, cost_sort() doesn't pay attention to these details,
>>>>> and improving that should be a separate patch.
>>>>
>>>> But the sort also does not reorder columns.
>>> Yes. But the estimates of the comparison function cost and of the number of
>>> unique values in a column may not be very accurate, and so the planner could
>>> make a wrong choice.
>
> ...
>
>>>> Imagine you have a custom data type that is expensive for comparisons.
>>>> You know that, so you place it at the end of GROUP BY clauses, to reduce
>>>> the number of comparisons on that field. And then we come along and just
>>>> reorder the columns, placing it first, because it happens to have a high
>>>> ndistinct statistic. And there's no way to get the original behavior :-(
>>> Hm, that could happen, but I don't know how to improve things here. The
>>> current cost_sort() returns the same cost for any column order.
>>>
>>> Okay, here we know an estimate of nrows; we could somehow find the comparison
>>> function's OID and look up its procost field. And then? We would also need to
>>> know the tuple width here, and would mostly repeat cost_sort().
>>>
>>> Another option is introducing an enable_groupby_reorder GUC variable,
>>> although personally I don't like the idea of adding a new GUC.
>>
>> I don't like the idea of yet another GUC either, as it's a rather blunt
>> instrument. Which is why I'm suggesting to split the patch into two parts.
>> Then we can apply the index stuff (which seems relatively straightforward)
>> and keep working on this second part.
>>
>> FWIW I think it would be useful to have "development GUCs" that would allow
>> us to enable/disable these options during development, because that makes
>> experiments much easier. But then remove them before commit.
>
> Correct me if I'm wrong, but doesn't this patch concern itself precisely with
> sort performance?

Yes, the second part (reordering pathkeys by ndistinct) does.

> As such, estimating sort performance correctly in the various plan variants
> being considered seems to be a very central aspect of it.
>
> This means it's pretty much time to consider the effect of operator cost *and*
> distinct values in the cost computation.

Yes. Until now that was not really needed, because the optimizer does not consider reordering of the pathkeys - it simply grabs either the GROUP BY or ORDER BY pathkeys and runs with that.

So the costing was fairly trivial, we simply do something like

    comparison_cost = 2.0 * cpu_operator_cost;

    sort_cost = comparison_cost * tuples * LOG2(tuples);

which essentially ignores that there might be multiple columns, or that the columns may have sort operators with different costs.

The question is how reliable the heuristics can be. The current patch uses just plain ndistinct, which seems rather unreliable, but I don't have a clear idea how to improve that - we may have MCVs for the columns and perhaps some extended statistics, but I'm not sure how far we can run with that.

Essentially, we need to estimate the number of comparisons for each column, to compute a better comparison_cost.
> Assuming cost_sort does its thing, it's just a matter of building the desired
> variants and picking the one with the smallest cost_sort. One can use
> heuristics to build a subset of all possible column orderings, with a guiding
> principle that tries to maximize the likelihood of finding a better order than
> the one in the query (like sorting by ndistinct), but I'd suggest:
>
> - Always considering the sort order verbatim as given in the SQL (ie: what the
>   user requests)
> - Relying on cost_sort to distinguish among them, and pick a winner, and
> - When in doubt (if cost_sort doesn't produce a clear winner), use the order
>   given by the user

I don't see why not to generate all possible orderings (keeping only the cheapest path for each pathkey variant) and let the optimizer sort it out. If the user specified an explicit ORDER BY, we'll slap an extra Sort at the end, increasing the cost.

> Comparison cost can be approximated probabilistically as:
>
>     cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
>
> where cost_op_n is the cost of the comparison function for column N, and
> ndistinct(col_1_to_n) is an estimate of the number of distinct values for the
> columns prior to N in the sort order.
>
> You can approximate ndistinct as the product of the ndistinct of the previous
> columns, or use extended statistics when available.

Sure. The trouble is this also assumes uniform distribution, which is not always the case.

> I think that should give a good enough approximation of ndistinct-enriched
> sort costs that considers operator cost appropriately. That operator cost is
> basically an average and can be used as a constant, so it's just a matter of
> passing the right comparison_cost to cost_sort.
>
> Even if the ndistinct estimates are off, the fact that operator cost is
> involved should be a good enough tool for the user in case the planner picks
> the wrong path - it's only a matter of boosting operator cost to let the
> planner know that sorting that way is expensive.

There are far too many "shoulds" in these two paragraphs.

> Prioritization of the user-provided order can be as simple as giving that
> comparison_cost a small handicap.

I see no point in doing that, and I don't recall a single place in the planner where we do that. If the user specified ORDER BY, we'll slap an explicit Sort on top when needed (which acts as the handicap, but in a clear way). Otherwise we don't do such things - it'd be just plain confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with the same data types, ndistinct etc. but unexpectedly different costs). Also, what would be a good value for the handicap?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 6, 2018 at 6:57 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 06/06/2018 11:22 PM, Claudio Freire wrote:
> > As such, estimating sort performance correctly in the various plan variants
> > being considered seems to be a very central aspect of it.
> >
> > This means it's pretty much time to consider the effect of operator cost
> > *and* distinct values in the cost computation.
>
> Yes. Until now that was not really needed, because the optimizer does not
> consider reordering of the pathkeys - it simply grabs either the GROUP BY or
> ORDER BY pathkeys and runs with that.
>
> So the costing was fairly trivial, we simply do something like
>
>     comparison_cost = 2.0 * cpu_operator_cost;
>
>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>
> which essentially ignores that there might be multiple columns, or that the
> columns may have sort operators with different costs.
>
> The question is how reliable the heuristics can be. The current patch uses
> just plain ndistinct, which seems rather unreliable, but I don't have a clear
> idea how to improve that - we may have MCVs for the columns and perhaps some
> extended statistics, but I'm not sure how far we can run with that.
>
> Essentially, we need to estimate the number of comparisons for each column,
> to compute a better comparison_cost.

This could be approached by adjusting statistics by any relevant restrictions applicable to the columns being grouped, as is done for selectivity estimations.

I'm not sure how far that would get us, though. It would be rather easy to lose sight of those restrictions if there are complex operations involved.

> > Assuming cost_sort does its thing, it's just a matter of building the
> > desired variants and picking the one with the smallest cost_sort. One can
> > use heuristics to build a subset of all possible column orderings, with a
> > guiding principle that tries to maximize the likelihood of finding a better
> > order than the one in the query (like sorting by ndistinct), but I'd suggest:
> >
> > - Always considering the sort order verbatim as given in the SQL (ie: what
> >   the user requests)
> > - Relying on cost_sort to distinguish among them, and pick a winner, and
> > - When in doubt (if cost_sort doesn't produce a clear winner), use the order
> >   given by the user
>
> I don't see why not to generate all possible orderings (keeping only the
> cheapest path for each pathkey variant) and let the optimizer sort it out.

I'm assuming costing the full N! possible orderings would be prohibitively expensive.

> If the user specified an explicit ORDER BY, we'll slap an extra Sort at the
> end, increasing the cost.

I think you misunderstood me. I'm saying the exact opposite.

When I mention a handicap, I mean giving the explicitly requested GROUP BY order a *reduced* cost, to give it an advantage over the heuristics.

This is to try to attack the problem you mentioned, where users already accounting for operator costs in their queries would get overridden by the planner, perhaps to the detriment of overall execution time.

In essence:

- If the user requested that order, we assume it is "somewhat subjectively better" (by giving it a slightly reduced cost).
- If there is an explicit ORDER BY clause that differs from a considered path, the required sort node will already penalize it appropriately; nothing needs to be done in relation to sort costs.
- All other things being equal, cost_sort will make the decision.
If a plan beats the user-provided order in spite of the handicap, it will be used. So it's still optimizing the clear cases.

> > Comparison cost can be approximated probabilistically as:
> >
> >     cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >
> > where cost_op_n is the cost of the comparison function for column N, and
> > ndistinct(col_1_to_n) is an estimate of the number of distinct values for
> > the columns prior to N in the sort order.
> >
> > You can approximate ndistinct as the product of the ndistinct of the
> > previous columns, or use extended statistics when available.
>
> Sure. The trouble is this also assumes uniform distribution, which is not
> always the case.

Well, (1.0 / ndistinct) = p(left == right).

If you can compute a better p(left == right) with an MCV, you can get a better estimate. If you have an MCV. But that'd be a bit expensive, and I'm not sure it's all that relevant.

To what degree does improving this produce better plans? As long as the average expected cost is relatively well estimated (as in, one sort order relative to another sort order), it won't affect the decision.

But if needed, the MCV can be used for this.

So, in essence, you need to account for:

- restrictions on that column that constrain the domain
- distribution skew (MCV, nulls)
- ndistinct

to compute p(left == right). Maybe something as simple as the following?

    p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
    p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)

> > I think that should give a good enough approximation of ndistinct-enriched
> > sort costs that considers operator cost appropriately. That operator cost
> > is basically an average and can be used as a constant, so it's just a matter
> > of passing the right comparison_cost to cost_sort.
> >
> > Even if the ndistinct estimates are off, the fact that operator cost is
> > involved should be a good enough tool for the user in case the planner picks
> > the wrong path - it's only a matter of boosting operator cost to let the
> > planner know that sorting that way is expensive.
>
> There are far too many "shoulds" in these two paragraphs.

Aren't all planner decisions riddled with "shoulds"?

Anyway, my point is that, since the user can control operator cost, and operator cost is involved in the computation, and the ordering given by the user is explicitly tested for and given a slight advantage, it should (will?) be easy for the user to overcome any poor planner decisions by giving the planner the relevant information.

> > Prioritization of the user-provided order can be as simple as giving that
> > comparison_cost a small handicap.
>
> I see no point in doing that, and I don't recall a single place in the planner
> where we do that. If the user specified ORDER BY, we'll slap an explicit Sort
> on top when needed (which acts as the handicap, but in a clear way). Otherwise
> we don't do such things - it'd be just plain confusing (consider "ORDER BY a,b"
> vs. "ORDER BY b,c" with the same data types, ndistinct etc. but unexpectedly
> different costs). Also, what would be a good value for the handicap?

As I said, I'm not talking about an explicit ORDER BY, but about the case where the user somehow knows which GROUP BY order is the most efficient.

I'm proposing that the planner shouldn't override the user without clear evidence that the other plan is actually better - above-the-noise better. That's what I mean by a handicap.
On Wed, Jun 6, 2018 at 7:18 PM Claudio Freire <klaussfreire@gmail.com> wrote:
> > > Comparison cost can be approximated probabilistically as:
> > >
> > >     cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> > >
> > > where cost_op_n is the cost of the comparison function for column N, and
> > > ndistinct(col_1_to_n) is an estimate of the number of distinct values for
> > > the columns prior to N in the sort order.
> > >
> > > You can approximate ndistinct as the product of the ndistinct of the
> > > previous columns, or use extended statistics when available.
> >
> > Sure. The trouble is this also assumes uniform distribution, which is not
> > always the case.
>
> Well, (1.0 / ndistinct) = p(left == right).
>
> If you can compute a better p(left == right) with an MCV, you can get a better
> estimate. If you have an MCV. But that'd be a bit expensive, and I'm not sure
> it's all that relevant.
>
> To what degree does improving this produce better plans? As long as the
> average expected cost is relatively well estimated (as in, one sort order
> relative to another sort order), it won't affect the decision.
>
> But if needed, the MCV can be used for this.
>
> So, in essence, you need to account for:
>
> - restrictions on that column that constrain the domain
> - distribution skew (MCV, nulls)
> - ndistinct
>
> to compute p(left == right). Maybe something as simple as the following?
>
>     p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
>     p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)

Well, that came out with a slew of math errors, but the idea is clear: compute p(left == right) given the available statistics, constrained by any applicable restrictions.
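Per column, the raw ingredients are already in pg_stats. A rough sketch of the idea (ignoring restrictions, assuming n_distinct is stored as a positive count - negative values are fractions of the row count and would need scaling - and with a made-up table name):

    SELECT s.attname,
           s.null_frac * s.null_frac          -- both sides NULL
           + coalesce(m.p_mcv, 0)             -- both sides hit the same MCV
           + CASE WHEN s.n_distinct > 0       -- remainder assumed uniform
                  THEN (1 - s.null_frac - coalesce(m.f_mcv, 0)) ^ 2 / s.n_distinct
                  ELSE 0
             END AS p_eq
    FROM pg_stats s
         LEFT JOIN LATERAL (SELECT sum(f * f) AS p_mcv, sum(f) AS f_mcv
                            FROM unnest(s.most_common_freqs) AS f) m ON true
    WHERE s.tablename = 't';

Something along those lines, plus a restriction-aware ndistinct, is what I have in mind.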
On 06/07/2018 12:18 AM, Claudio Freire wrote:
> On Wed, Jun 6, 2018 at 6:57 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On 06/06/2018 11:22 PM, Claudio Freire wrote:
>>> As such, estimating sort performance correctly in the various plan variants
>>> being considered seems to be a very central aspect of it.
>>>
>>> This means it's pretty much time to consider the effect of operator cost
>>> *and* distinct values in the cost computation.
>>
>> Yes. Until now that was not really needed, because the optimizer does not
>> consider reordering of the pathkeys - it simply grabs either the GROUP BY or
>> ORDER BY pathkeys and runs with that.
>>
>> So the costing was fairly trivial, we simply do something like
>>
>>     comparison_cost = 2.0 * cpu_operator_cost;
>>
>>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>>
>> which essentially ignores that there might be multiple columns, or that the
>> columns may have sort operators with different costs.
>>
>> The question is how reliable the heuristics can be. The current patch uses
>> just plain ndistinct, which seems rather unreliable, but I don't have a clear
>> idea how to improve that - we may have MCVs for the columns and perhaps some
>> extended statistics, but I'm not sure how far we can run with that.
>>
>> Essentially, we need to estimate the number of comparisons for each column,
>> to compute a better comparison_cost.
>
> This could be approached by adjusting statistics by any relevant restrictions
> applicable to the columns being grouped, as is done for selectivity
> estimations.

I don't follow - how is this related to restrictions? Can you elaborate?

> I'm not sure how far that would get us, though. It would be rather easy to
> lose sight of those restrictions if there are complex operations involved.
>
>>> Assuming cost_sort does its thing, it's just a matter of building the
>>> desired variants and picking the one with the smallest cost_sort. One can
>>> use heuristics to build a subset of all possible column orderings, with a
>>> guiding principle that tries to maximize the likelihood of finding a better
>>> order than the one in the query (like sorting by ndistinct), but I'd suggest:
>>>
>>> - Always considering the sort order verbatim as given in the SQL (ie: what
>>>   the user requests)
>>> - Relying on cost_sort to distinguish among them, and pick a winner, and
>>> - When in doubt (if cost_sort doesn't produce a clear winner), use the order
>>>   given by the user
>>
>> I don't see why not to generate all possible orderings (keeping only the
>> cheapest path for each pathkey variant) and let the optimizer sort it out.
>
> I'm assuming costing the full N! possible orderings would be prohibitively
> expensive.

That's possible, yes - particularly for large values of N. Perhaps there's a simpler algorithm to compute a sufficient approximation, in a greedy way, starting from some likely-good combination of columns, or so. I haven't thought about this part very much ...

>> If the user specified an explicit ORDER BY, we'll slap an extra Sort at the
>> end, increasing the cost.
>
> I think you misunderstood me. I'm saying the exact opposite.
>
> When I mention a handicap, I mean giving the explicitly requested GROUP BY
> order a *reduced* cost, to give it an advantage over the heuristics.

I did understand you.
I'm saying that if the ordering does not match the one requested by the user (in ORDER BY), we end up adding an explicit Sort node on top of it, which increases the cost of all the other paths, acting as a disadvantage. But now I realize this only works for when there is an ORDER BY clause, not for GROUP BY (in which case we don't add any Sort node). > This is to try to attack the problem you mentioned where users already > accounting for operator costs in their queries would get overridden by > the planner, perhaps in detriment of overall execution time. > > In essence: > > - If the user requested that order, we assume it "somewhat > subjectively better" (by giving it a slightly reduced cost). > - If there is an explicit order by clause that differs from a > considered path, the required sort node will already penalize it > appropriately, nothing needs to be done in relation to sort costs. > - All other things being equal, cost_sort will make the decision. If a > plan beats the user-provided order in spite of the handicap, it will > be used. So it's still optimizing clear cases. > OK. I still don't believe this is a good approach, because we don't know how good our estimate of the costs is, so I have no idea how good large the handicap needs to be. >> >>> Comparison cost can be approximated probabilistically as: >>> >>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n))) >>> >>> Where cost_op_n is the cost of the comparison function for column N, >>> and ndistinct(col_1_to_n) is an estimation of the number of distinct >>> values for columns prior to N in the sort order. >>> >>> You can approximate ndistinct as the product of ndistinct of previous >>> columns, or use extended statistics when available. >>> >> >> Sure. The trouble is this also assumes uniform distribution, which is >> not always the case. > > Well, (1.0 / ndistinct) = p(left == right). > > If you can compute a better p(left == right) with an MCV, you can get > a better estimate. If you have an MCV. But that'd be a bit expensive, > and I'm not sure it's all that relevant. > > To what degree does improving this produce better plans? As long as > average expected cost is relatively well estimated (as in, one sort > order relative to another sort order), it won't affect the decision. > I'd bet I can construct corner-case examples with vastly different behavior depending on column data distributions, so it's not entirely insignificant. How important is it in practice I don't know. > But if needed, the MCV can be used for this. > > So, in essence, you need to account for: > > - restrictions on that column that constrain the domain > - distribution skew (MCV, nulls) > - ndistinct > > To compute p(left == right). > > Maybe something as simple as the following? > > p_special = sum(mcv_i * mcv_i) + null_frac * null_frac > p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr) > >>> I think that should give a good enough approximation of >>> ndistinct-enriched sort costs that considers operator cost >>> appropriately. That operator cost is basically an average and can be >>> used as a constant, so it's just a matter of passing the right >>> comparison_cost to cost_sort. >>> >>> Even if ndistinct estimates are off, the fact that operator cost is >>> involved should be a good enough tool for the user should the >>> planner pick the wrong path - it's only a matter of boosting operator >>> cost to let the planner know that sorting that way is expensive. >>> >> >> There are far to many "should" in these two paragraphs. 
> > Aren't all planner decisions riddled with "should"s? > > Anyway, my point is that, since the user can control operator cost, > and operator cost is involved in the computation, and the ordering > given by the user is explicitly tested for and given a slight > advantage, it should (will?) be easy for the user to overcome any poor > planner decisions by giving the planner the relevant information. > I really don't want to force users to tweak operator cost in order to tune queries. It's about an order of magnitude less convenient than a GUC, for various reasons. >>> Priorization of the user-provided order can be as simple as giving >>> that comparison_cost a small handicap. >> >> I see no point in doing that, and I don't recall a single place in the >> planner where we do that. If the user specified ORDER BY, we'll slap an >> explicit Sort on top when needed (which acts as the handicap, but in a >> clear way). Otherwise we don't do such things - it'd be just plain >> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data >> types, ndistinct etc. but unexpectedly different costs). Also, what >> would be a good value for the handicap? > > As I said, I'm not talking about explicit order by, but about the case > where the user knows which group by order is the most efficient > somehow. > > I'm proposing the planner shouldn't override the user without clear > evidence that the other plan is actually better, above the noise > better. That's what I mean by handicap. > OK, gotcha. But I'm really unsure how large the handicap should be. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
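As an illustration of the comparison-cost model discussed above (a sketch only, not the patch's code; the function and parameter names are made up), the expected cost of a single tuple comparison can be written as the sum of per-column operator costs weighted by the probability that the comparison ever reaches that column, with the prefix ndistinct approximated as a product of per-column ndistincts - which assumes independent columns and uniform distributions:

/*
 * Expected cost of comparing two random tuples, column by column.  Column i
 * is only examined when all previous columns compared equal, approximated
 * here as 1 / ndistinct(prefix).
 */
double
expected_comparison_cost(const double *ndistinct,   /* per-column ndistinct */
                         const double *cost_op,     /* per-column comparison cost */
                         int ncols)
{
    double      prefix_ndistinct = 1.0;     /* empty prefix: column 0 is always compared */
    double      cost = 0.0;
    int         i;

    for (i = 0; i < ncols; i++)
    {
        cost += cost_op[i] / prefix_ndistinct;
        prefix_ndistinct *= ndistinct[i];
    }

    return cost;
}

The total sort cost would then be roughly this per-comparison cost times tuples * LOG2(tuples), instead of the flat 2.0 * cpu_operator_cost per comparison used today.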
On Wed, Jun 6, 2018 at 8:06 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >>> Comparison cost can be approximated probabilistically as: > >>> > >>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n))) > >>> > >>> Where cost_op_n is the cost of the comparison function for column N, > >>> and ndistinct(col_1_to_n) is an estimation of the number of distinct > >>> values for columns prior to N in the sort order. > >>> > >>> You can approximate ndistinct as the product of ndistinct of previous > >>> columns, or use extended statistics when available. > >>> > >> > >> Sure. The trouble is this also assumes uniform distribution, which is > >> not always the case. > > > > Well, (1.0 / ndistinct) = p(left == right). > > > > If you can compute a better p(left == right) with an MCV, you can get > > a better estimate. If you have an MCV. But that'd be a bit expensive, > > and I'm not sure it's all that relevant. > > > > To what degree does improving this produce better plans? As long as > > average expected cost is relatively well estimated (as in, one sort > > order relative to another sort order), it won't affect the decision. > > > > I'd bet I can construct corner-case examples with vastly different > behavior depending on column data distributions, so it's not entirely > insignificant. How important is it in practice I don't know. I guess that can only be answered by building that solution and testing it against the corner cases.
> OK, I haven't looked at v7 yet, but if I understand correctly it tries > to maintain the ordering as much as possible? Does that actually help? I > mean, the incremental sort patch allows the sorting to happen by pieces, > but here we still need to sort all the data, right? > > Can you give an example demonstrating the benefit? > > SELECT a, SUM(x) FROM > ( > SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b > UNION ALL > SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b > ) foo GROUP BY a; > > and indexes on (a,b) and (b,a) for both relations. The "deduplication" > by pathkeys I suggested would mean we might keep only index (a,b) on t1 > and (b,a) on t2, which means the grouping by "a" can't leverage index > scans on both relations. But if we keep paths for both indexes on each > relation, we can. Yes, that's one of the options. > Isn't "estimation of cost of comparing function/number of unique values > in column could be not very accurate and so planner could make a wrong > choice" is more an argument against relying on it when doing these > optimizations? > > FWIW it's one of the arguments Tom made in the incremental sort patch, > which relies on it too when computing cost of the incremental sort. I'm > sure it's going to be an obstacle there too. > >> I saw 2 times difference in real-world application. Again, improving >> sort cost estimation is a separate task. > Sure. But we also need to ask the other question, i.e. how many people > would be negatively affected by the optimization. And I admit I don't > know the answer to that, the next example is entirely made up. Hm, it seems the best way here is improving the cost_sort estimation. Will try, but I think that is a separate patch. > FWIW I think it would be useful to have "development GUC" that would > allow us to enable/disable these options during development, because > that makes experiments much easier. But then remove them before commit. Will do. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> So the costing was fairly trivial, we simply do something like > > comparison_cost = 2.0 * cpu_operator_cost; > > sort_cost = comparison_cost * tuples * LOG2(tuples); > > which essentially ignores that there might be multiple columns, or that > the columns may have sort operator with different costs. Agree. And the distribution of keys. > > The question is how reliable the heuristics can be. The current patch > uses just plain ndistinct, but that seems rather unreliable but I don't > have a clear idea how to improve that - we may have MCV for the columns > and perhaps some extended statistics, but I'm not sure how far we can > run with that. v8 already uses another algorithm. > > Essentially what we need to estimate the number of comparisons for each > column, to compute better comparison_cost. Exactly. >> Priorization of the user-provided order can be as simple as giving >> that comparison_cost a small handicap. > > I see no point in doing that, and I don't recall a single place in the > planner where we do that. If the user specified ORDER BY, we'll slap an > explicit Sort on top when needed (which acts as the handicap, but in a > clear way). Otherwise we don't do such things - it'd be just plain > confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data > types, ndistinct etc. but unexpectedly different costs). Also, what > would be a good value for the handicap? Again, agree. If we have a fixed order of columns (ORDER BY) then we should not try to reorder it. The current patch follows that, unless I've made a mistake. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
>> I don't see why not to generate all possible orderings (keeping only the >> cheapest path for each pathkey variant) and let the optimizer to sort it >> out. > > I'm assuming costing the full N! possible orderings would be > prohibitively expensive. That's true, but as a first step we need to improve cost_sort, and only then try to find the best pathkeys order in an optimal way. > - If the user requested that order, we assume it "somewhat > subjectively better" (by giving it a slightly reduced cost). I don't think so. That's not the SQL way - the DB should determine the optimal way to execute a query. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 06/07/2018 03:41 PM, Teodor Sigaev wrote: >> >> ... snip ... >> >>> Priorization of the user-provided order can be as simple as giving >>> that comparison_cost a small handicap. >> >> I see no point in doing that, and I don't recall a single place in the >> planner where we do that. If the user specified ORDER BY, we'll slap an >> explicit Sort on top when needed (which acts as the handicap, but in a >> clear way). Otherwise we don't do such things - it'd be just plain >> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data >> types, ndistinct etc. but unexpectedly different costs). Also, what >> would be a good value for the handicap? > > Again agree. If we have fixed order of columns (ORDER BY) then we should > not try to reorder it. Current patch follows that if I didn't a mistake. > This part seems to be more a misunderstanding between me and Claudio. I believe Claudio was referring to the column order in a GROUP BY, not ORDER BY. In which case we don't add any Sort, of course. I'm still opposed to adding arbitrary handicap to prioritize the order specified by user, for the reasons I explained before. We should aim to make the heuristics/costing reliable enough to make this unnecessary. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> Again agree. If we have fixed order of columns (ORDER BY) then we should not >> try to reorder it. Current patch follows that if I didn't a mistake. >> > > This part seems to be more a misunderstanding between me and Claudio. I believe > Claudio was referring to the column order in a GROUP BY, not ORDER BY. In which > case we don't add any Sort, of course. I hope so > > I'm still opposed to adding arbitrary handicap to prioritize the order specified > by user, for the reasons I explained before. We should aim to make the > heuristics/costing reliable enough to make this unnecessary. +1 -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
>> Yes. But again, this description is a bit short. First it works after >> first patch and might get some preordered leading pathkeys. Second, it >> tries to match ORDER BY clause order if there is no preordered leading >> pathkeys from first patch (it was introduced in v7). And third, if there >> is a tail of unmatched pathkeys on previous stages then it will reorder >> that tail. >> > > OK, I haven't looked at v7 yet, but if I understand correctly it tries > to maintain the ordering as much as possible? Does that actually help? I > mean, the incremental sort patch allows the sorting to happen by pieces, > but here we still need to sort all the data, right? > > Can you give an example demonstrating the benefit? See tst.sql. Queries are marked with opt (optimization is on) and noopt. Query 1: select count(*) from btg group by v, r; Query 2: select count(*) from btg group by n, v, r order by n; For both queries it's possible to reorder the v and r columns; column n has a single distinct value. On my laptop: Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms 2opt vs 2noopt: 5800.307 ms vs 7486.967 ms So, what we see: 1) for query 1 the optimization gives 2 times better performance, for query 2 only 30%. If column 'n' were unique then the times for queries 1 and 2 should be the same. We could add a check for preordered pathkeys in get_cheapest_group_keys_order(), and if estimate_num_groups(reordered pathkeys) is close to 1 then we could skip reordering the tail of the pathkeys. 2) The planning cost is the same for all queries. So cost_sort() doesn't even take the number of columns into account. > FWIW I think it would be useful to have "development GUC" that would > allow us to enable/disable these options during development, because > that makes experiments much easier. But then remove them before commit. Added in v9: debug_enable_group_by_match_order_by and debug_enable_cheapest_group_by. I also checked compatibility with the incremental sort patch, and everything works except a small merge conflict which could be resolved right before committing. Next, I had a look at cost_incremental_sort() provided by the incremental sort patch and, it's a pity, it doesn't solve our problem with the impact of the cost of the per-column comparison function and the number of its calls. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On 06/07/2018 06:22 PM, Teodor Sigaev wrote: >>> Yes. But again, this description is a bit short. First it works after >>> first patch and might get some preordered leading pathkeys. Second, it >>> tries to match ORDER BY clause order if there is no preordered leading >>> pathkeys from first patch (it was introduced in v7). And third, if there >>> is a tail of unmatched pathkeys on previous stages then it will reorder >>> that tail. >>> >> >> OK, I haven't looked at v7 yet, but if I understand correctly it tries >> to maintain the ordering as much as possible? Does that actually help? I >> mean, the incremental sort patch allows the sorting to happen by pieces, >> but here we still need to sort all the data, right? >> >> Can you give an example demonstrating the benefit? > > See tst.sql. queries are marked with opt (optimization is on) and noopt. > > Query 1: select count(*) from btg group by v, r; > Query 2: select count(*) from btg group by n, v, r order by n; > > For both queries it's possible to reorder v and r column, n column has > the single distinct value. > > On my laptop: > Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms > 2opt vs 2noopt: 5800.307 ms vs 7486.967 ms > > So, what we see: > 1) for query 1 optimization gives 2 times better performance, for query > 2 only 30%. if column 'n' will be unique then time for query 1 and 2 > should be the same. We could add check for preordered pathkeys in > get_cheapest_group_keys_order() and if estimate_num_groups(reordered > pathkeys) is close to 1 then we could do not reordering of tail of > pathkeys. > OK, those are nice improvements, although the examples are somewhat extreme (only 1 or 2 distinct values). So yes, in some cases this reordering clearly makes a big difference, but I still think we need to improve the heuristics to minimize regressions. I see v9 is now calling estimate_num_groups(), so it already benefits from extended statistics. Nice! I think the algorithm is more or less the greedy option I proposed in one of the earlier messages - I don't know if it's sufficient or not, but the only other idea I have is essentially an exhaustive search through all possible permutations. So that's a nice improvement, although I think we should also consider non-uniform distributions (using the per-column MCV lists). > 2) Planing cost is the same for all queries. So, cost_sort() doesn't > take into account even number of columns. > Yeah. As I wrote before, I think we'll need to come up with a model for comparison_cost depending on the column order, which should make costs for those paths different. >> FWIW I think it would be useful to have "development GUC" that would >> allow us to enable/disable these options during development, because >> that makes experiments much easier. But then remove them before commit. > Added, v9, debug_enable_group_by_match_order_by and > debug_enable_cheapest_group_by. I also checked compatibility with > incremental sort patch, and all works except small merge conflict which > could be resolved right before committing. > OK. I think we should also have a debug GUC for the first patch (i.e. one that would allow reordering group_pathkeys to match the input path). Regarding the two GUCs introduced in v9, I'm not sure I 100% understand what they are doing. Let me try explaining what I think they do: 1) debug_cheapest_group_by - Reorder GROUP BY clauses to using ndistinct stats for columns, placing columns with more distinct values first (so that we don't need to compare the following columns as often). 
2) debug_group_by_match_order_by - Try to reorder the GROUP BY clauses to match the ORDER BY, so that the group aggregate produces results in the desired output (and we don't need an explicit Sort). FWIW I doubt about the debug_group_by_match_order_by optimization. The trouble is that it's only based on comparing the pathkeys lists, and entirely ignores that the two result sets may operate on very different numbers of rows. Consider for example "SELECT * FROM t GROUP BY a,b,c,d ORDER BY c,d" where table "t" has 1B rows, but there are only ~10k rows in the result (which is fairly common in fact tables in star-schema DBs). IIUC the optimization will ensure "c,d" is placed first in the GROUP BY, and then we reorder "a,b" using ndistinct. But this may be much less efficient than simply reordering (a,b,c,d) per ndistinct and then adding explicit Sort on top of that (because on 10k rows that's going to be cheap). So I think this somewhat contradicts the order-by-ndistinct optimization and may easily undo it's benefits. The other "issue" I have with get_cheapest_group_keys_order() is how it interacts with group_keys_reorder_by_pathkeys(). I mean, we first call n_preordered = group_keys_reorder_by_pathkeys(path->pathkeys, ...); so the reordering tries to maintain ordering of the input path. Then if (n_preordered == 0) we do this: group_keys_reorder_by_pathkeys(root->sort_pathkeys, ...); Doesn't that mean that if we happen to have input path with partially matching ordering (so still requiring explicit Sort before grouping) we may end up ignoring the ORDER BY entirely? Instead of using Sort that would match the ORDER BY? I might have misunderstood this, though. I'm not sure why the first group_keys_reorder_by_pathkeys() call does not simply return 0 when the input path ordering is not sufficient for the grouping? Is there some reasoning behind that (e.g. that sorting partially sorted is faster)? Overall I think this patch introduces four different optimizations that are indeed very interesting: 1) consider additional sorted paths for grouping 2) reorder GROUP BY clauses per ndistinct to minimize comparisons 3) when adding Sort for grouping, maintain partial input ordering 4) when adding Sort for grouping, try producing the right output order (if the ORDER BY was specified) But at this point it's kinda mashed together in ad-hoc manner, using very simple heuristics / assumptions. We'll need to figure out how to combine those optimizations. BTW I get compiler warnings that n_preordered_groups may be used uninitialized in add_paths_to_grouping_rel, because it's only set in an if/else branch, and then used further down. > Next, I had a look on cost_incremental_sort() provided by > incremental sort patch and, it's a pity, it doesn't solve our problem > with the impact of the cost of per-column comparison function and > number of its calls. I currently doesn't, but that might be more an issue in the incremental sort patch and we may need to fix that. Clearly the costing in general in that patch needs more work, and I do recall Tom pointing out that the current heuristics (estimating number of sort groups using ndistincts) seems rather weak. It may not be exactly the same problem, but it seems kinda similar to what this patch does. I have a hunch that those patches will end up inventing something fairly similar. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/09/2018 08:09 PM, Tomas Vondra wrote: > > /snip/ > > 4) when adding Sort for grouping, try producing the right output order > (if the ORDER BY was specified) > BTW I've just realized we already do something similar in master. If you run a query like this: SELECT a, b, count(*) FROM t GROUP BY b, a ORDER BY a; we will actually plan it like this: QUERY PLAN --------------------------- GroupAggregate Group Key: a, b -> Sort Sort Key: a, b -> Seq Scan on t (5 rows) I.e. we already do reorder the group clauses to match ORDER BY, to only require a single sort. This happens in preprocess_groupclause(), which also explains the reasoning behind that. I wonder if some of the new code reordering group pathkeys could/should be moved here (not sure, maybe it's too early for those decisions). In any case, it might be appropriate to update some of the comments before preprocess_groupclause() which claim we don't do certain things added by the proposed patches. This probably also somewhat refutes my claim that the order of grouping keys is currently fully determined by users (and so they may pick the most efficient order), while the reorder-by-ndistinct patch would make that impossible. Apparently when there's ORDER BY, we already mess with the order of group clauses - there are ways to get around it (subquery with OFFSET 0) but it's much less clear. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> I see v9 is now calling estimate_num_groups(), so it already benefits > from extended statistics. Nice! I think the algorithm is more or less > the greedy option I proposed in one of the earlier messages - I don't > know if it's sufficient or not, but the only other idea I have is > essentially an exhaustive search through all possible permutations. Trying all possible permutations could be slow with an extremely large GROUP BY clause, so we would need some stop limit - like join_collapse_limit. I don't like this idea. > So that's a nice improvement, although I think we should also consider > non-uniform distributions (using the per-column MCV lists). Could you clarify how to do that? > >> 2) Planing cost is the same for all queries. So, cost_sort() doesn't >> take into account even number of columns. >> > > Yeah. As I wrote before, I think we'll need to come up with a model for > comparison_cost depending on the column order, which should make costs > for those paths different. Yeah. But I still think it should be a separate patch, which would improve cost estimation and plan choice in other parts of the optimizer too. > OK. I think we should also have a debug GUC for the first patch (i.e. > one that would allow reordering group_pathkeys to match the input path). Added to 0002-opt_group_by_index_and_order-v10.patch: debug_group_by_reorder_by_pathkeys. > Regarding the two GUCs introduced in v9, I'm not sure I 100% understand > what they are doing. Let me try explaining what I think they do: > > 1) debug_cheapest_group_by - Reorder GROUP BY clauses to using ndistinct > stats for columns, placing columns with more distinct values first (so > that we don't need to compare the following columns as often). Yes. > > 2) debug_group_by_match_order_by - Try to reorder the GROUP BY clauses > to match the ORDER BY, so that the group aggregate produces results in > the desired output (and we don't need an explicit Sort). Yes. > FWIW I doubt about the debug_group_by_match_order_by optimization. The > trouble is that it's only based on comparing the pathkeys lists, and > entirely ignores that the two result sets may operate on very different > numbers of rows. > > Consider for example "SELECT * FROM t GROUP BY a,b,c,d ORDER BY c,d" > where table "t" has 1B rows, but there are only ~10k rows in the result > (which is fairly common in fact tables in star-schema DBs). IIUC the > optimization will ensure "c,d" is placed first in the GROUP BY, and then > we reorder "a,b" using ndistinct. But this may be much less efficient > than simply reordering (a,b,c,d) per ndistinct and then adding explicit > Sort on top of that (because on 10k rows that's going to be cheap). > As you pointed out in your next email, this optimization is already implemented in preprocess_groupclause(). And resolving this issue correctly can only be done after implementing a correct cost_sort() - one which pays attention to comparison cost, the number of distinct values in the columns, and how often the second and following columns actually need to be compared. > > The other "issue" I have with get_cheapest_group_keys_order() is how it > interacts with group_keys_reorder_by_pathkeys(). I mean, we first call > > n_preordered = group_keys_reorder_by_pathkeys(path->pathkeys, ...); > > so the reordering tries to maintain ordering of the input path.
Then if > (n_preordered == 0) we do this: > > group_keys_reorder_by_pathkeys(root->sort_pathkeys, ...); > > Doesn't that mean that if we happen to have input path with partially > matching ordering (so still requiring explicit Sort before grouping) we > may end up ignoring the ORDER BY entirely? Instead of using Sort that > would match the ORDER BY? I might have misunderstood this, though. Hm. I believe ordering the input of the GROUP BY clause is always more expensive - simply because the output of the GROUP BY clause should have fewer rows than its input, which makes it cheaper to order. So here it uses ORDER BY only if we don't have group pathkey(s) matched by the path pathkeys. > > I'm not sure why the first group_keys_reorder_by_pathkeys() call does > not simply return 0 when the input path ordering is not sufficient for > the grouping? Is there some reasoning behind that (e.g. that sorting > partially sorted is faster)? Hm, what do you mean by sufficient? group_keys_reorder_by_pathkeys() always tries to order the GROUP BY columns by preexisting pathkeys. But of course, incremental sort will add more benefits here. I'd like to hope that the incremental sort patch is close to commit. > > Overall I think this patch introduces four different optimizations that > are indeed very interesting: > > 1) consider additional sorted paths for grouping Agree. > > 2) reorder GROUP BY clauses per ndistinct to minimize comparisons Agree (with the help of estimate_num_groups(); the patch tries to maximize the number of groups at each step). > 3) when adding Sort for grouping, maintain partial input ordering Agree - and the incremental sort patch will be helpful here. > 4) when adding Sort for grouping, try producing the right output order > (if the ORDER BY was specified) Yes. > > But at this point it's kinda mashed together in ad-hoc manner, using > very simple heuristics / assumptions. We'll need to figure out how to > combine those optimizations. The heuristics used here are: 1) if there is some existing order - let's use it; 2) if there is a required order - let's match that order to prevent an extra sort node. The incremental sort patch will improve cases where there is a partial match of the order. > > BTW I get compiler warnings that n_preordered_groups may be used > uninitialized in add_paths_to_grouping_rel, because it's only set in an > if/else branch, and then used further down. Fixed, but I believe the compiler is not smart enough here. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
> I.e. we already do reorder the group clauses to match ORDER BY, to only > require a single sort. This happens in preprocess_groupclause(), which > also explains the reasoning behind that. Huh. I missed that. That means the group_keys_reorder_by_pathkeys() call inside get_cheapest_group_keys_order() duplicates the work of preprocess_groupclause(). So it's possible to replace it with a simple count of the identical pathkeys at the beginning of the two lists. Or remove most of preprocess_groupclause() - but it seems to me we should use the first option: preprocess_groupclause() is simpler, since it operates only on SortGroupClause, not on pathkeys. BTW, the incremental sort patch provides pathkeys_common(), which is exactly what we need. > I wonder if some of the new code reordering group pathkeys could/should > be moved here (not sure, maybe it's too early for those decisions). In > any case, it might be appropriate to update some of the comments before > preprocess_groupclause() which claim we don't do certain things added by > the proposed patches. preprocess_groupclause() is too early a step to use something like group_keys_reorder_by_pathkeys(), because the pathkeys are not known there yet. > This probably also somewhat refutes my claim that the order of grouping > keys is currently fully determined by users (and so they may pick the > most efficient order), while the reorder-by-ndistinct patch would make > that impossible. Apparently when there's ORDER BY, we already mess with > the order of group clauses - there are ways to get around it (subquery > with OFFSET 0) but it's much less clear. I like the moment when objections go away :) -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
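For illustration, the common-prefix counting mentioned in the reply above amounts to something like this (a trivial sketch over plain integer arrays rather than PathKey lists; the names are made up):

/*
 * Count how many leading entries two key lists have in common - essentially
 * what pathkeys_common() provides for pathkey lists.
 */
int
common_prefix_length(const int *keys_a, int len_a, const int *keys_b, int len_b)
{
    int         n = 0;

    while (n < len_a && n < len_b && keys_a[n] == keys_b[n])
        n++;

    return n;
}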
On 06/13/2018 06:56 PM, Teodor Sigaev wrote: >> I.e. we already do reorder the group clauses to match ORDER BY, to only >> require a single sort. This happens in preprocess_groupclause(), which >> also explains the reasoning behind that. > Huh. I missed that. That means group_keys_reorder_by_pathkeys() call > inside get_cheapest_group_keys_order() duplicates work of > preprocess_groupclause(). > And so it's possible to replace it to simple counting number of the > same pathkeys in beginning of lists. Or remove most part of > preprocess_groupclause() - but it seems to me we should use first > option, preprocess_groupclause() is simpler, it doesn't operate with > pathkeys only with SortGroupClause which is simpler. > > BTW, incremental sort path provides pathkeys_common(), exactly what we > need. > >> I wonder if some of the new code reordering group pathkeys could/should >> be moved here (not sure, maybe it's too early for those decisions). In >> any case, it might be appropriate to update some of the comments before >> preprocess_groupclause() which claim we don't do certain things added by >> the proposed patches. > > preprocess_groupclause() is too early step to use something like > group_keys_reorder_by_pathkeys() because pathkeys is not known here yet. > > >> This probably also somewhat refutes my claim that the order of >> grouping keys is currently fully determined by users (and so they >> may pick the most efficient order), while the reorder-by-ndistinct >> patch would make that impossible. Apparently when there's ORDER BY, >> we already mess with the order of group clauses - there are ways to >> get around it (subquery with OFFSET 0) but it's much less clear. > > I like a moment when objections go away :) > I'm afraid that's a misunderstanding - my objections did not really go away. I'm just saying my claim that we're not messing with order of grouping keys is not entirely accurate, because we do that in one particular case. I still think we need to be careful when introducing new optimizations in this area - reordering the grouping keys by ndistinct, ORDER BY or whatever. In particular I don't think we should commit these patches that may quite easily cause regressions, and then hope some hypothetical future patch fixes the costing. Those objections still stand. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> I'm afraid that's a misunderstanding - my objections did not really go > away. I'm just saying my claim that we're not messing with order of > grouping keys is not entirely accurate, because we do that in one > particular case. > > I still think we need to be careful when introducing new optimizations > in this area - reordering the grouping keys by ndistinct, ORDER BY or > whatever. In particular I don't think we should commit these patches > that may quite easily cause regressions, and then hope some hypothetical > future patch fixes the costing. Those objections still stand. Pls, have a look at https://commitfest.postgresql.org/18/1706/ I tried to attack the cost_sort() issues and hope on that basis we can solve problems with 0002 patch and improve incremental sort patch. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 06/28/2018 06:52 PM, Teodor Sigaev wrote: >> I'm afraid that's a misunderstanding - my objections did not really go >> away. I'm just saying my claim that we're not messing with order of >> grouping keys is not entirely accurate, because we do that in one >> particular case. >> >> I still think we need to be careful when introducing new optimizations >> in this area - reordering the grouping keys by ndistinct, ORDER BY or >> whatever. In particular I don't think we should commit these patches >> that may quite easily cause regressions, and then hope some hypothetical >> future patch fixes the costing. Those objections still stand. > > Pls, have a look at https://commitfest.postgresql.org/18/1706/ > I tried to attack the cost_sort() issues and hope on that basis we can > solve problems with 0002 patch and improve incremental sort patch. > OK, will do. Thanks for working on this! regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> I tried to attack the cost_sort() issues and hope on that basis we can solve >> problems with 0002 patch and improve incremental sort patch. >> > > OK, will do. Thanks for working on this! I hope we now have a better cost_sort(). The obvious way is to try all combinations of pathkeys in get_cheapest_group_keys_order() and choose the cheapest one by cost_sort(). But that requires N! operations, could potentially be very expensive with a large number of pathkeys, and doesn't solve the issue with user-knows-what-he-does pathkeys. We could suggest an order of pathkeys as the patch does now, and if the cost estimated by cost_sort() is less than 80% (arbitrarily chosen) of the cost of the user-suggested pathkeys then we use ours, otherwise the user's pathkeys. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 06/29/2018 04:51 PM, Teodor Sigaev wrote: > >>> I tried to attack the cost_sort() issues and hope on that basis we >>> can solve problems with 0002 patch and improve incremental sort patch. >>> >> >> OK, will do. Thanks for working on this! > > I hope, now we have a better cost_sort(). The obvious way is a try all > combination of pathkeys in get_cheapest_group_keys_order() and choose > cheapest one by cost_sort(). > But it requires N! operations and potentially could be very > expensive in case of large number of pathkeys and doesn't solve the > issue with user-knows-what-he-does pathkeys. Not sure. There are N! combinations, but this seems like a good candidate for backtracking [1]. You don't have to enumerate and evaluate all N! combinations, just construct one and then abandon whole classes of combinations as soon as they get more expensive than the currently best one. That's thanks to additive nature of the comparison costing, because appending a column to the sort key can only make it more expensive. My guess is this will make this a non-issue. [1] https://en.wikipedia.org/wiki/Backtracking > > We could suggest an order of pathkeys as patch suggests now and if > cost_sort() estimates cost is less than 80% (arbitrary chosen) cost > of user-suggested pathkeys then it use our else user pathkeys. > I really despise such arbitrary thresholds. I'd much rather use a more reliable heuristics by default, even if it gets it wrong in some cases (which it will, but that's natural). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
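A minimal sketch of that backtracking idea (illustrative only, not from the patch; it reuses the simple cost_op / prefix-ndistinct comparison model from earlier in the thread, and all names are made up). A caller would seed best_cost with the cost of the user-specified ordering, so whole families of orderings are pruned as soon as their partial cost exceeds it:

#include <stdbool.h>
#include <string.h>

/*
 * Recursively enumerate orderings of the GROUP BY columns, keeping a running
 * per-comparison cost, and abandon a partial ordering as soon as it is
 * already at least as expensive as the best complete ordering found so far.
 * Pruning is safe because appending a column can only add to the cost.
 * Trying high-ndistinct columns first tends to find a good bound early.
 */
void
search_orderings(const double *ndistinct, const double *cost_op, int ncols,
                 int *order, bool *used, int depth,
                 double prefix_ndistinct, double partial_cost,
                 double *best_cost, int *best_order)
{
    int         i;

    if (partial_cost >= *best_cost)
        return;                 /* prune: completions can only cost more */

    if (depth == ncols)
    {
        *best_cost = partial_cost;
        memcpy(best_order, order, ncols * sizeof(int));
        return;
    }

    for (i = 0; i < ncols; i++)
    {
        if (used[i])
            continue;
        used[i] = true;
        order[depth] = i;
        search_orderings(ndistinct, cost_op, ncols, order, used, depth + 1,
                         prefix_ndistinct * ndistinct[i],
                         partial_cost + cost_op[i] / prefix_ndistinct,
                         best_cost, best_order);
        used[i] = false;
    }
}

The initial call would pass depth = 0, prefix_ndistinct = 1.0 and partial_cost = 0.0, with all used[] entries false.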
On 30/06/18 03:03, Tomas Vondra wrote: > On 06/29/2018 04:51 PM, Teodor Sigaev wrote: >> >>>> I tried to attack the cost_sort() issues and hope on that basis we >>>> can solve problems with 0002 patch and improve incremental sort patch. >>>> >>> >>> OK, will do. Thanks for working on this! >> >> I hope, now we have a better cost_sort(). The obvious way is a try >> all combination of pathkeys in get_cheapest_group_keys_order() and >> choose cheapest one by cost_sort(). > >> But it requires N! operations and potentially could be very >> expensive in case of large number of pathkeys and doesn't solve the >> issue with user-knows-what-he-does pathkeys. > > Not sure. There are N! combinations, but this seems like a good > candidate for backtracking [1]. You don't have to enumerate and > evaluate all N! combinations, just construct one and then abandon > whole classes of combinations as soon as they get more expensive than > the currently best one. That's thanks to additive nature of the > comparison costing, because appending a column to the sort key can > only make it more expensive. My guess is this will make this a non-issue. > > [1] https://en.wikipedia.org/wiki/Backtracking > >> >> We could suggest an order of pathkeys as patch suggests now and if >> cost_sort() estimates cost is less than 80% (arbitrary chosen) cost >> of user-suggested pathkeys then it use our else user pathkeys. >> > > I really despise such arbitrary thresholds. I'd much rather use a more > reliable heuristics by default, even if it gets it wrong in some cases > (which it will, but that's natural). > > regards > Additionally put an upper limit threshold on the number of combinations to check, fairly large by default? If first threshold is exceeded, could consider checking out a few more selected at random from paths not yet checked, to avoid any bias caused by stopping a systematic search. This might prove important when N! is fairly large. Cheers, Gavin
On Sat, Jun 30, 2018 at 09:19:13AM +1200, Gavin Flower wrote: > Additionally put an upper limit threshold on the number of combinations to > check, fairly large by default? > > If first threshold is exceeded, could consider checking out a few more > selected at random from paths not yet checked, to avoid any bias caused by > stopping a systematic search. This might prove important when N! is fairly > large. Please note that the latest patch available does not apply, so this has been moved to next CF 2018-11, waiting for input from its author. -- Michael
Attachment
> On Tue, Oct 2, 2018 at 4:16 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jun 30, 2018 at 09:19:13AM +1200, Gavin Flower wrote: > > Additionally put an upper limit threshold on the number of combinations to > > check, fairly large by default? > > > > If first threshold is exceeded, could consider checking out a few more > > selected at random from paths not yet checked, to avoid any bias caused by > > stopping a systematic search. This might prove important when N! is fairly > > large. > > Please note that the latest patch available does not apply, so this has > been moved to next CF 2018-11, waiting for input from its author. Unfortunately, patch still has some conflicts, could you please rebase it?
On 2018-11-29 17:56:53 +0100, Dmitry Dolgov wrote: > > On Tue, Oct 2, 2018 at 4:16 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Sat, Jun 30, 2018 at 09:19:13AM +1200, Gavin Flower wrote: > > > Additionally put an upper limit threshold on the number of combinations to > > > check, fairly large by default? > > > > > > If first threshold is exceeded, could consider checking out a few more > > > selected at random from paths not yet checked, to avoid any bias caused by > > > stopping a systematic search. This might prove important when N! is fairly > > > large. > > > > Please note that the latest patch available does not apply, so this has > > been moved to next CF 2018-11, waiting for input from its author. > > Unfortunately, patch still has some conflicts, could you please rebase it? As nothing has happened since, I'm marking this as returned with feedback. Greetings, Andres Freund
> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund <andres@anarazel.de> wrote: > > As nothing has happened since, I'm marking this as returned with > feedback. This patch was on my radar for some time in the past and we've seen use cases where it could be pretty useful (probably even without the incremental sort patch). I would like to make some progress here and see if it's possible to continue its development. I've attached the rebased version with some small changes, e.g. I've created a separate patch with the group by reordering tests to make it easy to see what changes were introduced, and after some experiments removed the part that seems to duplicate "group by" reordering to follow "order by". It also looks like it's possible to make these patches independent by having a base patch with the isolated group_keys_reorder_by_pathkeys (they're connected via n_preordered), but I haven't done this yet. I went through the thread to summarize the objections that were mentioned so far. Most of them are related to the third patch in the series, where reordering based on "ndistinct" is implemented, and are about cost_sort (all the possible problems that could happen without proper cost estimation due to non-uniform distribution, different comparison costs and so on) and figuring out how to limit the number of possible combinations of pathkeys to compare. I haven't looked at the proposed backtracking approach, but taking into account that the suggested patch for cost_sort [1] is RWF, I wonder what would be the best strategy to proceed? [1]: https://commitfest.postgresql.org/21/1706/
Attachment
On Thu, Apr 04, 2019 at 05:11:09PM +0200, Dmitry Dolgov wrote: >> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund <andres@anarazel.de> wrote: >> >> As nothing has happened since, I'm marking this as returned with >> feedback. > >This patch was on my radar for some time in the past and we've seen use cases >where it could be pretty useful (probably even without the incremental sort >patch). I would like to make some progress here and see if it's possible to >continue it's development. I've attached the rebased version with a small >changes, e.g. I've created a separate patch with group by reordering tests to >make it easy to see what changes were introduced, and after some experiments >removed part that seems to duplicate "group by" reordering to follow "order >by". Also looks like it's possible to make these patches independent by having >a base patch with the isolated group_keys_reorder_by_pathkeys (they're >connected via n_preordered), but I haven't done this yet. > >I went through the thread to summarize the objections, that were mentioned so >far. Most of them are related to the third patch in the series, where >reordering based on "ndistincs" is implemented, and are about cost_sort (all >the possible problems that could happen without proper cost estimation due to >non uniform distribution, different comparison costs and so on) and figuring >out how to limit number of possible combinations of pathkeys to compare. I >haven't looked at the proposed backtracking approach, but taking into account >that suggested patch for cost_sort [1] is RWF, I wonder what would be the best >strategy to proceed? > >[1]: https://commitfest.postgresql.org/21/1706/ Dunno. It seems the progres on the sort-related patches was rather limited in the PG12 cycle in general :-( There's the Incremental Sort patch, GROUP BY optimization and then the cost_sort patch. Not sure about the best strategy, though. One obvious option is to rely on cost_sort patch to do all the improvements needed for the other patches, but that assumes that patch moves reasonably fast. So I personally would suggest to treat those patches as independent until the very last moment, develop the costing improvements needed by each of them, and then decide which of them are committable / in what order. At the end of PG11 cycle I've offered my help with testing / reviewing those patches, if there is progress. That still holds, if there are new patch versions I'll look at them. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Tue, Apr 9, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > So I personally would suggest to treat those patches as independent until > the very last moment, develop the costing improvements needed by each > of them, and then decide which of them are committable / in what order. I had the same idea, but judging from the questions raised in this thread, it's quite hard to go with reordering based only on the frequency of values. I hoped that the cost_sort improvement patch would be simple enough to incorporate here, but of course it wasn't. Working from the assumption that the amount of work required for sorting depends only on the number of distinct groups and how costly it is to compare values of the data type, I've ended up extracting the get_width_multiplier and get_func_cost parts from the cost_sort patch and including them in 0003-Reorder-by-values-distribution. This makes it possible to take into account situations when we compare e.g. long strings or a custom data type with a high procost for comparison (but I've used these values directly, without any adjusting coefficients yet). > On Wed, Jun 13, 2018 at 6:41 PM Teodor Sigaev <teodor@sigaev.ru> wrote: > > > So that's a nice improvement, although I think we should also consider > > non-uniform distributions (using the per-column MCV lists). > > Could you clarify how to do that? Since I'm not familiar with this topic, I would like to ask the same question: how to do that, and what are the advantages? > On Sat, Jun 16, 2018 at 5:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > I still think we need to be careful when introducing new optimizations > in this area - reordering the grouping keys by ndistinct, ORDER BY or > whatever. In particular I don't think we should commit these patches > that may quite easily cause regressions, and then hope some hypothetical > future patch fixes the costing. I'm a bit concerned about this part of the discussion. There is an idea throughout the whole thread about avoiding the situation where a user knows which order is better and we generate a different one by mistake. From what I see right now, even if all the objections were addressed, there is a chance that some coefficients won't be good enough (e.g. the width multiplier is based on an average width, or it can turn out that all the compared strings differ at the very beginning) and the chosen order will not be optimal. Does that mean that in any case the implementation of such an optimization should provide a way to override it?
Attachment
On Fri, May 03, 2019 at 10:28:21PM +0200, Dmitry Dolgov wrote: >> On Tue, Apr 9, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> So I personally would suggest to treat those patches as independent until >> the very last moment, develop the costing improvements needed by each >> of them, and then decide which of them are committable / in what order. > >I had the same idea, but judging from the questions, raised in this thread, >it's quite hard to go with reordering based only on frequency of values. I >hoped that the cost_sort improvement patch would be simple enough to >incorporate it here, but of course it wasn't. Having an assumption, that the >amount of work, required for performing sorting, depends only on the number of >distinct groups and how costly it is to compare a values of this data type, >I've ended up extracting get_width_multiplier and get_func_cost parts from >cost_sort patch and including them into 0003-Reorder-by-values-distribution. >This allows to take into account situations when we compare e.g. long strings >or a custom data type with high procost for comparison (but I've used this >values directly without any adjusting coefficients yet). > >> On Wed, Jun 13, 2018 at 6:41 PM Teodor Sigaev <teodor@sigaev.ru> wrote: >> >> > So that's a nice improvement, although I think we should also consider >> > non-uniform distributions (using the per-column MCV lists). >> >> Could you clarify how to do that? > >Since I'm not familiar with this topic, I would like to ask the same question, >how to do that and what are the advantages? > I don't recall the exact details of the discussion, since most of it happened almost a year ago, but the main concern about the original costing approach is that it very much assumes uniform distribution. For example if you have N tuples with M groups (which are not computed using estimate_num_groups IIRC, and we can hardly do much better), then the patch assumed each groups is ~N/M rows and used that for computing the number of comparisons for a particular sequence of ORDER BY clauses. But that may result in pretty significant errors in causes with a couple of very large groups. But hey - we can get some of that information from MCV lists, so maybe we can compute a safer less-optimistic estimate. I mean, we can't compute "perfect" estimate of how many comparisons we'll have to do, because we only have per-column MCV lits and maybe some multi-column MCV lists (but definitely not on all combinations of columns in the ORDER BY clause). But what I think we could do is using largest possible group instead of the average one. So for example when estimating number of comparisons for columns (c1,..,cN), you could look at MCV lists for these columns and compute m(i) = Max(largest group in MCV list for i-th column) and then use Min(m(1), ..., m(k)) when estimating the number of comparisons. Of course, this is likely to be a worst-case estimate, and it's not quite obvious that comparing worst-case estimates is necessarily safer than comparing the regular ones. So I'm not sure about it. It might be better to just compute the 'average group' in a different way, not by arithmetic mean, but say as geometric mean from each MCV list. Not sure. I guess this needs some research and experimentation - constructing cases that are likely to cause problems (non-uniform distributions, columns with a small number of exceptionally large groups, ...) and then showing how the costing deals with those. 
FWIW I've mentioned MCV lists in the incrementalsort thread too, in which case it was much clearer how to use them to improve the startup cost estimate - it's enough to look at the first group in per-column MCV lists (first in the ORDER BY sense, not by frequency). >> On Sat, Jun 16, 2018 at 5:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> I still think we need to be careful when introducing new optimizations >> in this area - reordering the grouping keys by ndistinct, ORDER BY or >> whatever. In particular I don't think we should commit these patches >> that may quite easily cause regressions, and then hope some hypothetical >> future patch fixes the costing. > >I'm a bit concerned about this part of the discussion. There is an idea through >the whole thread about avoiding the situation, when a user knows which order is >better and we generate different one by mistake. From what I see right now even >if all the objections would be addressed, there is a chance that some >coefficients will be not good enough (e.g. width multiplier is based on an >average width, or it can suddenly happen that all the compared string have some >difference at the very beginning) and the chosen order will be not optimal. >Does it mean that in any case the implementation of such optimization should >provide a way to override it? I don't think we have to provide an override no matter what. Otherwise we'd have to have an override for everything, because no optimizer decision is perfect - it's all built on estimates, after all. But I assume we agree that optimizations based on estimates thare are known to be unreliable are bound to be unreliable too. And optimizations that regularly misfire for common data sets may easily do more harm than good (and maybe should not be called optimizations at all). For example, the first patch relied on ndistinct estimates quite a bit and used them to compute multi-column estimates, which we already know is rather poor for GROUP BY estimates. The v9 uses estimate_num_groups() which works better because it leverages extended stats, when available. That's good, but we need to see how the rest of the costing works. So I think it very much depends on how reliable the costing can be made, and how bad consequences a poor choice can have. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Fri, May 3, 2019 at 11:55 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > I don't recall the exact details of the discussion, since most of it > happened almost a year ago, but the main concern about the original > costing approach is that it very much assumes uniform distribution. > > For example if you have N tuples with M groups (which are not computed > using estimate_num_groups IIRC, and we can hardly do much better), then > the patch assumed each groups is ~N/M rows and used that for computing > the number of comparisons for a particular sequence of ORDER BY clauses. > > But that may result in pretty significant errors in causes with a couple > of very large groups. Yes, you're right, the current version of the patch assumes a uniform distribution of values between these M groups. After some thinking I've got an idea that maybe it's better not to try to find out what would be acceptable for both uniform and non-uniform distributions, but simply not reorder at all if there are any significant deviations from what seems to be the "best case", namely when values are distributed among the groups relatively uniformly and there are no doubts about how expensive they are to compare. That said, on top of the current implementation it's possible to check: * the dependency coefficient between columns (if I understand correctly, non-uniform distributions of values between the groups, a.k.a. a few large groups, should show up in extended statistics as a significant dependency) * that the largest/smallest group in the MCV doesn't differ too much from the expected "average" group * that the average width and comparison procost are similar If everything fits (which I hope would be most of the time) - apply the reordering, otherwise use whatever order was specified (which leaves room for manual reordering in the relatively rare situations where it's needed). Does it make sense? > But what I think we could do is using largest possible group instead of > the average one. So for example when estimating number of comparisons > for columns (c1,..,cN), you could look at MCV lists for these columns > and compute > > m(i) = Max(largest group in MCV list for i-th column) > > and then use Min(m(1), ..., m(k)) when estimating the number of > comparisons. I see the idea, but I'm a bit confused about how to get the largest group from an MCV list. I mean, there is a list of most common values per column with frequencies, and similar stuff for multi-column statistics, but how do you get the size of those groups?
On Fri, May 24, 2019 at 02:10:48PM +0200, Dmitry Dolgov wrote: >> On Fri, May 3, 2019 at 11:55 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> I don't recall the exact details of the discussion, since most of it >> happened almost a year ago, but the main concern about the original >> costing approach is that it very much assumes uniform distribution. >> >> For example if you have N tuples with M groups (which are not computed >> using estimate_num_groups IIRC, and we can hardly do much better), then >> the patch assumed each groups is ~N/M rows and used that for computing >> the number of comparisons for a particular sequence of ORDER BY clauses. >> >> But that may result in pretty significant errors in causes with a couple >> of very large groups. > >Yes, you're right, the current version of the patch assumes uniform >distribution of values between these M groups. After some thinking I've got an >idea that maybe it's better to not try to find out what would be acceptable for >both uniform and non uniform distributions, but just do not reorder at all if >there are any significant deviations from what seems to be a "best case", >namely when values distributed among groups relatively uniformly and there are >no doubts about how complicated is to compare them. > >Saying that, it's possible on top of the current implementation to check: > >* dependency coefficient between columns (if I understand correctly, non > uniform distributions of values between the groups a.k.a few large groups > should be visible from an extended statistics as a significant dependency) > >* largest/smallest group in MCV doesn't differ too much from an expected > "average" group > >* average width and comparison procost are similar > >If everything fits (which I hope would be most of the time) - apply reorder, >otherwise use whatever order was specified (which leaves the room for manual >reordering for relatively rare situations). Does it makes sense? >

I think those are interesting sources of information. I don't know if it will be sufficient, but I think it's worth exploring.

One of the issues with this approach however will be selecting the threshold values. That is, how do you decide whether a given functional dependency is "too strong" or the largest/smallest MCV item differs too much from the average one? If you pick a particular threshold value, there'll be a sudden behavior change at some point, and that's very annoying.

For example, you might pick 0.5 as a threshold for "strong" functional dependencies. There's little reason why 0.4999 should be weak and 0.5001 should be "strong". This may lead to sudden behavior changes after ANALYZE, for example. So I think we need to find a way to make this part of the costing model, which is how we deal with such cases elsewhere.

>> But what I think we could do is using largest possible group instead of >> the average one. So for example when estimating number of comparisons >> for columns (c1,..,cN), you could look at MCV lists for these columns >> and compute >> >> m(i) = Max(largest group in MCV list for i-th column) >> >> and then use Min(m(1), ..., m(k)) when estimating the number of >> comparisons. > >I see the idea, but I'm a bit confused about how to get a largest group for a >MCV list? I mean there is a list of most common values per column with >frequencies, and similar stuff for multi columns statistics, but how to get a >size of those groups?
You can just multiply the frequency by the number of rows in the table - that gives you the size of the group. Or an estimate of it, because there might be a filter condition involved. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
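As a rough illustration of the above - just a sketch, assuming the example table t from before has been analyzed, and ignoring any filter conditions (columns with very uniform data may have no MCV list at all):

SELECT s.attname,
       c.reltuples * s.most_common_freqs[1] AS largest_group_estimate
  FROM pg_stats s
  JOIN pg_class c ON c.relname = s.tablename
 WHERE s.tablename = 't' AND s.attname IN ('a', 'b', 'c');

-- most_common_freqs is ordered by descending frequency, so element [1]
-- corresponds to the largest group in the per-column MCV list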
Hi, I wonder if anyone has plans to try again with this optimization in v14 cycle? The patches no longer apply thanks to the incremental sort patch, but I suppose fixing that should not be extremely hard. The 2020-07 CF is still a couple weeks away, but it'd be good to know if there are any plans to revive this. I'm willing to spend some time on reviewing / testing this, etc. I've only quickly skimmed the old thread, but IIRC there were two main challenges in getting the optimization right: 1) deciding which orderings are interesting / worth additional work I think we need to consider these orderings, in addition to the one specified in GROUP BY: 1) as specified in ORDER BY (if different from 1) 2) one with cheapest sort on unsorted input (depending on number of distinct values, cost of comparisons, etc.) 3) one with cheapest sort on partially sorted input (essentially what we do with the incremental sort paths, but matching the pathkeys in a more elaborate way) 2) costing the alternative orderings I think we've already discussed various ways to leverage as much available info as possible (extended stats, MCVs, ...) but I think the patch only does some of it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Fri, May 15, 2020 at 01:52:20AM +0200, Tomas Vondra wrote: > > I wonder if anyone has plans to try again with this optimization in v14 > cycle? The patches no longer apply thanks to the incremental sort patch, > but I suppose fixing that should not be extremely hard. > > The 2020-07 CF is still a couple weeks away, but it'd be good to know if > there are any plans to revive this. I'm willing to spend some time on > reviewing / testing this, etc. Yes, if you believe that this patch has potential, I would love to pick it up again. > I've only quickly skimmed the old thread, but IIRC there were two main > challenges in getting the optimization right: > > > 1) deciding which orderings are interesting / worth additional work > > I think we need to consider these orderings, in addition to the one > specified in GROUP BY: > > 1) as specified in ORDER BY (if different from 1) What is the idea behind considering this ordering?
On Sat, May 16, 2020 at 02:24:31PM +0200, Dmitry Dolgov wrote: >> On Fri, May 15, 2020 at 01:52:20AM +0200, Tomas Vondra wrote: >> >> I wonder if anyone has plans to try again with this optimization in v14 >> cycle? The patches no longer apply thanks to the incremental sort patch, >> but I suppose fixing that should not be extremely hard. >> >> The 2020-07 CF is still a couple weeks away, but it'd be good to know if >> there are any plans to revive this. I'm willing to spend some time on >> reviewing / testing this, etc. > >Yes, if you believe that this patch has potential, I would love to pick >it up again. >

I think it's worth another try. I've been reminded of this limitation when working on the incremental sort, which significantly increases the number of orderings that we can sort efficiently. But we can't possibly leverage that unless it matches the GROUP BY ordering. The attached WIP patch somewhat addresses this by trying to reorder the group_pathkeys to allow leveraging an existing ordering (even just a partial one, with incremental sort).

>> I've only quickly skimmed the old thread, but IIRC there were two main >> challenges in getting the optimization right: >> >> >> 1) deciding which orderings are interesting / worth additional work >> >> I think we need to consider these orderings, in addition to the one >> specified in GROUP BY: >> >> 1) as specified in ORDER BY (if different from 1) > >What is the idea behind considering this ordering?

I'll try to answer this in general, i.e. what I think this patch needs to consider. Hopefully that'll answer why we should look at ORDER BY pathkeys too ...

Reordering the pathkeys has both local and global effects, and we need to consider all of that to make the optimization reliable, otherwise we'll inevitably end up with trivial regressions.

The local effects are trivial - it's for example reordering the pathkeys to make the explicit sort as cheap as possible. This thread already discussed a number of things to base this on - ndistinct for columns, cost of comparison function, ... In any case, this is something we can decide locally, when building the grouping paths.

The global effects are much harder to tackle, because the decision can't be made locally when building the grouping paths. It requires changes both below and above the point where we build grouping paths.

An example of a decision we need to make before we even get to building a grouping path is which index paths to build. Currently we only build index paths with "useful" pathkeys, and without tweaking that we'll never even see the index in add_paths_to_grouping_rel().

But there are also decisions that can be made only after we build the grouping paths. For example, we may have both GROUP BY and ORDER BY, and there is no "always correct" way to combine those. In some cases it may be correct to use the same pathkeys, in other cases it's better to use different ones (which will require an extra Sort, with additional cost).

So I don't think there will be a single "interesting" grouping pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll need to build grouping paths for all of those, and leave the planner to eventually pick the one giving us the cheapest plan ...

A "brute-force" approach would be to generate all possible orderings of group_pathkeys, but that's probably not going to fly - it might easily cause an explosion in number of paths we track, etc.
So we'll need to pick/prioritize orderings that are more likely to give us efficient plans, and ORDER BY seems like a good example because it means we won't need an explicit sort. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
> On Sat, May 16, 2020 at 04:56:09PM +0200, Tomas Vondra wrote: > > So I don't think there will be a single "interesting" grouping pathkeys > (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll > need to build grouping paths for all of those, and leave the planner to > eventually pick the one giving us the cheapest plan ... > > A "brute-force" approach would be to generate all possible orderings of > group_pathkeys, but that's probably not going to fly - it might easily > cause an explosion in number of paths we track, etc. So we'll need to > pick/prioritize orderings that are more likely to give us efficient > plans, and ORDER BY seems like a good example because it means we won't > need an explicit sort.

Yes, I see. I've already rebased the original version of the patch and am now working on your suggestions. In the meantime, one question:

> But there are also decisions that can be made only after we build the > grouping paths. For example, we may have both GROUP BY and ORDER BY, and > there is no "always correct" way to combine those. In some cases it may > be correct to use the same pathkeys, in other cases it's better to use > different ones (which will require an extra Sort, with additional cost).

Do I understand correctly that your idea is that in some cases it's cheaper to use a different order for GROUP BY than for ORDER BY, even with an extra sort? In the current patch implementation there is an assumption that it's always better to match the order of pathkeys for both GROUP BY/ORDER BY (which means that the only degree of freedom there is to reorder the tail, which in turn makes both the "unsorted input" and "partially sorted input" cases from your original email essentially the same). Can you show an example where this assumption is invalid?

> I've only quickly skimmed the old thread, but IIRC there were two main > challenges in getting the optimization right: > > 1) deciding which orderings are interesting / worth additional work > > I think we need to consider these orderings, in addition to the one > specified in GROUP BY:

Yes, it looks like the current patch implementation together with preprocess_groupclause already implements this, although maybe not as flexibly.

> 2) costing the alternative orderings > > I think we've already discussed various ways to leverage as much > available info as possible (extended stats, MCVs, ...) but I think the > patch only does some of it.

Right, there were a couple of ideas about what to do in the case of a few groups that are so big they can invalidate the current assumptions about costs - e.g. do not apply the reordering if we detect such a situation, or use a "conservative" approach and take the biggest group for the estimations.
On Sat, Jun 20, 2020 at 04:30:10PM +0200, Dmitry Dolgov wrote: >> On Sat, May 16, 2020 at 04:56:09PM +0200, Tomas Vondra wrote: >> >> So I don't think there will be a single "interesting" grouping pathkeys >> (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll >> need to build grouping paths for all of those, and leave the planner to >> eventually pick the one giving us the cheapest plan ... >> >> A "brute-force" approach would be to generate all possible orderings of >> group_pathkeys, but that's probably not going to fly - it might easily >> cause an explosion in number of paths we track, etc. So we'll need to >> pick/prioritize orderings that are more likely to give us efficient >> plans, and ORDER BY seems like a good example because it means we won't >> need an explicit sort. > >Yes, I see. I've already rebased the original version of patch and now >working on your suggestions. In the meantime one question: > >> But there are also decisions that can be made only after we build the >> grouping paths. For example, we may have both GROUP BY and ORDER BY, and >> there is no "always correct" way to combine those. In some cases it may >> be correct to use the same pathkeys, in other cases it's better to use >> different ones (which will require an extra Sort, with additional cost). > >Do I understand correctly, your idea is that in some cases it's cheaper >to use different order for GROUP BY than in ORDER BY even with an extra >sorting? In the current patch implementation there is an assumption that >says it's always better to match the order of pathkeys for both GROUP >BY/ORDER BY (which means that the only degree of freedom there is to >reorder the tail, which in turn makes both "unsorted input" and >"partially sorted input" cases from your original email essentially the >same). Can you show such an example when this assumption is invalid? > Yes, that is mostly the point - I don't think we can assume simply using the ORDER BY pathkeys (if specified) for grouping is optimal. As a trivial counter-example, consider this CREATE TABLE t ( a int, b int, c int); INSERT INTO t SELECT 1000 * random(), 1000 * random(), i FROM generate_series(1,10000000) s(i); CREATE INDEX ON t (a, b, c); VACUUM ANALYZE t; And now some simple queries (you may need to tweak the planning options a bit, to get these plans - disable hashagg etc.). 
test=# explain analyze select a, b, count(c) from t group by a, b;

                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.037..10509.505 rows=1001923 loops=1)
   Group Key: a, b
   ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual time=0.024..5189.908 rows=10000000 loops=1)
         Heap Fetches: 0
 Planning Time: 0.113 ms
 Execution Time: 10941.677 ms
(6 rows)

test=# explain analyze select a,b, count(c) from t group by a, b order by a, b;

                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.033..10606.925 rows=1001923 loops=1)
   Group Key: a, b
   ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual time=0.020..5240.332 rows=10000000 loops=1)
         Heap Fetches: 0
 Planning Time: 0.100 ms
 Execution Time: 11043.358 ms
(6 rows)

test=# explain analyze select a,b, count(c) from t group by a, b order by b, a;

                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=1658556.19..1768558.12 rows=1000018 width=16) (actual time=14707.676..25533.053 rows=1001923 loops=1)
   Group Key: b, a
   ->  Sort  (cost=1658556.19..1683556.63 rows=10000175 width=12) (actual time=14707.659..20011.024 rows=10000000 loops=1)
         Sort Key: b, a
         Sort Method: external merge  Disk: 219200kB
         ->  Seq Scan on t  (cost=0.00..154056.75 rows=10000175 width=12) (actual time=0.028..4751.244 rows=10000000 loops=1)
 Planning Time: 0.103 ms
 Execution Time: 25989.412 ms
(8 rows)

test=# explain analyze select * from (select a,b, count(c) from t group by a, b offset 0) foo order by b,a;

                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=515759.00..518259.04 rows=1000018 width=16) (actual time=11281.094..11783.920 rows=1001923 loops=1)
   Sort Key: t.b, t.a
   Sort Method: external merge  Disk: 34880kB
   ->  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.064..10507.256 rows=1001923 loops=1)
         Group Key: t.a, t.b
         ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual time=0.052..5209.939 rows=10000000 loops=1)
               Heap Fetches: 0
 Planning Time: 0.111 ms
 Execution Time: 12218.370 ms
(9 rows)

To sum this up:

  grouping (a,b):                   10941 ms
  grouping (a,b) + ordering (a,b):  11043 ms
  grouping (b,a) + ordering (b,a):  25989 ms
  grouping (a,b) + ordering (b,a):  12218 ms

So, it's fast as long as we can use the IOS, even if we have to do an extra Sort at the end. This obviously relies on the grouping to reduce the number of rows (in this case from 10M to 1M), which makes the extra cost much cheaper. I'm pretty sure I could construct similar examples e.g. with incremental sort, and so on.

Does this explain why I think we can't make the assumption that it's always better to match the pathkeys?

FWIW, I think the change of plan for the third query (where we specify "GROUP BY a,b" but the planner decides to just change that) is rather annoying, and it clearly makes things worse :-(

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, May 16, 2020 at 10:56 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > The local effects are trivial - it's for example reordering the pathkeys > to make the explicit sort as cheap as possible. This thread already > discussed a number of things to base this on - ndistinct for columns, > cost of comparison function, ... In any case, this is something we can > decide locally, when building the grouping paths. > > The global effects are much harder to tackle, because the decision can't > be made locally when building the grouping paths. It requires changes > both below and above the point where we build grouping paths. > > An example of a decision we need to make before we even get to building > a grouping path is which index paths to build. Currently we only build > index paths with "useful" pathkeys, and without tweaking that we'll > never even see the index in add_paths_to_grouping_rel(). > > But there are also decisions that can be made only after we build the > grouping paths. For example, we may have both GROUP BY and ORDER BY, and > there is no "always correct" way to combine those. In some cases it may > be correct to use the same pathkeys, in other cases it's better to use > different ones (which will require an extra Sort, with additional cost). > > So I don't think there will be a single "interesting" grouping pathkeys > (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll > need to build grouping paths for all of those, and leave the planner to > eventually pick the one giving us the cheapest plan ...

I agree with all of this and I think it's really good analysis. Part of the reason why the planner isn't that sophisticated in this area is that, for a long time, we didn't use paths at this level, and so it was much harder to write any kind of viable patch to consider alternatives. With Tom's planner path-ification work there should be a lot more potential for optimization here, but, as you say, we need to do that by leveraging the existing costing machinery, not just via simple heuristics.

It also strikes me that one of the problems in this area is that the statistics we currently gather don't seem to be entirely useful or reliable for aggregate planning. I wonder if there are extensions to the extended statistics mechanism, or even just more things we should gather during a routine ANALYZE, that would enable us to estimate things better here. The most obvious thing is that n_distinct is often wildly inaccurate, but it's deeper than that. For instance, we need some way to estimate how many groups you're going to get when you filter on a and then group by b that doesn't assume uniform distribution. And we also need something that doesn't just output a number of groups, but gives you some inkling of the sizes of those groups: 1 giant group and a bunch of little ones isn't the same as a bunch of equally sized groups. I don't know that these are things we really have much chance of figuring out with the currently-available information, though.

Sorry if this is hijacking the thread a bit; I don't mean to discourage work on this specific patch. I'm just wondering if we need to think a little bigger to see our way to a good solution.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jun 22, 2020 at 11:50:49AM -0400, Robert Haas wrote: >On Sat, May 16, 2020 at 10:56 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> The local effects are trivial - it's for example reordering the >> pathkeys to make the explicit sort as cheap as possible. This thread >> already discussed a number of things to base this on - ndistinct for >> columns, cost of comparison function, ... In any case, this is >> something we can decide locally, when building the grouping paths. >> >> The global effects are much harder to tackle, because the decision >> can't be made locally when building the grouping paths. It requires >> changes both below and above the point where we build grouping paths. >> >> An example of a decision we need to make before we even get to >> building a grouping path is which index paths to build. Currently we >> only build index paths with "useful" pathkeys, and without tweaking >> that we'll never even see the index in add_paths_to_grouping_rel(). >> >> But there are also decisions that can be made only after we build the >> grouping paths. For example, we may have both GROUP BY and ORDER BY, >> and there is no "always correct" way to combine those. In some cases >> it may be correct to use the same pathkeys, in other cases it's >> better to use different ones (which will require an extra Sort, with >> additional cost). >> >> So I don't think there will be a single "interesting" grouping >> pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys. >> And we'll need to build grouping paths for all of those, and leave >> the planner to eventually pick the one giving us the cheapest plan >> ... > > I agree with all of this and I think it's really good analysis. Part > of the reason why the planner isn't that sophisticated in this area is > that, for a long time, we didn't use paths at this level, and so it > was much harder to write any kind of viable patch to consider > alternatives. With Tom's planner path-ification word there should be > a lot more potential for optimization here, but, as you say, we need > to do that by leveraging the existing costing machinery, not just via > simple heuristics. Agreed. > It also strikes me that one of the problems in this area is that the > statistics we currently gather don't seem to be entirely useful or > reliable for aggregate planning. I don't think that's an immediate issue. I don't think anyone started implementing these optimizations and decided not to do that because of lack of statistics. The reason why we don't have those optimizations is more that we decided not to even consider those optimizations, possibly because of planner limitations. Of course, the optimizations may require additional stats, but that's only step #2. > I wonder if there are extensions to the extended statistics mechanism, > or even just more things we should gather during a routine ANALYZE, > that would enable us to estimate things better here. The most obvious > thing is that n_distinct is often wildly inaccurate, but it's deeper > than that. For instance, we need some way to estimate how many groups > you're going to get when you filter on a and then group by b that > doesn't assume uniform distribution. And we need also need something > that doesn't just output a number of groups, but gives you some inkling > of the sizes of those groups: 1 giant group and a bunch of little ones > isn't the same as a bunch of equally sized groups. 
> I don't know that these are things we really have much chance of figuring out with the currently-available information, though.

Not sure. I think the extended stats we have now (ndistinct coeffs, multi-column MCV lists) should be a good basis for these decisions, even with skewed data. Of course, we may need to add something else, but I'm not sure what that would be. The main limitation of extended stats is that they are at the per-table level. Once you do a join, it's all over - we don't have that capability :-(

> Sorry if this is hijacking the thread a bit; I don't mean to discourage work on this specific patch. I'm just wondering if we need to think a little bigger to see our way to a good solution.

I think it's fine continuing on this optimization. Either we'll be able to get it done with existing stats, or we'll find what other stats we need to build sufficiently good heuristics.

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
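To make that a bit more concrete, a small sketch of the extended stats such costing could consult (reusing the t(a,b,c) example table from earlier in the thread; the statistics name is arbitrary):

CREATE STATISTICS t_abc_stats (ndistinct, dependencies, mcv) ON a, b, c FROM t;
ANALYZE t;

-- multi-column ndistinct coefficients and functional dependencies
SELECT statistics_name, n_distinct, dependencies
  FROM pg_stats_ext
 WHERE tablename = 't';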
I wonder if anyone has plans to try again with this optimization in v14
cycle? The patches no longer apply thanks to the incremental sort patch,
but I suppose fixing that should not be extremely hard.
I found this feature interesting and I'd like to join.
PFA the updated version of the patch made fully compatible with changes in v13 and I suppose it is ready to be reviewed for v14.
Main things changed:
1. Patch is made compatible with new lists representation ( 1cff1b95ab6d )
2. Patch is made compatible with the incremental sort ( d2d8a229bc58 and ba3e76cc571e )
3. Tests are changed to reflect the changes in key numbering and naming conventions in the plan ( 55a1954da16 and 6ef77cf46e8 )
As always your ideas are very much welcome!
Attachment
> On Mon, Oct 26, 2020 at 12:44:58PM +0400, Pavel Borisov wrote: > > > > > > I wonder if anyone has plans to try again with this optimization in v14 > > cycle? The patches no longer apply thanks to the incremental sort patch, > > but I suppose fixing that should not be extremely hard. > > > > I found this feature interesting and I'd like to join. > PFA the updated version of the patch made fully compatible with changes in > v13 and I suppose it is ready to be reviewed for v14. > > Main things changed: > 1. Patch is made compatible with new lists representation ( 1cff1b95ab6d ) > 2. Patch is made compatible with the incremental sort ( d2d8a229bc58 and > ba3e76cc571e ) > 3. Tests are changed to represent changes in keys numbering and > naming convention in the plan ( 55a1954da16 and 6ef77cf46e8 ) > > As always your ideas are very much welcome! Hi, Thanks for your interest! FYI there is a new thread about this topic [1] with the next version of the patch and more commentaries (I've created it for visibility purposes, but probably it also created some confusion, sorry for that). [1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
Thanks for your interest! FYI there is a new thread about this topic [1]
with the next version of the patch and more commentaries (I've created
it for visibility purposes, but probably it also created some confusion,
sorry for that).
Thanks!
I took a very quick look at your updates and noticed that the patch is intended to be simple, and that some parts of the code were removed because they have little test coverage. I'd propose the opposite: increase the test coverage, so we can keep the more precise cost calculation and probably partial grouping.
Or maybe it's worth benchmarking both patches and then re-deciding whether we want the more complicated or the simpler version.
Good to know that this feature is not stuck anymore and we have more than one proposal.
Thanks!
--
> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote: > > Thanks for your interest! FYI there is a new thread about this topic [1] > > with the next version of the patch and more commentaries (I've created > > it for visibility purposes, but probably it also created some confusion, > > sorry for that). > > > > Thanks! > > I made a very quick look at your updates and noticed that it is intended to > be simple and some parts of the code are removed as they have little test > coverage. I'd propose vice versa to increase test coverage to enjoy more > precise cost calculation and probably partial grouping. > > Or maybe it's worth to benchmark both patches and then re-decide what we > want more to have a more complicated or a simpler version. > > Good to know that this feature is not stuck anymore and we have more than > one proposal. > Thanks!

Just to clarify, the patch that I've posted in another thread mentioned above is not an alternative proposal, but a development of the same patch I had posted in this thread. As mentioned in [1], the reduced functionality is an attempt to reduce the scope, and as soon as the base functionality looks good enough it will be added back.

[1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote: >> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote: >> > Thanks for your interest! FYI there is a new thread about this topic [1] >> > with the next version of the patch and more commentaries (I've created >> > it for visibility purposes, but probably it also created some confusion, >> > sorry for that). >> > >> > Thanks! >> >> I made a very quick look at your updates and noticed that it is intended to >> be simple and some parts of the code are removed as they have little test >> coverage. I'd propose vice versa to increase test coverage to enjoy more >> precise cost calculation and probably partial grouping. >> >> Or maybe it's worth to benchmark both patches and then re-decide what we >> want more to have a more complicated or a simpler version. >> >> Good to know that this feature is not stuck anymore and we have more than >> one proposal. >> Thanks! > >Just to clarify, the patch that I've posted in another thread mentioned >above is not an alternative proposal, but a development of the same >patch I had posted in this thread. As mentioned in [1], reduce of >functionality is an attempt to reduce the scope, and as soon as the base >functionality looks good enough it will be returned back. >

I find it hard to follow two similar threads trying to do the same (or very similar) things in different ways. Is there any chance to join forces and produce a single patch series merging the changes? With the "basic" functionality at the beginning, then patches with the more complex stuff. That's the usual way, I think.

As I said in my response on the other thread [1], I think constructing additional paths with alternative orderings of pathkeys is the right approach. Otherwise we can't really deal with optimizations above the place where we consider this optimization.

That's essentially what I was trying to explain in my May 16 response [2] when I actually said this:

    So I don't think there will be a single "interesting" grouping
    pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
    And we'll need to build grouping paths for all of those, and leave
    the planner to eventually pick the one giving us the cheapest plan.

I wouldn't go as far as saying the approach in this patch (i.e. picking one particular ordering) is doomed, but it's going to be very hard to make it work reliably. Even if we get the costing *at this node* right, who knows how it'll affect costing of the nodes above us?

So if I can suggest something, I'd merge the two patches, adopting the path-based approach. With the very basic functionality/costing in the first patch, and the more advanced stuff in additional patches.

Does that make sense?

regards

[1] https://www.postgresql.org/message-id/20200901210743.lutgvnfzntvhuban%40development
[2] https://www.postgresql.org/message-id/20200516145609.vm7nrqy7frj4ha6r%40development

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Tue, Oct 27, 2020 at 09:19:51PM +0100, Tomas Vondra wrote: > On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote: > > > On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote: > > > > Thanks for your interest! FYI there is a new thread about this topic [1] > > > > with the next version of the patch and more commentaries (I've created > > > > it for visibility purposes, but probably it also created some confusion, > > > > sorry for that). > > > > > > > > Thanks! > > > > > > I made a very quick look at your updates and noticed that it is intended to > > > be simple and some parts of the code are removed as they have little test > > > coverage. I'd propose vice versa to increase test coverage to enjoy more > > > precise cost calculation and probably partial grouping. > > > > > > Or maybe it's worth to benchmark both patches and then re-decide what we > > > want more to have a more complicated or a simpler version. > > > > > > Good to know that this feature is not stuck anymore and we have more than > > > one proposal. > > > Thanks! > > > > Just to clarify, the patch that I've posted in another thread mentioned > > above is not an alternative proposal, but a development of the same > > patch I had posted in this thread. As mentioned in [1], reduce of > > functionality is an attempt to reduce the scope, and as soon as the base > > functionality looks good enough it will be returned back. > > > > I find it hard to follow two similar threads trying to do the same (or > very similar) things in different ways. Is there any chance to join > forces and produce a single patch series merging the changes? With the > "basic" functionality at the beginning, then patches with the more > complex stuff. That's the usual way, I think. > > As I said in my response on the other thread [1], I think constructing > additional paths with alternative orderings of pathkeys is the right > approach. Otherwise we can't really deal with optimizations above the > place where we consider this optimization. > > That's essentially what I was trying in explain May 16 response [2] > when I actually said this: > > So I don't think there will be a single "interesting" grouping > pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys. > And we'll need to build grouping paths for all of those, and leave > the planner to eventually pick the one giving us the cheapest plan. > > I wouldn't go as far as saying the approach in this patch (i.e. picking > one particular ordering) is doomed, but it's going to be very hard to > make it work reliably. Even if we get the costing *at this node* right, > who knows how it'll affect costing of the nodes above us? > > So if I can suggest something, I'd merge the two patches, adopting the > path-based approach. With the very basic functionality/costing in the > first patch, and the more advanced stuff in additional patches. > > Does that make sense? Yes, and from what I understand it's already what had happened in the newer thread [1]. To avoid any confusion, there are no "two patches" at least from my side, and what I've posted in [1] is the continuation of this work, but with path-based approach adopted and a bit less functionality (essentially I've dropped everything what was not covered with tests in the original patch). 
In case I'm missing something and Pavel's proposal is significantly different from the original patch (if I understand correctly, at the moment the latest patch posted here is a rebase adjusting the old patch to work with the latest changes in master, right?), then indeed they could be merged - but please in the newer thread [1]. [1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
In case if I'm missing something and Pavel's proposal is significantly
different from the original patch (if I understand correctly, at the
moment the latest patch posted here is a rebase and adjusting the old
patch to work with the latest changes in master, right?), then indeed
they could be merged, but please in the newer thread [1].
Sure, my patch differs from Teodor's previous code only in compatibility with v13 - though that difference is not small - and I appreciate the changes in path processing. The only thing I noticed is that some useful changes which I've mentioned before are discarded for now. But as long as they are planned to be put back later it is completely fine. I agree to discuss the thing in any thread, though I don't quite understand the reason for the switch.
Still I don't see a problem.
On Wed, Mar 10, 2021 at 4:05 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,
I took a look at the patch today. Attached is v12 and a separate
patch with some comment tweaks and review comments.
1) I see the patch results in some plan changes in postgres_fdw. I
assume it's somehow related to the sort costing changes, but I find it a
bit suspicious. Why should the plan change from this:
Foreign Scan
Remote SQL: ... ORDER BY ... OFFSET 100 LIMIT 10;
to
Limit
Foreign Scan
Remote SQL: ... ORDER BY ...
But even if this is due to changes to ORDER BY costing, postponing the
limit seems like a net loss - the sort happens before the limit, and by
postponing the limit until after the foreign scan we need to transfer 100
more rows. So what's causing this?
The difference in plan cost seems fairly substantial (old: 196.52, new:
241.94). But maybe that's OK and expected.
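For reference, the affected query has roughly this shape (quoting from memory, so the exact target list may differ - ft1/ft2 are the usual postgres_fdw test foreign tables):

SELECT t1.c1, t2.c1
  FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
 ORDER BY t1.c3, t1.c1
OFFSET 100 LIMIT 10;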
2) I wonder how much more expensive (in terms of CPU) the new sort
costing code is. It's iterative, and it's calling functions to calculate
the number of groups etc. I wonder what's the impact on simple queries?
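A cheap way to gauge that - just as a sketch - is to compare the planning time reported for a trivial grouping query with and without the patch applied (e.g. using the t table from earlier in the thread):

-- watch the "Planning Time" line, before and after applying the patch
EXPLAIN ANALYZE SELECT a, b, count(*) FROM t GROUP BY a, b;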
3) It's not clear to me why we need "fake var" protection? Why would the
estimate_num_groups() fail for that?
4) In one of the places a comment says DEFAULT_NUM_DISTINCT is not used
because we're dealing with multiple columns. That makes sense, but maybe
we should use DEFAULT_NUM_DISTINCT at least to limit the range. That is,
with N columns we should restrict the nGroups estimate by:
min = DEFAULT_NUM_DISTINCT
max = Min(pow(DEFAULT_NUM_DISTINCT, ncolumns), tuples)
Also, it's not clear what's the logic behind the formula:
nGroups = ceil(2.0 + sqrt(tuples) *
list_length(pathkeyExprs) / list_length(pathkeys));
A comment explaining that would be helpful.
5) The compute_cpu_sort_cost comment includes two references (in a quite
mixed-up way), and then there's another reference in the code. I suggest
we list all of them in the function comment.
6) There's a bunch of magic values (various multipliers etc.). It's not
clear which values are meant to be equal, etc. Let's introduce nicer
constants with reasonable names.
7) The comment at get_cheapest_group_keys_order is a bit misleading,
because it talks about "uniqueness" - but that's not what we're looking
at here (I mean, is the column unique or not). We're looking at number
of distinct values in the column, which is a different thing. Also, it'd
be good to roughly explain what get_cheapest_group_keys_order does - how
it calculates the order (permutations up to 4 values, etc.).
8) The comment at compute_cpu_sort_cost should also explain basics of
the algorithm. I tried doing something like that, but I'm not sure I got
all the details right and it probably needs further improvements.
9) The main concern I have is still about the changes in planner.c, and
I think I've already shared it before. The code takes the grouping
pathkeys, and just reshuffles them to what it believes is a better order
(cheaper, ...). That is, there's one input pathkey list, and it gets
transformed into another pathkey list. The problem is that even if this
optimizes the sort, it may work against some optimization in the upper
part of the plan.
Imagine a query that does something like this:
SELECT a, b, count(*) FROM (
SELECT DISTINCT a, b, c, d FROM x
) GROUP BY a, b;
Now, from the costing perspective, it may be cheaper to do the inner
grouping by "c, d, a, b" for example. But that works directly against
the second grouping, which would benefit from having the input sorted by
"a, b". How exactly would this behaves depends on the number of distinct
values in the columns, how expensive the comparisons are for each
column, and so on. But you get the idea.
I admit I haven't managed to construct a reasonable query that'd be
significantly harmed by this, but I haven't spent a lot of time on it.
I'm pretty sure I used this trick in the past, when doing aggregations
on large data sets, where it was much better to "pipe" the data through
multiple GroupAggregate nodes.
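For example, a sketch of that trick using the query shape above (OFFSET 0 merely keeps the subquery from being flattened; x and its columns are the hypothetical table from the example):

-- the inner aggregate sorted on (a, b, c, d) feeds the outer GROUP BY (a, b)
-- without an extra Sort; reordering the inner keys to e.g. (c, d, a, b)
-- would force that Sort to reappear on top
SELECT a, b, count(*)
  FROM (SELECT a, b, c, d FROM x GROUP BY a, b, c, d OFFSET 0) sub
 GROUP BY a, b;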
For this reason I think the code generating paths should work a bit like
get_useful_pathkeys_for_relation and generate_useful_gather_paths.
* group_keys_reorder_by_pathkeys should not produce just one list of
pathkeys, but multiple interesting options. At the moment it should
produce at least the input grouping pathkeys (as specified in the
query), and the "optimal" pathkeys (by lowest cost).
* the places calling group_keys_reorder_by_pathkeys should loop on the
result, and generate a separate path for each option.
I'd guess in the future we'll "peek forward" in the plan and see if
there are other interesting pathkeys (that's the expectation for
get_useful_pathkeys_for_relation).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
The patch does not apply successfully; can you please take a look:
http://cfbot.cputube.org/patch_33_1651.log
=== applying patch ./0001-v12-20210309.patch
patching file src/include/utils/selfuncs.h
Hunk #1 FAILED at 198.
1 out of 1 hunk FAILED -- saving rejects to file src/include/utils/selfuncs.h.rej
Based on @Tomas Vondra's comments, and since the patch does not apply, I am changing the status to "Waiting On Author".
--
Ibrar Ahmed
Hi, here is an updated version of this patch, with some significant changes. The main change is that instead of calling get_cheapest_group_keys_order directly, the planner now calls get_useful_group_keys_orderings and gets multiple "interesting" pathkey orderings instead of just a single one. The callers then loop over these orderings and construct paths for all of them. This idea is similar to get_useful_pathkeys_for_relation() added by incremental sort.

FWIW this addresses point (9) from my last review - I started with it, because it was the main thing affecting the overall architecture. The remaining bits are more "local".

I haven't investigated how expensive those changes are (in terms of planning overhead), but the number of extra orderings is fairly low, and I'd expect most of the paths to be eliminated fairly quickly.

I've also added / improved a number of comments etc. but I'm sure more cleanup is needed.

The other comments from the review still apply - I'm particularly concerned about the (1) point, i.e. plan changes in postgres_fdw. Those seem to be rather strange (LIMIT not being pushed down in queries without any grouping). I'd bet this is due to changes in sort costing and does not seem very desirable.

regards

[1] https://www.postgresql.org/message-id/22c44f98-bfa8-8630-62b5-5155e11eb284%40enterprisedb.com

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 7/20/21 3:12 PM, Tomas Vondra wrote: > ... > > The other comments from the review still apply - I'm particularly > concerned about the (1) point, i.e. plan changes in postgres_fdw. Those > seem to be rather strange (LIMIT not being pushed down in queries > without any grouping). I'd bet this is due to changes in sort costing > and does not seem very desirable. >

I did look at this a bit closer, and yes - this very much seems like a costing issue in the patch. The first query in postgres_fdw that changes switches from a query with the LIMIT pushed to the remote side

                               QUERY PLAN
------------------------------------------------------------------------
 Foreign Scan  (cost=196.29..196.52 rows=10 width=14)
   Output: t1.c1, t2.c1, t1.c3
   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON ... LIMIT ... OFFSET ...
(4 rows)

to the LIMIT executed locally

                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=241.72..241.94 rows=10 width=14)
   Output: t1.c1, t2.c1, t1.c3
   ->  Foreign Scan  (cost=239.47..261.97 rows=1000 width=14)
         Output: t1.c1, t2.c1, t1.c3
         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
         Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN ...
(6 rows)

The FDW code runs explain on both queries - with and without LIMIT pushed to the remote side, to get estimates, and without the patch it gets this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=106.97..109.47 rows=1000 width=14)
   Sort Key: r1.c3, r1."C 1"
   ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
         Hash Cond: (r1."C 1" = r2."C 1")
         ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
               ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(7 rows)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=96.29..96.32 rows=10 width=14)
   ->  Sort  (cost=96.04..98.54 rows=1000 width=14)
         Sort Key: r1.c3, r1."C 1"
         ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
               Hash Cond: (r1."C 1" = r2."C 1")
               ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
               ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
                     ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(8 rows)

while with the patch it gets this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=139.47..141.97 rows=1000 width=14)
   Sort Key: r1.c3, r1."C 1"
   ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
         Hash Cond: (r1."C 1" = r2."C 1")
         ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
               ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(7 rows)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=145.75..145.77 rows=10 width=14)
   ->  Sort  (cost=145.50..148.00 rows=1000 width=14)
         Sort Key: r1.c3, r1."C 1"
         ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
               Hash Cond: (r1."C 1" = r2."C 1")
               ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
               ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
                     ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(8 rows)

Notice that the costs get "inverted":

               master            patched
  ----------------------------------------------
  no limit     106.97..109.47    139.47..141.97
  limit         96.29..96.32     145.75..145.77

so the limit looks a bit more expensive - just enough so that it seems cheaper to transfer more rows and execute the limit locally. IMO this looks rather suspicious (or wrong), and compute_cpu_sort_cost may need some fixes. I wonder why it differs from the old code so much ...

regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, after a bit more time spent on this, I found that the issue is due to this chunk of code in the costing function:

    if (heapSort)
    {
        if (tuplesPerPrevGroup < output_tuples)
            /* comparing only inside output_tuples */
            correctedNGroups = ceil(2.0 * output_tuples /
                                    (tuplesPerPrevGroup / nGroups));
        else
            /* two groups - in output and out */
            correctedNGroups = 2.0;
    }
    else
        correctedNGroups = nGroups;

    if (correctedNGroups <= 1.0)
        correctedNGroups = 2.0;
    else
        correctedNGroups = ceil(correctedNGroups);
    per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);

There are a couple of issues, mostly due to differences in handling of cases with a different heapSort flag. For a full sort (no LIMIT clause) we have heapSort=false, and hence the execution simply jumps to

    correctedNGroups = nGroups;

while for LIMIT we do heapSort=true, in which case we also start with

    tuplesPerPrevGroup = ntuples;

That is almost certainly greater than output_tuples (=limit+offset), so the first if condition in calculating correctedNGroups can't possibly be true, and we simply end with

    correctedNGroups = 2.0;

in the first loop. Which seems pretty bogus - why would there be just two groups? When processing the first expression, it's as if there was one big "prev group" with all the tuples, so why not just use nGroups as it is?

This seems to confuse the costing quite a bit - enough to produce the "inverted" costs with/without LIMIT, and picking the other plan.

I've simplified the costing a bit, and the attached version actually undoes all the "suspicious" plan changes in postgres_fdw. It changes one new plan, but that seems somewhat reasonable, as it pushes the sort to the remote side.

But after looking at the costing function, I have a bunch of additional comments and questions:

1) I looked at the resources mentioned as the sources the formulas came from, but I've been unable to really match the algorithm to them. The quicksort paper is particularly "dense", the notation seems to be very different, and none of the theorems seem like an obvious fit. It would be good to make the relationship clearer in comments etc.

For the Sedgewick course it's even worse - it's way too extensive to just point at it and say "We're using ideas from this!" because no one is going to know which chapter/section to look at. We need to be a bit more specific about the reference.

2) I'm a bit puzzled by the "heapsort" chunk, actually. How come we need it now, when we didn't need that before? In a way, the difference in behavior between heapsort and non-heapsort is what triggered the plan changes ...

FWIW it's quite possible I tweaked the costing incorrectly, and it ends up choosing the right plans purely by accident.

3) I'm getting a bit skeptical about the various magic coefficients that are meant to model higher costs with non-uniform distribution. But consider that we do this, for example:

    tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);

but then in the next loop we call estimate_num_groups_incremental and pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this may have various strange consequences - we'll calculate the nGroups based on the inflated value, and we'll calculate tuplesPerPrevGroup from that again - that seems susceptible to "amplification".

We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher nGroups estimate in the next loop - but not linearly. And then we'll calculate the inflated tuplesPerPrevGroup and the estimated nGroups ...

That seems pretty dubious, with hard to explain behavior, IMO.
If we want to keep applying these coefficients, we need to do that in a way that does not affect the subsequent loop like this - we might tweak the per_tuple_cost formula, for example, not tuplesPerPrevGroup.

4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to estimate_num_groups_incremental. In principle yes, if we use the "group size" from the previous step, then the returned value is the number of new groups after adding the "new" pathkey.

But even if we ignore the issues with amplification mentioned in (3), there's an issue with non-linear behavior in estimate_num_groups, because at some point it's calculating

    D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))

where N is the total number of rows, p is the sample size and n is the number of distinct values. And if we have (N1,n1) and (N2,n2) then the ratio of the calculated estimates (which is pretty much what calculating the group size does)

    D(N2,n2,p2) / D(N1,n1,p1)

will differ depending on p1 and p2. And if we're tweaking tuplesPerPrevGroup all the time, that's really annoying, as it may make the groups smaller or larger, which is unpredictable, and I wonder if it might go against the idea of penalizing tuplesPerPrevGroup to some extent.

We could simply use the input "tuples" value here, and then divide the current and previous estimate to calculate the number of new groups.

regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
I keep working on this patch. Here are intermediate results.

On 7/22/21 3:58 AM, Tomas Vondra wrote: > in the first loop. Which seems pretty bogus - why would there be just > two groups? When processing the first expression, it's as if there was > one big "prev group" with all the tuples, so why not to just use nGroups > as it is?

I think the heapsort code seems very strange. Look at the fallback case: it is based on the output_tuples value. Maybe we should use the nGroups value here, but based on the number of output_tuples?

> 1) I looked at the resources mentioned as sources the formulas came > from, but I've been unable to really match the algorithm to them. Would be > good to make the relationship clearer in comments etc.

Fixed (see attachment).

> 3) I'm getting a bit skeptical about the various magic coefficients that > are meant to model higher costs with non-uniform distribution. But > consider that we do this, for example: > > tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups); > > but then in the next loop we call estimate_num_groups_incremental and > pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this > may have various strange consequences - we'll calculate the nGroups > based on the inflated value, and we'll calculate tuplesPerPrevGroup from > that again - that seems susceptible to "amplification". > > We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher > nGroups estimate in the next loop - but not linearly. An then we'll > calculate the inflated tuplesPerPrevGroup and estimated nGroup ...

The weighting coefficient '1.5' reflects our desire to minimize the number of comparison operations on each subsequent attribute of the pathkeys list. By increasing this coefficient we increase the chance that the planner will order pathkeys by decreasing uniqueness. I think it's OK.

-- regards, Andrey Lepikhov Postgres Professional
Attachment
On 7/22/21 3:58 AM, Tomas Vondra wrote: > 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to > estimate_num_groups_incremental. In principle yes, if we use "group > size" from the previous step, then the returned value is the number of > new groups after adding the "new" pathkey. > But even if we ignore the issues with amplification mentioned in (3), > there's an issue with non-linear behavior in estimate_num_groups, > because at some point it's calculating > > D(N,n,p) = n * (1 - ((N-p)/N)^(N/n)) > > where N - total rows, p - sample size, n - number of distinct values. > And if we have (N1,n1) and (N2,n2) then the ratio of calculated > estimated (which is pretty much what calculating group size does) > > D(N2,n2,p2) / D(N1,n1,p1) > > which will differ depending on p1 and p2. And if we're tweaking the > tuplesPerPrevGroup all the time, that's really annoying, as it may make > the groups smaller or larger, which is unpredictable and annoying, and I > wonder if it might go against the idea of penalizing tuplesPerPrevGroup > to some extent. > We could simply use the input "tuples" value here, and then divide the > current and previous estimate to calculate the number of new groups.

tuplesPerPrevGroup is only an upper bound for the estimation. I think we should use the result of the previous estimation as a limit for the next one during incremental estimation. Maybe we can simply limit the tuplesPerPrevGroup value to ensure it never grows? - see the patch in the attachment, on top of the previous fixes.

-- regards, Andrey Lepikhov Postgres Professional
Attachment
On 1/21/22 12:09, Andrey Lepikhov wrote: > On 7/22/21 3:58 AM, Tomas Vondra wrote: >> 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to >> estimate_num_groups_incremental. In principle yes, if we use "group >> size" from the previous step, then the returned value is the number of >> new groups after adding the "new" pathkey. >> But even if we ignore the issues with amplification mentioned in (3), >> there's an issue with non-linear behavior in estimate_num_groups, >> because at some point it's calculating >> >> D(N,n,p) = n * (1 - ((N-p)/N)^(N/n)) >> >> where N - total rows, p - sample size, n - number of distinct values. >> And if we have (N1,n1) and (N2,n2) then the ratio of calculated >> estimated (which is pretty much what calculating group size does) >> >> D(N2,n2,p2) / D(N1,n1,p1) >> >> which will differ depending on p1 and p2. And if we're tweaking the >> tuplesPerPrevGroup all the time, that's really annoying, as it may make >> the groups smaller or larger, which is unpredictable and annoying, and I >> wonder if it might go against the idea of penalizing tuplesPerPrevGroup >> to some extent. >> We could simply use the input "tuples" value here, and then divide the >> current and previous estimate to calculate the number of new groups. > > tuplesPerPrevGroup is only a top boundary for the estimation. I think, > we should use result of previous estimation as a limit for the next > during incremental estimation. Maybe we simply limit the > tuplesPerPrevGroup value to ensure of monotonic non-growth of this > value? - see in attachment patch to previous fixes. > Yes, I think that's a reasonable defense - it makes no sense to exceed the group size in the preceding step, which could happen with small number of groups or some unexpected ndistinct estimate. The other thing we could do is reduce the coefficient gradually - so it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so on. But it seems somewhat arbitrary (I certainly don't have some sound theoretical justification ...). I've merged most of the fixes you reported. I've skipped the path_save removal in planner.c, because that seems incorrect - if there are multiple pathkeys, we must start with the original path, not the modified one we built in the last iteration. Or am I missing something? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 22/1/2022 01:34, Tomas Vondra wrote: > The other thing we could do is reduce the coefficient gradually - so > it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so > on. But it seems somewhat arbitrary (I certainly don't have some sound > theoretical justification ...). I don't think there is a reason to increase complexity without any theory behind it. A simple solution lets us notice problems much faster if they arise. > ... I've skipped the path_save > removal in planner.c, because that seems incorrect - if there are > multiple pathkeys, we must start with the original path, not the > modified one we built in the last iteration. Or am I missing something You are right, I misunderstood the idea of the path_save variable. -- regards, Andrey Lepikhov Postgres Professional
> On 7/22/21 3:58 AM, Tomas Vondra wrote: > I've simplified the costing a bit, and the attached version actually > undoes all the "suspicious" plan changes in postgres_fdw. It changes one > new plan, but that seems somewhat reasonable, as it pushes sort to the > remote side. I tried to justify the heap-sort part of the compute_cpu_sort_cost() routine and realized that we may have a mistake here. After a week of effort, I have not found any research papers on how the time complexity of bounded heap-sort depends on the number of duplicates. So I propose a self-made formula, based on simple logical steps: 1. We start from the initial formula: cost ~ N * LOG2(M), where M is output_tuples. 2. Note that the full form of this formula is: cost ~ N * LOG2(min{N,M}). 3. In the multicolumn case, the number of comparisons for each subsequent column can be estimated by the same formula, but applied to the number of tuples per group: comparisons ~ input * LOG2(min{input,M}). 4. Note that for the case when M > input, we should change this formula a bit: comparisons ~ max{input,M} * LOG2(min{input,M}). Remember that in our case M << tuples. So the general formula for bounded heap sort can be written as: cost ~ N * sum(max{n_i,M}/N * LOG2(min{n_i,M})), i = 1..ncols, where n_1 == N and n_i is the number of tuples per group, estimated from the previous iteration. The attachment contains an implementation of this approach. -- regards, Andrey Lepikhov Postgres Professional
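As a purely illustrative sanity check of that formula (made-up numbers, not the patch code): with N = 1000000 input tuples, M = 100 output tuples, and per-column group sizes n_i of 1000000, 1000 and 10, the first column dominates the total, as intended:

SELECT 1000000 * sum(greatest(n_i, 100) / 1000000.0 * log(2, least(n_i, 100)::numeric)) AS comparisons
  FROM unnest(ARRAY[1000000, 1000, 10]) AS n_i;
-- ~6.65 million comparisons, almost all of them attributable to the first column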
Attachment
On 1/22/22 01:34, Tomas Vondra wrote: > I rebased this patch (with minor fixes) onto the current master. Also, a second patch is dedicated to the problem of "varno 0" (fake_var). I think this case should produce the same estimations as the case of varno != 0 where no stats are found. So I suggest restricting the number of groups to the minimum of the number of incoming tuples and the DEFAULT_NUM_DISTINCT value. -- regards, Andrey Lepikhov Postgres Professional
Attachment
On 2/10/22 10:00, Andrey V. Lepikhov wrote: > On 1/22/22 01:34, Tomas Vondra wrote: >> > > I rebased (with minor fixes) this patch onto current master. > > Also, second patch dedicated to a problem of "varno 0" (fake_var). > I think, this case should make the same estimations as in the case of > varno != 0, but no any stats found. So I suppose to restrict number of > groups with min of a number of incoming tuples and DEFAULT_NUM_DISTINCT > value. > Thanks for the rebase. The two proposed changes (tweaked costing and simplified fake_var handling) seem fine to me. I think the last thing that needs to be done is cleanup of the debug GUCs, which I added to allow easier experimentation with the patch. I probably won't remove the GUCs entirely, though. I plan to add a single GUC that would enable/disable this optimization. I'm not a huge fan of adding more and more GUCs, but in this case it's probably the right thing to do given the complexity of estimating cost with correlated columns etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 3/15/22 13:26, Tomas Vondra wrote: > Thanks for the rebase. The two proposed changes (tweaked costing and > simplified fake_var handling) seem fine to me. I think the last thing > that needs to be done is cleanup of the debug GUCs, which I added to > allow easier experimentation with the patch. Thanks, I'm waiting for the last step. > > I probably won't remove the GUCs entirely, though. I plan to add a > single GUC that would enable/disable this optimization. I'm not a huge > fan of adding more and more GUCs, but in this case it's probably the > right thing to do given the complexity of estimating cost with > correlated columns etc. Agreed. Because it is a kind of automation, we should allow the user to switch it off in case of problems or for manual tuning. Also, I looked through this patch. It has some minor problems: 1. Multiple typos in the patch comment. 2. The term 'cardinality of a key' - maybe replace it with 'number of duplicates'? -- regards, Andrey Lepikhov Postgres Professional
Hi, Here's a rebased/improved version of the patch, with smaller parts addressing various issues. There are seven parts: 0001 - main part, just rebased 0002 - replace the debug GUC options with a single GUC to disable the optimization if needed 0003 - minor code cleanup, removal of unnecessary variable 0004 - various comment fixes (rewordings, typos, ...) 0005 - a minor code simplification, addressing FIXMEs from 0004 0006 - adds the new GUC to the docs 0007 - demonstrates plan changes with a disabled optimization The first 6 parts should be squashed and committed at once, I only kept them separate for clarity. The 0007 is merely a demonstration of the new GUC and that it disables the optimization. > Agree. Because it is a kind of automation we should allow user to switch > it off in the case of problems or manual tuning. > > Also, I looked through this patch. It has some minor problems: > 1. Multiple typos in the patch comment. I went through the comments and checked all of them for grammar mistakes and typos using a word processor, so hopefully that should be OK. But maybe there's still something wrong. > 2. The term 'cardinality of a key' - may be replace with 'number of > duplicates'? No, cardinality means "number of distinct values", so "duplicates" would be wrong. And I think "cardinality" is a well-established term, so I think it's fine. BTW I named the GUC enable_group_by_reordering, I wonder if it should be named differently, e.g. enable_groupby_reordering? Opinions? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
- 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch
- 0002-fixup-rework-GUC-options-20220328.patch
- 0003-fixup-minor-code-cleanup-20220328.patch
- 0004-fixup-comments-cleanup-type-fixes-and-rewor-20220328.patch
- 0005-fixup-minor-simplification-20220328.patch
- 0006-fixup-add-GUC-to-docs-20220328.patch
- 0007-demonstrate-disabled-optimization-20220328.patch
On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,
For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch:
multiple parametes need to be
parametes -> parameters
leave more expensive comparions
comparions -> comparisons
+ if (has_fake_var == false)
The above can be written as:
if (!has_fake_var)
+ nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) / list_length(pathkeys));
Looks like the value of tuples doesn't change inside the loop.
You can precompute sqrt(tuples) outside the loop and store the value in a variable.
+ return -1;
+ else if (a->cost == b->cost)
+ return 0;
+ return 1;
the keyword 'else' is not needed.
+ * Returns newly allocated lists. If no reordering is possible (or needed),
+ * the lists are set to NIL.
+ */
+static bool
+get_cheapest_group_keys_order(PlannerInfo *root, double nrows,
It seems the comment for return value doesn't match the bool return type.
+ /* If this optimization is disabled, we're done. */
+ if (!debug_cheapest_group_by)
It seems enable_cheapest_group_by would be a better name for the flag.
Cheers
On 3/29/22 04:02, Zhihong Yu wrote: > > > On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra > <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>> > wrote: > > Hi, > > Here's a rebased/improved version of the patch, with smaller parts > addressing various issues. There are seven parts: > > 0001 - main part, just rebased > > 0002 - replace the debug GUC options with a single GUC to disable the > optimization if needed > > 0003 - minor code cleanup, removal of unnecessary variable > > 0004 - various comment fixes (rewordings, typos, ...) > > 0005 - a minor code simplification, addressing FIXMEs from 0004 > > 0006 - adds the new GUC to the docs > > 0007 - demonstrates plan changes with a disabled optimization > > The first 6 parts should be squashed and committed at one, I only kept > them separate for clarity. The 0007 is merely a demonstration of the new > GUC and that it disables the optimization. > > > Agree. Because it is a kind of automation we should allow user to > switch > > it off in the case of problems or manual tuning. > > > Also, I looked through this patch. It has some minor problems: > > 1. Multiple typos in the patch comment. > > I went through the comments and checked all of them for grammar mistakes > and typos using a word processor, so hopefully that should be OK. But > maybe there's still something wrong. > > > 2. The term 'cardinality of a key' - may be replace with 'number of > > duplicates'? > > No, cardinality means "number of distinct values", so "duplicates" would > be wrong. And I think "cardinality" is well established term, so I think > it's fine. > > BTW I named the GUC enable_group_by_reordering, I wonder if it should be > named differently, e.g. enable_groupby_reordering? Opinions? > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com> > The Enterprise PostgreSQL Company > > Hi, > > For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch: > > multiple parametes need to be > > parametes -> parameters > > leave more expensive comparions > > comparions -> comparisons > > + if (has_fake_var == false) > > The above can be written as: > > if (!has_fake_var) > All of this was already fixed in one of the subsequent "fixup" patches. Attached is a patch merging 0001-0006, which is what I proposed to get committed. > + nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) / > list_length(pathkeys)); > > Looks like the value of tuples doesn't change inside the loop. > You can precompute sqrt(tuples) outside the loop and store the value in > a variable. > IMHO it makes the formula harder to read, and the effect is not going to be measurable - we're processing only a couple of elements. If the loop was ~100 iterations or more, maybe it'd have an impact. > + return -1; > + else if (a->cost == b->cost) > + return 0; > + return 1; > > the keyword 'else' is not needed. > True, but this is how comparators are usually implemented. > + * Returns newly allocated lists. If no reordering is possible (or needed), > + * the lists are set to NIL. > + */ > +static bool > +get_cheapest_group_keys_order(PlannerInfo *root, double nrows, > > It seems the comment for return value doesn't match the bool return type. > Yup, this is a valid point. I've fixed/reworded the comment a bit. > + /* If this optimization is disabled, we're done. */ > + if (!debug_cheapest_group_by) > > It seems enable_cheapest_group_by would be better name for the flag. > This was renamed to enable_group_by_reordering in one of the fixup patches.
Attached is a patch merging 0001 and all the fixup patches, i.e. the patch I plan to commit. There was a minor test failure - the new GUC was not added to the sample config file, so 003_check_guc.pl was failing. I'm not including the 0007 part, because that's meant to demonstrate plan changes with the optimization disabled, which would confuse cfbot. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Pushed, after going through the patch once more, running check-world under valgrind, and updating the commit message. Thanks to everyone who reviewed/tested this patch! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Pushed, after going through the patch once more, running check-world > under valgrind, and updating the commit message. I'm just in this general area of the code again today and wondered about the header comment for the preprocess_groupclause() function. It says: * In principle it might be interesting to consider other orderings of the * GROUP BY elements, which could match the sort ordering of other * possible plans (eg an indexscan) and thereby reduce cost. We don't * bother with that, though. Hashed grouping will frequently win anyway. I'd say this commit makes that paragraph mostly obsolete. It's only true now in the sense that we don't try orders that suit some index that would provide pre-sorted results for a GroupAggregate path. The comment leads me to believe that we don't do anything at all to find a better order, and that's not true now. David
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Pushed, after going through the patch once more, running check-world > under valgrind, and updating the commit message. I'm still working in this area and I noticed that db0d67db2 updated some regression tests in partition_aggregate.out without any care as to what the test was testing. The comment above the test reads: -- Without ORDER BY clause, to test Gather at top-most path and you've changed the expected plan from being a parallel plan with a Gather to being a serial plan. So it looks like the test might have become useless. I see that the original plan appears to come back with some adjustments to parallel_setup_cost and parallel_tuple_cost. It seems a bit strange to me that the changes with this patch would cause a change of plan for this. There is only 1 GROUP BY column in the query in question. There's no rearrangement to do with a single column GROUP BY. David
On 7/15/22 07:18, David Rowley wrote: > On Thu, 31 Mar 2022 at 12:19, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> Pushed, after going through the patch once more, running check-world >> under valgrind, and updating the commit message. > > I'm still working in this area and I noticed that db0d67db2 updated > some regression tests in partition_aggregate.out without any care as > to what the test was testing. > > The comment above the test reads: > > -- Without ORDER BY clause, to test Gather at top-most path > > and you've changed the expected plan from being a parallel plan with a > Gather to being a serial plan. So it looks like the test might have > become useless. > I agree this is a mistake in db0d67db2 that makes the test useless. > I see that the original plan appears to come back with some > adjustments to parallel_setup_cost and parallel_tuple_cost. It seems a > bit strange to me that the changes with this patch would cause a > change of plan for this. There is only 1 GROUP BY column in the query > in question. There's no rearrangement to do with a single column GROUP > BY. > It might seem a bit strange, but the patch tweaked the costing a bit, so it's not entirely unexpected. I'd bet the plan cost changed just a teeny bit, but enough to change the cheapest plan. The costing changed for all group counts, including a single group. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote: > I agree this is a mistake in db0d67db2 that makes the test useless. Tomas, please note that this is an open item assigned to you. Are you planning to do something with these regression tests by beta3? -- Michael
Attachment
On Tue, 2 Aug 2022 at 22:21, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote: > > I agree this is a mistake in db0d67db2 that makes the test useless. > > Tomas, please note that this is an open item assigned to you. Are you > planning to do something with these regression tests by beta3? There's still something to do there for PG15, but 1349d2790 (just gone in to master) made those two tests run as a parallel query again. David
On 8/2/22 13:14, David Rowley wrote: > On Tue, 2 Aug 2022 at 22:21, Michael Paquier <michael@paquier.xyz> wrote: >> >> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote: >>> I agree this is a mistake in db0d67db2 that makes the test useless. >> >> Tomas, please note that this is an open item assigned to you. Are you >> planning to do something with these regression tests by beta3? > > There's still something to do there for PG15, but 1349d2790 (just gone > in to master) made those two tests run as a parallel query again. > Hi, I've been looking at this, and as I speculated before it's due to the sort costing changing a little bit. On PG14, the costing of the plans looks like this: Gather (cost=1869.39..2823.15 rows=8 width=52) and with disabled parallelism, it's like this: Append (cost=998.04..3000.64 rows=10 width=52) so there's a (fairly small) difference in favor of the parallel plan. But on PG15 it's the other way around - the selected non-parallel one is costed like this: Append (cost=779.41..2490.61 rows=10 width=52) and by setting parallel_setup_cost=0 you get this: Gather (cost=700.34..1531.76 rows=8 width=52) So with the setup cost included it's ~2531, and it loses to the simple plan. This happens because the patch changed sort costing - the same sort on PG14 and PG15 looks like this: PG14: -> Sort (cost=998.04..1028.04 rows=12000 width=13) PG15: -> Sort (cost=779.41..809.41 rows=12000 width=13) As mentioned, the commit tweaked sort costing - before it was pretty much just comparison_cost * tuples * LOG2(tuples) but the patch needs to cost different pathkey orderings, and consider that maybe we don't need to compare all the keys (which the original costing kind of assumes). That's the whole point of this optimization. The costing (compute_cpu_sort_cost) considers a couple of other things, like cost of the comparator function for the data type, width of the values, groupings determined by preceding keys, and so on. It might seem strange that a query with a single pathkey changes, but that single pathkey is costed the same way, of course. In principle we might have a special costing for this case, but I'd bet that would result in pretty surprising inconsistencies when adding a sort key (going from 1 to 2 keys). So I don't think the current costing is wrong, but it certainly is more complex. But the test does not test what it intended - I have three ideas how to make it work: 1) increase the number of rows in the table 2) increase cpu_operator_cost (for that one test?) 3) tweak the costing somehow, to increase the cost a bit Both (1) and (2) work - I've tried doubling the number of rows or setting the operator cost to 0.005, and that does the trick, but maybe a smaller change would be enough. I don't like (3) very much - changing the costing just to get the same test behavior as on an older release does not seem very principled. Yes, maybe it should be tweaked, but not because of a regression test. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
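(For reference, the PG14 startup cost above can be roughly reproduced from that formula, assuming the default cpu_operator_cost = 0.0025, comparison_cost = 2 * cpu_operator_cost, and an input scan costing about 185 - the input cost is an assumption here, not taken from the actual plan:)

SELECT 2 * 0.0025 * 12000 * log(2, 12000) AS comparison_cost_total;
-- ~813, and 185 + 813 ~ 998, close to the PG14 Sort startup cost shown above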
Attachment
On Thu, 18 Aug 2022 at 02:46, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > So I don't think the current costing is wrong, but it certainly is more > complex. But the test does not test what it intended - I have two ideas > how to make it work: > > 1) increase the number of rows in the table > > 2) increase cpu_operator_cost (for that one test?) > > 3) tweak the costing somehow, to increase the cost a bit Why not, 4) SET parallel_setup_cost = 0; there are plenty of other places we do just that so we get a parallel plan without having to generate enough cost to drown out the parallel worker startup cost. Here are a couple of patches to demo the idea. David
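A self-contained sketch of that pattern (the table and query are invented for illustration, not the actual regression test; whether a Gather actually shows up still depends on the build and other settings):

CREATE TABLE parallel_demo (a int, b int);
INSERT INTO parallel_demo SELECT i % 10, i FROM generate_series(1, 100000) i;
ANALYZE parallel_demo;
SET parallel_setup_cost TO 0;
SET parallel_tuple_cost TO 0;
SET min_parallel_table_scan_size TO 0;
-- with the parallel costs zeroed out, the planner should normally pick a
-- parallel plan (Gather / Finalize Aggregate on top) even for a small table
EXPLAIN (COSTS OFF) SELECT a, sum(b) FROM parallel_demo GROUP BY a;
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
RESET min_parallel_table_scan_size;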
Attachment
On 8/18/22 03:32, David Rowley wrote: > On Thu, 18 Aug 2022 at 02:46, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> So I don't think the current costing is wrong, but it certainly is more >> complex. But the test does not test what it intended - I have two ideas >> how to make it work: >> >> 1) increase the number of rows in the table >> >> 2) increase cpu_operator_cost (for that one test?) >> >> 3) tweak the costing somehow, to increase the cost a bit > > Why not, 4) SET parallel_setup_cost = 0; there are plenty of other > places we do just that so we get a parallel plan without having to > generate enough cost to drown out the parallel worker startup cost. > > Here are a couple of patches to demo the idea. > Yeah, that's an option too. I should have mentioned it along with the cpu_operator_cost. BTW would you mind taking a look at the costing? I think it's fine, but it would be good if someone not involved in the patch takes a look. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8/18/22 3:29 AM, Tomas Vondra wrote: > > > On 8/18/22 03:32, David Rowley wrote: >> Here are a couple of patches to demo the idea. >> > > Yeah, that's an option too. I should have mentioned it along with the > cpu_operator_cost. > > BTW would you mind taking a look at the costing? I think it's fine, but > it would be good if someone not involved in the patch takes a look. With RMT hat on, just a friendly reminder that this is still on the open items list[1] -- we have the Beta 4 release next week and it would be great if we can get this resolved prior to it. Thanks! Jonathan [1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items [2] https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6c69@postgresql.org
Attachment
On 8/30/22 15:15, Jonathan S. Katz wrote: > On 8/18/22 3:29 AM, Tomas Vondra wrote: >> >> >> On 8/18/22 03:32, David Rowley wrote: > >>> Here are a couple of patches to demo the idea. >>> >> >> Yeah, that's an option too. I should have mentioned it along with the >> cpu_operator_cost. >> >> BTW would you mind taking a look at the costing? I think it's fine, but >> it would be good if someone not involved in the patch takes a look. > > With RMT hat on, just a friendly reminder that this is still on the open > items list[1] -- we have the Beta 4 release next week and it would be > great if we can get this resolved prior to it. > I know. I'll fix this by tweaking one of the cost gucs, mentioned by David. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 9/1/22 9:06 AM, Tomas Vondra wrote: > > > On 8/30/22 15:15, Jonathan S. Katz wrote: >> On 8/18/22 3:29 AM, Tomas Vondra wrote: >>> >>> >>> On 8/18/22 03:32, David Rowley wrote: >> >>>> Here are a couple of patches to demo the idea. >>>> >>> >>> Yeah, that's an option too. I should have mentioned it along with the >>> cpu_operator_cost. >>> >>> BTW would you mind taking a look at the costing? I think it's fine, but >>> it would be good if someone not involved in the patch takes a look. >> >> With RMT hat on, just a friendly reminder that this is still on the open >> items list[1] -- we have the Beta 4 release next week and it would be >> great if we can get this resolved prior to it. >> > > I know. I'll fix this by tweaking one of the cost gucs, mentioned by David. Great -- thanks for the update! Jonathan
Attachment
On 9/1/22 16:05, Jonathan S. Katz wrote: > On 9/1/22 9:06 AM, Tomas Vondra wrote: >> >> >> On 8/30/22 15:15, Jonathan S. Katz wrote: >>> On 8/18/22 3:29 AM, Tomas Vondra wrote: >>>> >>>> >>>> On 8/18/22 03:32, David Rowley wrote: >>> >>>>> Here are a couple of patches to demo the idea. >>>>> >>>> >>>> Yeah, that's an option too. I should have mentioned it along with the >>>> cpu_operator_cost. >>>> >>>> BTW would you mind taking a look at the costing? I think it's fine, but >>>> it would be good if someone not involved in the patch takes a look. >>> >>> With RMT hat on, just a friendly reminder that this is still on the open >>> items list[1] -- we have the Beta 4 release next week and it would be >>> great if we can get this resolved prior to it. >>> >> >> I know. I'll fix this by tweaking one of the cost gucs, mentioned by >> David. > > Great -- thanks for the update! > I've pushed the fix to 15+master. In the end I just used David's patches that set parallel_setup_cost to 0. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 9/4/22 6:14 PM, Tomas Vondra wrote: > I've pushed the fix to 15+master. In the end I just used David's patches > that set parallel_setup_cost to 0. Thanks! I have closed the open item. Jonathan
Attachment
On Mon, Sep 05, 2022 at 12:14:48AM +0200, Tomas Vondra wrote: > I've pushed the fix to 15+master. In the end I just used David's patches > that set parallel_setup_cost to 0. Thanks, Tomas! -- Michael
Attachment
On Wed, 13 Jul 2022 at 15:37, David Rowley <dgrowleyml@gmail.com> wrote: > I'm just in this general area of the code again today and wondered > about the header comment for the preprocess_groupclause() function. > > It says: > > * In principle it might be interesting to consider other orderings of the > * GROUP BY elements, which could match the sort ordering of other > * possible plans (eg an indexscan) and thereby reduce cost. We don't > * bother with that, though. Hashed grouping will frequently win anyway. > > I'd say this commit makes that paragraph mostly obsolete. It's only > true now in the sense that we don't try orders that suit some index > that would provide pre-sorted results for a GroupAggregate path. The > comment leads me to believe that we don't do anything at all to find a > better order, and that's not true now. I've just pushed a fix for this. David
On 3/10/2022 21:56, Tom Lane wrote: > Revert "Optimize order of GROUP BY keys". > > This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and > several follow-on fixes. > ... > Since we're hard up against the release deadline for v15, let's > revert these changes for now. We can always try again later. It may be time to restart the project. As a first step, I rebased the patch on the current master. It wasn't trivial because of some recent optimizations (a29eab, 1349d27 and 8d83a5d). Now, let's repeat the review and rewrite the current patch according to the reasons given in the revert commit. -- regards, Andrey Lepikhov Postgres Professional
Attachment
On 7/20/23 08:37, Andrey Lepikhov wrote: > On 3/10/2022 21:56, Tom Lane wrote: >> Revert "Optimize order of GROUP BY keys". >> >> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >> several follow-on fixes. >> ... >> Since we're hard up against the release deadline for v15, let's >> revert these changes for now. We can always try again later. > > It may be time to restart the project. As a first step, I rebased the > patch on the current master. It wasn't trivial because of some latest > optimizations (a29eab, 1349d27 and 8d83a5d). > Now, Let's repeat the review and rewrite the current path according to > the reasons uttered in the revert commit. I think the fundamental task is to make the costing more reliable, and the commit message 443df6e2db points out a couple challenges in this area. Not sure how feasible it is to address enough of them ... 1) procost = 1.0 - I guess we could make this more realistic by doing some microbenchmarks and tuning the costs for the most expensive cases. 2) estimating quicksort comparisons - This relies on ndistinct estimates, and I'm not sure how much more reliable we can make those. Probably not much :-( Not sure what to do about this, the only thing I can think of is to track "reliability" of the estimates and only do the reordering if we have high confidence in the estimates. That means we'll miss some optimization opportunities, but it should limit the risk. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 20/7/2023 18:46, Tomas Vondra wrote: > On 7/20/23 08:37, Andrey Lepikhov wrote: >> On 3/10/2022 21:56, Tom Lane wrote: >>> Revert "Optimize order of GROUP BY keys". >>> >>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>> several follow-on fixes. >>> ... >>> Since we're hard up against the release deadline for v15, let's >>> revert these changes for now. We can always try again later. >> >> It may be time to restart the project. As a first step, I rebased the >> patch on the current master. It wasn't trivial because of some latest >> optimizations (a29eab, 1349d27 and 8d83a5d). >> Now, Let's repeat the review and rewrite the current path according to >> the reasons uttered in the revert commit. > > I think the fundamental task is to make the costing more reliable, and > the commit message 443df6e2db points out a couple challenges in this > area. Not sure how feasible it is to address enough of them ... > > 1) procost = 1.0 - I guess we could make this more realistic by doing > some microbenchmarks and tuning the costs for the most expensive cases. > > 2) estimating quicksort comparisons - This relies on ndistinct > estimates, and I'm not sure how much more reliable we can make those. > Probably not much :-( Not sure what to do about this, the only thing I > can think of is to track "reliability" of the estimates and only do the > reordering if we have high confidence in the estimates. That means we'll > miss some optimization opportunities, but it should limit the risk. For me personally, the most challenging issue is: 3. Imbalance, induced by the cost_sort() changes. It may increase or decrease the contribution of sorting to the total cost compared to other factors and change choice of sorted paths. -- regards, Andrey Lepikhov Postgres Professional
On 20/7/2023 18:46, Tomas Vondra wrote: > On 7/20/23 08:37, Andrey Lepikhov wrote: >> On 3/10/2022 21:56, Tom Lane wrote: >>> Revert "Optimize order of GROUP BY keys". >>> >>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>> several follow-on fixes. >>> ... >>> Since we're hard up against the release deadline for v15, let's >>> revert these changes for now. We can always try again later. >> >> It may be time to restart the project. As a first step, I rebased the >> patch on the current master. It wasn't trivial because of some latest >> optimizations (a29eab, 1349d27 and 8d83a5d). >> Now, Let's repeat the review and rewrite the current path according to >> the reasons uttered in the revert commit. > > I think the fundamental task is to make the costing more reliable, and > the commit message 443df6e2db points out a couple challenges in this > area. Not sure how feasible it is to address enough of them ... > > 1) procost = 1.0 - I guess we could make this more realistic by doing > some microbenchmarks and tuning the costs for the most expensive cases. > > 2) estimating quicksort comparisons - This relies on ndistinct > estimates, and I'm not sure how much more reliable we can make those. > Probably not much :-( Not sure what to do about this, the only thing I > can think of is to track "reliability" of the estimates and only do the > reordering if we have high confidence in the estimates. That means we'll > miss some optimization opportunities, but it should limit the risk. I read up on the history of this thread. As I see it, all the problems mentioned above can be avoided by dropping the new cost model entirely. We can simply sort the GROUP BY columns according to their 'ndistinct' values. I see the reason for introducing the cost model in [1]. The main supporting point there is that with this patch, people couldn't optimize the query by themselves by arranging the columns in a more optimal order. But now we at least have the GUC to switch off the behaviour introduced here. Also, some extensions, like the well-known pg_hint_plan, can help with automation. So, how about committing the undoubted part of the feature and working on the cost model in a new thread? [1] https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com -- regards, Andrey Lepikhov Postgres Professional
On 7/24/23 04:10, Andrey Lepikhov wrote: > On 20/7/2023 18:46, Tomas Vondra wrote: >> On 7/20/23 08:37, Andrey Lepikhov wrote: >>> On 3/10/2022 21:56, Tom Lane wrote: >>>> Revert "Optimize order of GROUP BY keys". >>>> >>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>>> several follow-on fixes. >>>> ... >>>> Since we're hard up against the release deadline for v15, let's >>>> revert these changes for now. We can always try again later. >>> >>> It may be time to restart the project. As a first step, I rebased the >>> patch on the current master. It wasn't trivial because of some latest >>> optimizations (a29eab, 1349d27 and 8d83a5d). >>> Now, Let's repeat the review and rewrite the current path according to >>> the reasons uttered in the revert commit. >> >> I think the fundamental task is to make the costing more reliable, and >> the commit message 443df6e2db points out a couple challenges in this >> area. Not sure how feasible it is to address enough of them ... >> >> 1) procost = 1.0 - I guess we could make this more realistic by doing >> some microbenchmarks and tuning the costs for the most expensive cases. >> >> 2) estimating quicksort comparisons - This relies on ndistinct >> estimates, and I'm not sure how much more reliable we can make those. >> Probably not much :-( Not sure what to do about this, the only thing I >> can think of is to track "reliability" of the estimates and only do the >> reordering if we have high confidence in the estimates. That means we'll >> miss some optimization opportunities, but it should limit the risk. > I read up on the history of this thread. > As I see, all the problems mentioned above can be beaten by excluding > the new cost model at all. We can sort GROUP BY columns according to the > 'ndistinct' value. > I see the reason for introducing the cost model in [1]. The main > supporting point here is that with this patch, people couldn't optimize > the query by themselves, organizing the order of the columns in a more > optimal way. But now we have at least the GUC to switch off the > behaviour introduced here. Also, some extensions, like the well-known > pg_hint_plan, can help with automation. I think the main concern is that if we reorder the group keys and get it wrong, it's a regression. If that happens often (due to costing based on poor stats), it's a problem. Yes, there's a GUC, but that's a rather blunt instrument, unfortunately. > So, how about committing of the undoubted part of the feature and > working on the cost model in a new thread? > But Tom's commit message says this: Worse, to arrive at estimates of the number of calls made to the lower-order-column comparison functions, the code needs to make estimates of the numbers of distinct values of multiple columns, which are necessarily even less trustworthy than per-column stats. so I'm not sure this really counts as "undoubted". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 24/7/2023 16:56, Tomas Vondra wrote: > On 7/24/23 04:10, Andrey Lepikhov wrote: >> On 20/7/2023 18:46, Tomas Vondra wrote: >>> On 7/20/23 08:37, Andrey Lepikhov wrote: >>>> On 3/10/2022 21:56, Tom Lane wrote: >>>>> Revert "Optimize order of GROUP BY keys". >>>>> >>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>>>> several follow-on fixes. >>>>> ... >>>>> Since we're hard up against the release deadline for v15, let's >>>>> revert these changes for now. We can always try again later. >>>> >>>> It may be time to restart the project. As a first step, I rebased the >>>> patch on the current master. It wasn't trivial because of some latest >>>> optimizations (a29eab, 1349d27 and 8d83a5d). >>>> Now, Let's repeat the review and rewrite the current path according to >>>> the reasons uttered in the revert commit. >>> >>> I think the fundamental task is to make the costing more reliable, and >>> the commit message 443df6e2db points out a couple challenges in this >>> area. Not sure how feasible it is to address enough of them ... >>> >>> 1) procost = 1.0 - I guess we could make this more realistic by doing >>> some microbenchmarks and tuning the costs for the most expensive cases. >>> >>> 2) estimating quicksort comparisons - This relies on ndistinct >>> estimates, and I'm not sure how much more reliable we can make those. >>> Probably not much :-( Not sure what to do about this, the only thing I >>> can think of is to track "reliability" of the estimates and only do the >>> reordering if we have high confidence in the estimates. That means we'll >>> miss some optimization opportunities, but it should limit the risk. >> I read up on the history of this thread. >> As I see, all the problems mentioned above can be beaten by excluding >> the new cost model at all. We can sort GROUP BY columns according to the >> 'ndistinct' value. >> I see the reason for introducing the cost model in [1]. The main >> supporting point here is that with this patch, people couldn't optimize >> the query by themselves, organizing the order of the columns in a more >> optimal way. But now we have at least the GUC to switch off the >> behaviour introduced here. Also, some extensions, like the well-known >> pg_hint_plan, can help with automation. > > I think the main concern is that if we reorder the group keys and get it > wrong, it's a regression. If that happens often (due to costing based on > poor stats), it's a problem. Yes, there's a GUC, but that's a rather > blunt instrument, unfortunately. I see. My point here is if the ndistinct of one column is much more than the ndistinct of another, it is more probable that this correlation will be kept in the grouping phase. Here we can get some regression, which can be overweighed by the possibility below. > >> So, how about committing of the undoubted part of the feature and >> working on the cost model in a new thread? >> > > But Tom's commit message says this: > > Worse, to arrive at estimates of the number of calls made to the > lower-order-column comparison functions, the code needs to make > estimates of the numbers of distinct values of multiple columns, > which are necessarily even less trustworthy than per-column stats. > > so I'm not sure this really counts as "undoubted". Don't try to estimate multiple columns - just sort columns according to the value of ndistinct as a heuristic. I think we should estimate the number of values of multiple columns only if we have extended statistics on these columns. 
And this could extend the applicability of extended statistics. The suggestion above is just an attempt to gather the low-hanging fruit right now. If it is not applicable, we will have to go the long way. -- regards, Andrey Lepikhov Postgres Professional
On 7/24/23 14:04, Andrey Lepikhov wrote: > On 24/7/2023 16:56, Tomas Vondra wrote: >> On 7/24/23 04:10, Andrey Lepikhov wrote: >>> On 20/7/2023 18:46, Tomas Vondra wrote: >>>> On 7/20/23 08:37, Andrey Lepikhov wrote: >>>>> On 3/10/2022 21:56, Tom Lane wrote: >>>>>> Revert "Optimize order of GROUP BY keys". >>>>>> >>>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>>>>> several follow-on fixes. >>>>>> ... >>>>>> Since we're hard up against the release deadline for v15, let's >>>>>> revert these changes for now. We can always try again later. >>>>> >>>>> It may be time to restart the project. As a first step, I rebased the >>>>> patch on the current master. It wasn't trivial because of some latest >>>>> optimizations (a29eab, 1349d27 and 8d83a5d). >>>>> Now, Let's repeat the review and rewrite the current path according to >>>>> the reasons uttered in the revert commit. >>>> >>>> I think the fundamental task is to make the costing more reliable, and >>>> the commit message 443df6e2db points out a couple challenges in this >>>> area. Not sure how feasible it is to address enough of them ... >>>> >>>> 1) procost = 1.0 - I guess we could make this more realistic by doing >>>> some microbenchmarks and tuning the costs for the most expensive cases. >>>> >>>> 2) estimating quicksort comparisons - This relies on ndistinct >>>> estimates, and I'm not sure how much more reliable we can make those. >>>> Probably not much :-( Not sure what to do about this, the only thing I >>>> can think of is to track "reliability" of the estimates and only do the >>>> reordering if we have high confidence in the estimates. That means >>>> we'll >>>> miss some optimization opportunities, but it should limit the risk. >>> I read up on the history of this thread. >>> As I see, all the problems mentioned above can be beaten by excluding >>> the new cost model at all. We can sort GROUP BY columns according to the >>> 'ndistinct' value. >>> I see the reason for introducing the cost model in [1]. The main >>> supporting point here is that with this patch, people couldn't optimize >>> the query by themselves, organizing the order of the columns in a more >>> optimal way. But now we have at least the GUC to switch off the >>> behaviour introduced here. Also, some extensions, like the well-known >>> pg_hint_plan, can help with automation. >> >> I think the main concern is that if we reorder the group keys and get it >> wrong, it's a regression. If that happens often (due to costing based on >> poor stats), it's a problem. Yes, there's a GUC, but that's a rather >> blunt instrument, unfortunately. > I see. My point here is if the ndistinct of one column is much more than > the ndistinct of another, it is more probable that this correlation will > be kept in the grouping phase. Here we can get some regression, which > can be overweighed by the possibility below. I think the word "probable" hits what the problem is. Because what if it isn't? Also, we don't actually need ndistinct for individual attributes, but for the groups of attributes. Imagine you do GROUP BY a, b so you need to estimate whether to do (a,b) or (b,a). You need to calculate ndistinct(a,b) / ndistinct(a) and ndistinct(b,a) / ndistinct(b) And similarly for more grouping keys. This is why we have ndistinct extended statistics, mostly. >> >>> So, how about committing of the undoubted part of the feature and >>> working on the cost model in a new thread? 
>>> > >> But Tom's commit message says this: >> Worse, to arrive at estimates of the number of calls made to the >> lower-order-column comparison functions, the code needs to make >> estimates of the numbers of distinct values of multiple columns, >> which are necessarily even less trustworthy than per-column stats. >> >> so I'm not sure this really counts as "undoubted". > Don't try to estimate multiple columns - just sort columns according to > the value of ndistinct as a heuristic. Surely you realize how easily such simple heuristics can fail, right? > I think we should estimate the number of values of multiple columns only > if we have extended statistics on these columns. And this can extend the > applicability of extended statistics. > I don't see why this should estimate ndistinct differently than every other place. That'd certainly be surprising. I can be convinced, but there needs to be sound justification why that's the right way to do it (i.e. why it's less likely to cause poor planning choices). > The suggestion above is just an attempt to gather low-hanging fruit > right now. If it is not applicable, we will go a long way. > I'm not sure this qualifies as "low hanging fruit" - that generally means simple changes with little risk. But you're using it for a rather simplistic heuristic that can easily misfire (I think). At least that's my impression, I can be convinced otherwise. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
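A minimal, self-contained illustration of that multi-column dependence (all object names are invented; the exact estimates will vary):

CREATE TABLE grp_demo (a int, b int, c int);
INSERT INTO grp_demo
  SELECT i % 1000, (i % 1000) % 10, i % 7
  FROM generate_series(1, 100000) i;
CREATE STATISTICS grp_demo_nd (ndistinct) ON a, b, c FROM grp_demo;
ANALYZE grp_demo;
SELECT d.stxdndistinct
  FROM pg_statistic_ext s
  JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
 WHERE s.stxname = 'grp_demo_nd';
-- b is functionally dependent on a, so ndistinct(a,b) / ndistinct(a) is about 1
-- (b splits no a-groups), while ndistinct(a,c) / ndistinct(a) is about 7
-- (c splits every a-group) - exactly the ratios the reordering decision needs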
Hi, Here is the patch rebased on the current master. Also, I fixed some minor slips and one static analyzer warning. This is just to add it to the next commitfest and to keep work on this patch going. One extra difference in the newly added postgres_fdw tests is caused by this patch - see the query plan changes in the attachment. -- regards, Andrey Lepikhov Postgres Professional
Attachment
On 20/7/2023 18:46, Tomas Vondra wrote: > On 7/20/23 08:37, Andrey Lepikhov wrote: >> On 3/10/2022 21:56, Tom Lane wrote: >>> Revert "Optimize order of GROUP BY keys". >>> >>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>> several follow-on fixes. >>> ... >>> Since we're hard up against the release deadline for v15, let's >>> revert these changes for now. We can always try again later. >> >> It may be time to restart the project. As a first step, I rebased the >> patch on the current master. It wasn't trivial because of some latest >> optimizations (a29eab, 1349d27 and 8d83a5d). >> Now, Let's repeat the review and rewrite the current path according to >> the reasons uttered in the revert commit. > 1) procost = 1.0 - I guess we could make this more realistic by doing > some microbenchmarks and tuning the costs for the most expensive cases. Ok, some thoughts on this part of the task. As I see it, we do not have that many different operators: 26 with fixed width and 16 with variable width:

SELECT o.oid, oprcode, typname, typlen
  FROM pg_operator o JOIN pg_type t ON (oprleft = t.oid)
 WHERE (oprname = '<') AND oprleft = oprright AND typlen > 0
 ORDER BY o.oid;

SELECT o.oid, oprcode, typname, typlen
  FROM pg_operator o JOIN pg_type t ON (oprleft = t.oid)
 WHERE (oprname = '<') AND oprleft = oprright AND typlen < 0
 ORDER BY o.oid;

A benchmarking procedure for fixed-length types can be something like below:

CREATE OR REPLACE FUNCTION pass_sort(typ regtype)
RETURNS TABLE (nrows integer, exec_time float) AS $$
DECLARE
  data json;
  step integer;
BEGIN
  SET work_mem = '1GB';
  FOR step IN 0..3 LOOP
    SELECT pow(100, step)::integer INTO nrows;
    DROP TABLE IF EXISTS test CASCADE;
    EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x FROM generate_series(1,%s) AS gs;', typ, nrows);
    EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
      SELECT * FROM test ORDER BY (x) DESC INTO data;
    SELECT data->0->'Execution Time' INTO exec_time;
    RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

Execution of SELECT * FROM pass_sort('integer'); shows quite linear growth of the execution time. So, using '2.0 * cpu_operator_cost' as the cost for the integer type (as a basis), we can calculate costs for the other operators. Variable-width types, I think, could require a more complex technique to check the dependency on the length. Does this approach look worthwhile? -- regards, Andrey Lepikhov Postgres Professional
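If that approach is acceptable, the relative multipliers could then be read straight off the measurements, e.g. (illustrative only, relying on the pass_sort() function above):

SELECT nrows, n.exec_time / i.exec_time AS cost_vs_integer
  FROM pass_sort('numeric') n
  JOIN pass_sort('integer') i USING (nrows)
 WHERE nrows = 1000000;
-- the measured ratio could serve as the comparison-cost multiplier for
-- numeric, relative to the 2.0 * cpu_operator_cost baseline used for integer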
On 20/7/2023 18:46, Tomas Vondra wrote: > 2) estimating quicksort comparisons - This relies on ndistinct > estimates, and I'm not sure how much more reliable we can make those. > Probably not much :-( Not sure what to do about this, the only thing I > can think of is to track "reliability" of the estimates and only do the > reordering if we have high confidence in the estimates. That means we'll > miss some optimization opportunities, but it should limit the risk. Regarding this issue, I see two options: 1. Go through the grouping column list and find the most reliable one. If we have a unique column, or a column whose statistics show a number of distinct values significantly higher than the ndistinct of the other grouping columns, we can place this column first in the grouping. That should make such a decision reliable, shouldn't it? 2. If we have extended statistics on distinct values and these statistics cover some set of the first columns in the grouping list, we can optimize those positions. It also looks reliable. Any thoughts? -- regards, Andrey Lepikhov Postgres Professional
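As an illustration of option 1, the check could be as simple as comparing the per-column estimates from pg_stats (table and column names are made up):

SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'orders'
   AND attname IN ('order_id', 'status', 'region');
-- n_distinct = -1 means "unique"; negative values are fractions of the row
-- count. If one column is (near-)unique while the others have only a few
-- distinct values, placing it first in the GROUP BY is the low-risk choice.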
New version of the patch. Fixed minor inconsistencies and rebased onto current master. -- regards, Andrey Lepikhov Postgres Professional
Attachment
Hi! On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > New version of the patch. Fixed minor inconsistencies and rebased onto > current master. Thank you (and other authors) for working on this subject. Indeed, GROUP BY clauses are order-agnostic. Reordering them in the most suitable order could give us significant query planning benefits. I went through the thread: I see significant work has already been done on this patch, and the code is quite polished. I'd like to make some notes. 1) As already mentioned, there is clearly a repetitive pattern in the code following the get_useful_group_keys_orderings() calls. I think it would be good to extract it into a separate function. Please, do this as a separate patch coming before the group-by patch. That would simplify the review. 2) I wonder what planning overhead this patch could introduce? Could you try to measure the worst case? What if we have a table with a lot of indexes and a long list of group-by clauses partially matching every index? This should give us an understanding of whether we need a separate GUC to control this feature. 3) I see that get_useful_group_keys_orderings() makes 3 calls to the get_cheapest_group_keys_order() function. Each time get_cheapest_group_keys_order() performs the cost estimate and reorders the free keys. However, cost estimation implies system catalog lookups (which are quite expensive). I wonder if we could change the algorithm. Could we just sort the group-by keys by cost once, save this ordering and then just re-use it? So, every time we need to reorder a group by, we can just pull the required keys to the top and use the saved ordering for the rest. I also wonder if we could do this once for the add_paths_to_grouping_rel() and create_partial_grouping_paths() calls. So, it probably should be somewhere in create_ordinary_grouping_paths(). 4) I think we can do some optimizations when enable_incremental_sort == off. Then in get_useful_group_keys_orderings() we should only deal with input paths fully matching the group-by clause, and try only a full match of the group-by output to the required order. ------ Regards, Alexander Korotkov
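Regarding (2), a simple worst-case setup might look like this (all names are invented for illustration):

CREATE TABLE wide_t (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int);
CREATE INDEX ON wide_t (c1, c2);
CREATE INDEX ON wide_t (c2, c3);
CREATE INDEX ON wide_t (c3, c4);
CREATE INDEX ON wide_t (c4, c5);
CREATE INDEX ON wide_t (c5, c6);
-- compare "Planning Time" with the optimization enabled and disabled
EXPLAIN (SUMMARY) SELECT count(*) FROM wide_t GROUP BY c6, c5, c4, c3, c2, c1;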
On 21/12/2023 17:53, Alexander Korotkov wrote: > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> New version of the patch. Fixed minor inconsistencies and rebased onto >> current master. > Thank you (and other authors) for working on this subject. Indeed to > GROUP BY clauses are order-agnostic. Reordering them in the most > suitable order could give up significant query planning benefits. I > went through the thread: I see significant work has been already made > on this patch, the code is quite polished. Maybe, but the issues mentioned in [1] are still not resolved. That is the only reason why this thread hasn't been active. > I'd like to make some notes. > 1) As already mentioned, there is clearly a repetitive pattern for the > code following after get_useful_group_keys_orderings() calls. I think > it would be good to extract it into a separate function. Please, do > this as a separate patch coming before the group-by patch. That would > simplify the review. Yeah, these parts of the code are a bit different. I will try to factor out a common routine. > 2) I wonder what planning overhead this patch could introduce? Could > you try to measure the worst case? What if we have a table with a lot > of indexes and a long list of group-by clauses partially patching > every index. This should give us an understanding on whether we need > a separate GUC to control this feature. Ok. > 3) I see that get_useful_group_keys_orderings() makes 3 calls to > get_cheapest_group_keys_order() function. Each time > get_cheapest_group_keys_order() performs the cost estimate and > reorders the free keys. However, cost estimation implies the system > catalog lookups (that is quite expensive). I wonder if we could > change the algorithm. Could we just sort the group-by keys by cost > once, save this ordering and then just re-use it. So, every time we > need to reorder a group by, we can just pull the required keys to the > top and use saved ordering for the rest. I also wonder if we could do > this once for add_paths_to_grouping_rel() and > create_partial_grouping_paths() calls. So, it probably should be > somewhere in create_ordinary_grouping_paths(). Thanks for the idea! > 4) I think we can do some optimizations when enable_incremental_sort > == off. Then in get_useful_group_keys_orderings() we should only deal > with input_path fully matching the group-by clause, and try only full > match of group-by output to the required order. Oh, this was designed before incremental sort was invented. We will see what we can do here. [1] https://www.postgresql.org/message-id/60610df1-c32f-ebdf-e58c-7a664431f452%40enterprisedb.com -- regards, Andrei Lepikhov Postgres Professional
On Tue, Dec 26, 2023 at 1:37 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 21/12/2023 17:53, Alexander Korotkov wrote: > > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov > > <a.lepikhov@postgrespro.ru> wrote: > >> New version of the patch. Fixed minor inconsistencies and rebased onto > >> current master. > > Thank you (and other authors) for working on this subject. Indeed to > > GROUP BY clauses are order-agnostic. Reordering them in the most > > suitable order could give up significant query planning benefits. I > > went through the thread: I see significant work has been already made > > on this patch, the code is quite polished. > Maybe, but issues, mentioned in [1], still not resolved. It is the only > reason, why this thread hasn't been active. Yes, this makes sense. I have a couple of points from me on this subject. 1) The patch reorders GROUP BY items not only to make comparison cheaper but also to match the ordering of input paths and to match the ORDER BY clause. Thus, even if we leave aside for now sorting GROUP BY items by their cost, the patch will remain valuable. 2) An accurate estimate of the sorting cost is quite a difficult task. What if we make a simple rule of thumb that sorting integers and floats is cheaper than sorting numerics and strings with collation C, in turn, that is cheaper than sorting collation-aware strings (probably more groups)? Within the group, we could keep the original order of items. ------ Regards, Alexander Korotkov
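To illustrate what such a rule of thumb would mean in practice, consider a sketch like the following (the table and data are made up for illustration; nothing here is from the patch):

CREATE TABLE rt (id int, amount numeric, label text);
INSERT INTO rt SELECT g % 100, g % 100, 'label_' || g % 100 FROM generate_series(1, 10000) g;
ANALYZE rt;
SET enable_hashagg = off;
-- under the proposed heuristic the sort for grouping would presumably compare
-- id first, then amount, then the collation-aware label, regardless of how
-- the GROUP BY list is written:
EXPLAIN (COSTS OFF) SELECT count(*) FROM rt GROUP BY label, amount, id;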
Alexander Korotkov <aekorotkov@gmail.com> writes: > 2) An accurate estimate of the sorting cost is quite a difficult task. Indeed. > What if we make a simple rule of thumb that sorting integers and > floats is cheaper than sorting numerics and strings with collation C, > in turn, that is cheaper than sorting collation-aware strings > (probably more groups)? Within the group, we could keep the original > order of items. I think it's a fool's errand to even try to separate different sort column orderings by cost. We simply do not have sufficiently accurate cost information. The previous patch in this thread got reverted because of that (well, also some implementation issues, but mostly that), and nothing has happened to make me think that another try will fare any better. regards, tom lane
On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <aekorotkov@gmail.com> writes: > > 2) An accurate estimate of the sorting cost is quite a difficult task. > > Indeed. > > > What if we make a simple rule of thumb that sorting integers and > > floats is cheaper than sorting numerics and strings with collation C, > > in turn, that is cheaper than sorting collation-aware strings > > (probably more groups)? Within the group, we could keep the original > > order of items. > > I think it's a fool's errand to even try to separate different sort > column orderings by cost. We simply do not have sufficiently accurate > cost information. The previous patch in this thread got reverted because > of that (well, also some implementation issues, but mostly that), and > nothing has happened to make me think that another try will fare any > better. If there is a choice of what to compare first: 8-byte integers or collation-aware texts that are possibly toasted, then the answer is pretty evident to me. For sure, there are cases when this choice is wrong. But even if all the integers appear to be the same, the penalty isn't that large. Besides sorting column orderings by cost, this patch also tries to match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys. Do you think there is a chance for the second part if we leave the cost part aside? ------ Regards, Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes: > On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think it's a fool's errand to even try to separate different sort >> column orderings by cost. > Besides sorting column orderings by cost, this patch also tries to > match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys. Do > you think there is a chance for the second part if we leave the cost > part aside? I think it's definitely reasonable to try to match up available orderings, because that doesn't really require fine distinctions of cost: either it matches or it doesn't. Eliminating a sort step entirely is clearly a win. (Incremental sort complicates this though. I doubt our cost model for incremental sorts is any good either, so I am not eager to rely on that more heavily.) regards, tom lane
On 27/12/2023 11:15, Alexander Korotkov wrote: > On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alexander Korotkov <aekorotkov@gmail.com> writes: >>> 2) An accurate estimate of the sorting cost is quite a difficult task. >> >> Indeed. >> >>> What if we make a simple rule of thumb that sorting integers and >>> floats is cheaper than sorting numerics and strings with collation C, >>> in turn, that is cheaper than sorting collation-aware strings >>> (probably more groups)? Within the group, we could keep the original >>> order of items. >> >> I think it's a fool's errand to even try to separate different sort >> column orderings by cost. We simply do not have sufficiently accurate >> cost information. The previous patch in this thread got reverted because >> of that (well, also some implementation issues, but mostly that), and >> nothing has happened to make me think that another try will fare any >> better. To be clear. In [1], I mentioned we can perform micro-benchmarks and structure costs of operators. At least for fixed-length operators, it is relatively easy. So, the main blocker here is an accurate prediction of ndistinct for different combinations of columns. Does it make sense to continue designing the feature in the direction of enabling the choice between different sort column orderings when we have extended statistics on the columns? [1] https://www.postgresql.org/message-id/e3602ccb-e643-2e79-ed2c-1175a80533a1@postgrespro.ru -- regards, Andrei Lepikhov Postgres Professional
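For reference, the extended statistics in question are the multi-column ndistinct kind; a minimal sketch (hypothetical table name and data) looks like:

CREATE TABLE ts (a int, b int, c int);
INSERT INTO ts SELECT g % 100, (g % 100) / 2, g % 50 FROM generate_series(1, 100000) g;
CREATE STATISTICS ts_nd (ndistinct) ON a, b, c FROM ts;
ANALYZE ts;
-- the stored group-count estimates for every combination of the columns:
SELECT d.stxdndistinct FROM pg_statistic_ext s JOIN pg_statistic_ext_data d ON d.stxoid = s.oid WHERE s.stxname = 'ts_nd';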
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes: > To be clear. In [1], I mentioned we can perform micro-benchmarks and > structure costs of operators. At least for fixed-length operators, it is > relatively easy. I repeat what I said: this is a fool's errand. You will not get trustworthy results even for the cases you measured, let alone all the rest. I'd go as far as to say I would not believe your microbenchmarks, because they would only apply for one platform, compiler, backend build, phase of the moon, etc. regards, tom lane
On 27/12/2023 12:07, Tom Lane wrote: > Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes: >> To be clear. In [1], I mentioned we can perform micro-benchmarks and >> structure costs of operators. At least for fixed-length operators, it is >> relatively easy. > > I repeat what I said: this is a fool's errand. You will not get > trustworthy results even for the cases you measured, let alone > all the rest. I'd go as far as to say I would not believe your > microbenchmarks, because they would only apply for one platform, > compiler, backend build, phase of the moon, etc. Thanks for the explanation. I removed all cost-related code. It still needs to be finished; I will smooth the code further and rewrite the regression tests - many of them look silly without cost-dependent reorderings. There are also Alexander's remarks, which must be implemented too. But it already works well here. Look: Preliminaries: CREATE TABLE t(x int, y int, z text, w int); INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs FROM generate_series(1,10000) AS gs; CREATE INDEX abc ON t(x,y); ANALYZE t; SET enable_hashagg = 'off'; This patch eliminates an unneeded Sort operation: explain SELECT x,y FROM t GROUP BY (x,y); explain SELECT x,y FROM t GROUP BY (y,x); It engages incremental sort: explain SELECT x,y FROM t GROUP BY (x,y,z,w); explain SELECT x,y FROM t GROUP BY (z,y,w,x); explain SELECT x,y FROM t GROUP BY (w,z,x,y); explain SELECT x,y FROM t GROUP BY (w,x,z,y); It works with subqueries: explain SELECT x,y FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1 GROUP BY (w,x,z,y); explain SELECT x,y FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1 GROUP BY (w,x,z,y); But arrangement with an ORDER BY clause doesn't work: DROP INDEX abc; explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w); I think the reason is that the sort_pathkeys and group_pathkeys are physically different structures, and we can't just compare pointers here. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > But arrangement with an ORDER BY clause doesn't work: > > DROP INDEX abc; > explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w); > > I think the reason is that the sort_pathkeys and group_pathkeys are > physically different structures, and we can't just compare pointers here. I haven't yet looked into the code. But this looks strange to me. Somehow, optimizer currently matches index pathkeys to ORDER BY pathkeys. If GROUP BY pathkeys could be matched to index pathkeys, then it should be possible to match them to ORDER BY pathkeys too. ------ Regards, Alexander Korotkov
On 28/12/2023 18:29, Alexander Korotkov wrote: > On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> But arrangement with an ORDER BY clause doesn't work: >> >> DROP INDEX abc; >> explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w); >> >> I think the reason is that the sort_pathkeys and group_pathkeys are >> physically different structures, and we can't just compare pointers here. > > I haven't yet looked into the code. But this looks strange to me. > Somehow, optimizer currently matches index pathkeys to ORDER BY > pathkeys. If GROUP BY pathkeys could be matched to index pathkeys, > then it should be possible to match them to ORDER BY pathkeys too. Oh, I found the mistake: I got used to using GROUP BY and ORDER BY on many columns with round brackets. In the case of the grouping list, it doesn't change anything. But ordering treats it as a WholeRowVar and breaks group-by arrangement. Look: explain (COSTS OFF) SELECT relname,reltuples FROM pg_class GROUP BY relname,reltuples ORDER BY reltuples,relname; Group Group Key: reltuples, relname -> Sort Sort Key: reltuples, relname -> Seq Scan on pg_class But: explain (COSTS OFF) SELECT relname,reltuples FROM pg_class GROUP BY relname,reltuples ORDER BY (reltuples,relname); Sort Sort Key: (ROW(reltuples, relname)) -> Group Group Key: relname, reltuples -> Sort Sort Key: relname, reltuples -> Seq Scan on pg_class So, let's continue to work. -- regards, Andrei Lepikhov Postgres Professional
Here is a new version of the GROUP-BY optimization without the sort model. On 21/12/2023 17:53, Alexander Korotkov wrote: > I'd like to make some notes. > > 1) As already mentioned, there is clearly a repetitive pattern for the > code following after get_useful_group_keys_orderings() calls. I think > it would be good to extract it into a separate function. Please, do > this as a separate patch coming before the group-by patch. That would > simplify the review. Done. See patch 0001-*. Unfortunately, extracting the whole loop isn't practical because it bloats the interface of the routine. > 2) I wonder what planning overhead this patch could introduce? Could > you try to measure the worst case? What if we have a table with a lot > of indexes and a long list of group-by clauses partially patching > every index. This should give us an understanding on whether we need > a separate GUC to control this feature. In the current implementation I don't anticipate any significant overhead. The GUC is needed here to let users keep their own ordering and to disable the feature in case of problems. > 4) I think we can do some optimizations when enable_incremental_sort > == off. Then in get_useful_group_keys_orderings() we should only deal > with input_path fully matching the group-by clause, and try only full match of group-by output to the required order. Hm, does it really make sense in the current implementation? -- regards, Andrei Lepikhov Postgres Professional
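For reference, the switch in the patchset is the enable_group_by_reordering GUC (the variable lives in src/backend/optimizer/path/pathkeys.c), so opting out would presumably be just:

SET enable_group_by_reordering = off;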
Attachment
On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > Here is a new version of GROUP-BY optimization without sort model. > > On 21/12/2023 17:53, Alexander Korotkov wrote: > > I'd like to make some notes. > > > > 1) As already mentioned, there is clearly a repetitive pattern for the > > code following after get_useful_group_keys_orderings() calls. I think > > it would be good to extract it into a separate function. Please, do > > this as a separate patch coming before the group-by patch. That would > > simplify the review. > Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't > practical, because it blows out the interface of the routine. > > > 2) I wonder what planning overhead this patch could introduce? Could > > you try to measure the worst case? What if we have a table with a lot > > of indexes and a long list of group-by clauses partially patching > > every index. This should give us an understanding on whether we need > > a separate GUC to control this feature. > In current implementation I don't anticipate any significant overhead. > GUC is needed here to allow users adhere their own ordering and to > disable feature in the case of problems. > > > 4) I think we can do some optimizations when enable_incremental_sort > > == off. Then in get_useful_group_keys_orderings() we should only deal > > with input_path fully matching the group-by clause, and try only full > > match of group-by output to the required order. > Hm, is it really make sense in current implementation? CFBot shows the following errors at [1] with: [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function ‘estimate_num_groups’: [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning: implicit declaration of function ‘estimate_num_groups_incremental’ [-Wimplicit-function-declaration] [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs, [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level: [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no previous prototype for ‘estimate_num_groups_incremental’ [-Wmissing-prototypes] [08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo *root, List *groupExprs, [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error: conflicting types for ‘estimate_num_groups_incremental’ [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note: previous implicit declaration of ‘estimate_num_groups_incremental’ was here [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs, [1] - https://cirrus-ci.com/task/5074942069309440 Regards, Vignesh
On 9/1/2024 16:45, vignesh C wrote: > On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: >> >> Here is a new version of GROUP-BY optimization without sort model. >> >> On 21/12/2023 17:53, Alexander Korotkov wrote: >>> I'd like to make some notes. >>> >>> 1) As already mentioned, there is clearly a repetitive pattern for the >>> code following after get_useful_group_keys_orderings() calls. I think >>> it would be good to extract it into a separate function. Please, do >>> this as a separate patch coming before the group-by patch. That would >>> simplify the review. >> Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't >> practical, because it blows out the interface of the routine. >> >>> 2) I wonder what planning overhead this patch could introduce? Could >>> you try to measure the worst case? What if we have a table with a lot >>> of indexes and a long list of group-by clauses partially patching >>> every index. This should give us an understanding on whether we need >>> a separate GUC to control this feature. >> In current implementation I don't anticipate any significant overhead. >> GUC is needed here to allow users adhere their own ordering and to >> disable feature in the case of problems. >> >>> 4) I think we can do some optimizations when enable_incremental_sort >>> == off. Then in get_useful_group_keys_orderings() we should only deal >>> with input_path fully matching the group-by clause, and try only full >>> match of group-by output to the required order. >> Hm, is it really make sense in current implementation? > > CFBot shows the following errors at [1] with: > [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function > ‘estimate_num_groups’: > [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning: > implicit declaration of function ‘estimate_num_groups_incremental’ > [-Wimplicit-function-declaration] > [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs, > [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level: > [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no > previous prototype for ‘estimate_num_groups_incremental’ > [-Wmissing-prototypes] > [08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo > *root, List *groupExprs, > [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error: > conflicting types for ‘estimate_num_groups_incremental’ > [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note: > previous implicit declaration of ‘estimate_num_groups_incremental’ was > here > [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs, Hmm, I don't see this old code in these patches. Resend 0002-* because of trailing spaces. -- regards, Andrei Lepikhov Postgres Professional
Attachment
Hi, Andrei!
Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.
AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new version of the whole patchset should be sent even if most patches are unchanged.
Regards,
Pavel Borisov
Hi! On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >> Hmm, I don't see this old code in these patches. Resend 0002-* because >> of trailing spaces. > > > AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new versionof the whole patchset should be sent even if most patches are unchanged. Please, find the revised patchset with some refactoring and comments improvement from me. I'll continue looking into this. ------ Regards, Alexander Korotkov
Attachment
On 11/1/2024 18:30, Alexander Korotkov wrote: > Hi! > > On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >>> Hmm, I don't see this old code in these patches. Resend 0002-* because >>> of trailing spaces. >> >> >> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new versionof the whole patchset should be sent even if most patches are unchanged. > > Please, find the revised patchset with some refactoring and comments > improvement from me. I'll continue looking into this. The patch looks better, thanks to your refactoring. I propose additional comments and tests to make the code more understandable (see attachment). I intended to look into this part of the code more, but the tests show a difference between PostgreSQL v.15 and v.16, which causes changes in the code of this feature. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 11/1/2024 18:30, Alexander Korotkov wrote: > > On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > >>> Hmm, I don't see this old code in these patches. Resend 0002-* because > >>> of trailing spaces. > >> > >> > >> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new versionof the whole patchset should be sent even if most patches are unchanged. > > > > Please, find the revised patchset with some refactoring and comments > > improvement from me. I'll continue looking into this. > The patch looks better, thanks to your refactoring. > I propose additional comments and tests to make the code more > understandable (see attachment). > I intended to look into this part of the code more, but the tests show a > difference between PostgreSQL v.15 and v.16, which causes changes in the > code of this feature. Makes sense. I've incorporated your changes into the patchset. ------ Regards, Alexander Korotkov
Attachment
On 13/1/2024 22:00, Alexander Korotkov wrote: > On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> On 11/1/2024 18:30, Alexander Korotkov wrote: >>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >>>>> Hmm, I don't see this old code in these patches. Resend 0002-* because >>>>> of trailing spaces. >>>> >>>> >>>> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new versionof the whole patchset should be sent even if most patches are unchanged. >>> >>> Please, find the revised patchset with some refactoring and comments >>> improvement from me. I'll continue looking into this. >> The patch looks better, thanks to your refactoring. >> I propose additional comments and tests to make the code more >> understandable (see attachment). >> I intended to look into this part of the code more, but the tests show a >> difference between PostgreSQL v.15 and v.16, which causes changes in the >> code of this feature. > > Makes sense. I've incorporated your changes into the patchset. One more improvement. To underpin the code change: - return cur_ec; /* Match! */ + { + /* + * Match! + * + * Copy the sortref if it wasn't set yet. That may happen if + * the ec was constructed from WHERE clause, i.e. it doesn't + * have a target reference at all. + */ + if (cur_ec->ec_sortref == 0 && sortref > 0) + cur_ec->ec_sortref = sortref; + return cur_ec; + } I propose the test (see attachment). It shows why we introduce this change: GROUP BY should juggle not only pathkeys generated by explicit sort operators but also planner-made ones, such as, in this example, those produced by MergeJoin. -- regards, Andrei Lepikhov Postgres Professional
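A query of roughly this shape exercises the scenario described (hypothetical tables, only for illustration, not the attached test itself):

CREATE TABLE t1 (x int, y int);
CREATE TABLE t2 (x int, y int);
SET enable_hashjoin = off;
SET enable_nestloop = off;
SET enable_hashagg = off;
-- t1.x first appears in the join clause, so its equivalence class starts out
-- without a sort reference; the merge join then delivers input sorted by x,
-- which the GROUP BY arrangement should be able to reuse
EXPLAIN (COSTS OFF) SELECT t1.x, t1.y, count(*) FROM t1 JOIN t2 USING (x) GROUP BY t1.x, t1.y;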
Attachment
On Sun, Jan 14, 2024 at 2:14 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 13/1/2024 22:00, Alexander Korotkov wrote: > > On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov > > <a.lepikhov@postgrespro.ru> wrote: > >> On 11/1/2024 18:30, Alexander Korotkov wrote: > >>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > >>>>> Hmm, I don't see this old code in these patches. Resend 0002-* because > >>>>> of trailing spaces. > >>>> > >>>> > >>>> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a newversion of the whole patchset should be sent even if most patches are unchanged. > >>> > >>> Please, find the revised patchset with some refactoring and comments > >>> improvement from me. I'll continue looking into this. > >> The patch looks better, thanks to your refactoring. > >> I propose additional comments and tests to make the code more > >> understandable (see attachment). > >> I intended to look into this part of the code more, but the tests show a > >> difference between PostgreSQL v.15 and v.16, which causes changes in the > >> code of this feature. > > > > Makes sense. I've incorporated your changes into the patchset. > One more improvement. To underpin code change: > > - return cur_ec; /* Match! */ > + { > + /* > + * Match! > + * > + * Copy the sortref if it wasn't set yet. That may happen if > + * the ec was constructed from WHERE clause, i.e. it doesn't > + * have a target reference at all. > + */ > + if (cur_ec->ec_sortref == 0 && sortref > 0) > + cur_ec->ec_sortref = sortref; > + return cur_ec; > + } > > I propose the test (see attachment). It shows why we introduce this > change: GROUP-BY should juggle not only pathkeys generated by explicit > sort operators but also planner-made, likewise in this example, by > MergeJoin. Thank you for providing the test case relevant for this code change. The revised patch incorporating this change is attached. Now the patchset looks good to me. I'm going to push it if there are no objections. ------ Regards, Alexander Korotkov
Attachment
On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached. Now the
patchset looks good to me. I'm going to push it if there are no
objections.
Seems I'm late for the party. Can we hold for several more days? I'd
like to have a review on this patch.
Thanks
Richard
On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> Thank you for providing the test case relevant for this code change. >> The revised patch incorporating this change is attached. Now the >> patchset looks good to me. I'm going to push it if there are no >> objections. > > Seems I'm late for the party. Can we hold for several more days? I'd > like to have a review on this patch. Sure, NP. I'll hold it till your review. ------ Regards, Alexander Korotkov
On 15/1/2024 13:42, Richard Guo wrote: > > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com > <mailto:aekorotkov@gmail.com>> wrote: > > Thank you for providing the test case relevant for this code change. > The revised patch incorporating this change is attached. Now the > patchset looks good to me. I'm going to push it if there are no > objections. > > > Seems I'm late for the party. Can we hold for several more days? I'd > like to have a review on this patch. Get on board! It looks like this feature needs as much review as possible (likewise SJE). -- regards, Andrei Lepikhov Postgres Professional
On 15.01.2024 12:46, Andrei Lepikhov wrote: > On 15/1/2024 13:42, Richard Guo wrote: >> >> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov >> <aekorotkov@gmail.com <mailto:aekorotkov@gmail.com>> wrote: >> >> Thank you for providing the test case relevant for this code change. >> The revised patch incorporating this change is attached. Now the >> patchset looks good to me. I'm going to push it if there are no >> objections. >> >> >> Seems I'm late for the party. Can we hold for several more days? I'd >> like to have a review on this patch. > Get on board! It looks like this feature needs as much review as > possible (likewise SJE). Hi! Thank you for your work on this issue! I believe that this will help the planner to build a more optimal query plan here and therefore speed up query execution. I have reviewed the patches and noticed that we can add some code refactoring. I have attached a diff file (group_by.diff) to this email. The changes involve spelling corrections, renaming variables and factoring out some common parts. In addition, I have a few questions, since some points in the code remained unclear to me. 1. I didn't understand why we have a question in the comment next to the enable_group_by_reordering variable in the src/backend/optimizer/path/pathkeys.c file; I assumed it was a spelling issue and fixed it in the diff file. 2. Why do we set the variable (path = path_save) here (in the add_paths_to_grouping_rel function) if we overwrite it below and could pass path_save as a parameter? foreach(lc2, pathkey_orderings) { PathKeyInfo *info = (PathKeyInfo *) lfirst(lc2); /* restore the path (we replace it in the loop) */ path = path_save; path = make_ordered_path(root, grouped_rel, path, cheapest_path, info->pathkeys); if (path == NULL) continue; -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> Thank you for providing the test case relevant for this code change.
>> The revised patch incorporating this change is attached. Now the
>> patchset looks good to me. I'm going to push it if there are no
>> objections.
>
> Seems I'm late for the party. Can we hold for several more days? I'd
> like to have a review on this patch.
Sure, NP. I'll hold it till your review.
Thanks. Appreciate that.
I have briefly reviewed this patch and here are some comments.
* When trying to match the ordering of GROUP BY to that of ORDER BY in
get_useful_group_keys_orderings, this patch checks against the length of
path->pathkeys. This does not make sense. I guess this is a typo and
the real intention is to check against root->sort_pathkeys.
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
root->num_groupby_pathkeys);
if (n > 0 &&
- (enable_incremental_sort || n == list_length(path->pathkeys)))
+ (enable_incremental_sort || n == list_length(root->sort_pathkeys)))
However, I think this is also incorrect. When incremental sort is
disabled, it is reasonable to consider reordering the GROUP BY keys only
if the number of matching pathkeys is equal to the length of
root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.
* When the original ordering of GROUP BY keys matches the path's
pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
generate duplicate PathKeyInfos. Consequently, this duplication would
lead to the creation and addition of identical paths multiple times.
This is not great. Consider the query below:
create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;
explain select count(*) from t group by a, b;
QUERY PLAN
----------------------------------------------------------------------------------
GroupAggregate (cost=0.15..97.27 rows=226 width=16)
Group Key: a, b
-> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8)
(3 rows)
The same path with group keys {a, b} is created and added twice.
* Part of the work performed in this patch overlaps with that of
preprocess_groupclause. They are both trying to adjust the ordering of
the GROUP BY keys to match ORDER BY. I wonder if it would be better to
perform this work only once.
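For instance, with the table t created above, both places end up trying to rewrite a query of this shape so that the grouping is performed in (a, b) order (shown here only to illustrate the overlap):

explain (costs off) select count(*) from t group by b, a order by a, b;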
* When reordering the GROUP BY keys, I think the following checks need
some improvements.
+ /*
+ * Since 1349d27 pathkey coming from underlying node can be in the
+ * root->group_pathkeys but not in the processed_groupClause. So, we
+ * should be careful here.
+ */
+ sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
+ *group_clauses);
+ if (!sgc)
+ /* The grouping clause does not cover this pathkey */
+ break;
I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
valid before calling get_sortgroupref_clause_noerr with it. Also, how
can we guarantee that the returned GROUP BY item is sortable? Should we
check that with OidIsValid(sgc->sortop)?
Thanks
Richard
Hi! Thank you for your review. The revised patchset is attached. On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote: >> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> >> >> Thank you for providing the test case relevant for this code change. >> >> The revised patch incorporating this change is attached. Now the >> >> patchset looks good to me. I'm going to push it if there are no >> >> objections. >> > >> > Seems I'm late for the party. Can we hold for several more days? I'd >> > like to have a review on this patch. >> >> Sure, NP. I'll hold it till your review. > > > Thanks. Appreciate that. > > I have briefly reviewed this patch and here are some comments. > > * When trying to match the ordering of GROUP BY to that of ORDER BY in > get_useful_group_keys_orderings, this patch checks against the length of > path->pathkeys. This does not make sense. I guess this is a typo and > the real intention is to check against root->sort_pathkeys. > > --- a/src/backend/optimizer/path/pathkeys.c > +++ b/src/backend/optimizer/path/pathkeys.c > @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) > root->num_groupby_pathkeys); > > if (n > 0 && > - (enable_incremental_sort || n == list_length(path->pathkeys))) > + (enable_incremental_sort || n == list_length(root->sort_pathkeys))) > > However, I think this is also incorrect. When incremental sort is > disabled, it is reasonable to consider reordering the GROUP BY keys only > if the number of matching pathkeys is equal to the length of > root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'. Hmm... Why should this be list_length(root->group_pathkeys) while we're targeting root->sort_pathkeys. I yet changed that to list_length(root->sort_pathkeys). > * When the original ordering of GROUP BY keys matches the path's > pathkeys or ORDER BY keys, get_useful_group_keys_orderings would > generate duplicate PathKeyInfos. Consequently, this duplication would > lead to the creation and addition of identical paths multiple times. > This is not great. Consider the query below: > > create table t (a int, b int); > create index on t (a, b); > set enable_hashagg to off; > > explain select count(*) from t group by a, b; > QUERY PLAN > ---------------------------------------------------------------------------------- > GroupAggregate (cost=0.15..97.27 rows=226 width=16) > Group Key: a, b > -> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8) > (3 rows) > > The same path with group keys {a, b} is created and added twice. I tried to address that. > * Part of the work performed in this patch overlaps with that of > preprocess_groupclause. They are both trying to adjust the ordering of > the GROUP BY keys to match ORDER BY. I wonder if it would be better to > perform this work only once. Andrei, could you take a look. > * When reordering the GROUP BY keys, I think the following checks need > some improvements. > > + /* > + * Since 1349d27 pathkey coming from underlying node can be in the > + * root->group_pathkeys but not in the processed_groupClause. So, we > + * should be careful here. 
> + */ > + sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref, > + *group_clauses); > + if (!sgc) > + /* The grouping clause does not cover this pathkey */ > + break; > > I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is > valid before calling get_sortgroupref_clause_noerr with it. Also, how > can we guarantee that the returned GROUP BY item is sortable? Should we > check that with OidIsValid(sgc->sortop)? Added. ------ Regards, Alexander Korotkov
Attachment
On 16/1/2024 22:05, Alexander Korotkov wrote: > On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote: >> * When trying to match the ordering of GROUP BY to that of ORDER BY in >> get_useful_group_keys_orderings, this patch checks against the length of >> path->pathkeys. This does not make sense. I guess this is a typo and >> the real intention is to check against root->sort_pathkeys. Thanks! It is really my blunder - fresh look works. >> >> --- a/src/backend/optimizer/path/pathkeys.c >> +++ b/src/backend/optimizer/path/pathkeys.c >> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) >> root->num_groupby_pathkeys); >> >> if (n > 0 && >> - (enable_incremental_sort || n == list_length(path->pathkeys))) >> + (enable_incremental_sort || n == list_length(root->sort_pathkeys))) >> >> However, I think this is also incorrect. When incremental sort is >> disabled, it is reasonable to consider reordering the GROUP BY keys only >> if the number of matching pathkeys is equal to the length of >> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'. > > Hmm... Why should this be list_length(root->group_pathkeys) while > we're targeting root->sort_pathkeys. I yet changed that to > list_length(root->sort_pathkeys). I think, in the first case, when we are trying to arrange group-by keys according to underlying pathkeys with incremental sort = off, it makes sense to do if we fetch all group-by keys regardless of a more or equal number of path keys the incoming path contains. The code and test case are in step1.txt. > >> * When the original ordering of GROUP BY keys matches the path's >> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would >> generate duplicate PathKeyInfos. Consequently, this duplication would >> lead to the creation and addition of identical paths multiple times. >> This is not great. Consider the query below: >> >> create table t (a int, b int); >> create index on t (a, b); >> set enable_hashagg to off; >> >> explain select count(*) from t group by a, b; >> QUERY PLAN >> ---------------------------------------------------------------------------------- >> GroupAggregate (cost=0.15..97.27 rows=226 width=16) >> Group Key: a, b >> -> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8) >> (3 rows) >> >> The same path with group keys {a, b} is created and added twice. > > I tried to address that. > >> * Part of the work performed in this patch overlaps with that of >> preprocess_groupclause. They are both trying to adjust the ordering of >> the GROUP BY keys to match ORDER BY. I wonder if it would be better to >> perform this work only once. > > Andrei, could you take a look. As I see, the PathKeyInfo list often contains duplicated pathkeys, coming from either sort_pathkeys or path->pathkeys orderings. So, I can propose to check duplicates each time (see step2.txt in attachment). > >> * When reordering the GROUP BY keys, I think the following checks need >> some improvements. >> >> + /* >> + * Since 1349d27 pathkey coming from underlying node can be in the >> + * root->group_pathkeys but not in the processed_groupClause. So, we >> + * should be careful here. >> + */ >> + sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref, >> + *group_clauses); >> + if (!sgc) >> + /* The grouping clause does not cover this pathkey */ >> + break; >> >> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is >> valid before calling get_sortgroupref_clause_noerr with it. 
Also, how >> can we guarantee that the returned GROUP BY item is sortable? Should we >> check that with OidIsValid(sgc->sortop)? > > Added. Reviewed it, looks good. -- regards, Andrei Lepikhov Postgres Professional
Attachment
Just forgotten second attachment. -- regards, Andrei Lepikhov Postgres Professional
Attachment
Hi! I've applied your changes with minor editing, thank you. On Thu, Jan 18, 2024 at 11:49 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > >> * Part of the work performed in this patch overlaps with that of > >> preprocess_groupclause. They are both trying to adjust the ordering of > >> the GROUP BY keys to match ORDER BY. I wonder if it would be better to > >> perform this work only once. > > > > Andrei, could you take a look. > As I see, the PathKeyInfo list often contains duplicated pathkeys, > coming from either sort_pathkeys or path->pathkeys orderings. So, I can > propose to check duplicates each time (see step2.txt in attachment). Actually I asked to recheck if we can cut some part of preprocess_groupclause() given that we're reordering the pathkeys later. It seems that we can remove everything except the work with a grouping set. I've done this in the revised patchset. I'm going to push this if there are no objections. ------ Regards, Alexander Korotkov
Attachment
A recent buildfarm failure [0] seems to indicate a name collision with the "abc" index in the aggregates.sql test and the "abc" table in namespace.sql. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-01-24%2014%3A05%3A14 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > A recent buildfarm failure [0] seems to indicate a name collision with the > "abc" index in the aggregates.sql test and the "abc" table in > namespace.sql. > > [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-01-24%2014%3A05%3A14 Thank you for catching this. Fixed. ------ Regards, Alexander Korotkov
On Thu, 25 Jan 2024 at 01:15, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > A recent buildfarm failure [0] seems to indicate a name collision with the > > "abc" index in the aggregates.sql test and the "abc" table in > > namespace.sql. > > > > [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-01-24%2014%3A05%3A14 > > Thank you for catching this. Fixed. Since the patch has been committed, I have marked this entry as committed. Regards, Vignesh
On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think it's a fool's errand to even try to separate different sort > column orderings by cost. We simply do not have sufficiently accurate > cost information. The previous patch in this thread got reverted because > of that (well, also some implementation issues, but mostly that), and > nothing has happened to make me think that another try will fare any > better. I'm late to the party, but I'd like to better understand what's being argued here. If you're saying that, for some particular planner problem, we should prefer a solution that doesn't need to know about the relative cost of various sorts over one that does, I agree, for exactly the reason that you state: our knowledge of sort costs won't be reliable, and we will make mistakes. That's true in lots of situations, not just related to sorts, because estimation is a hard problem. Heuristics not based on cost are going to be, in many cases, more accurate than heuristics based on cost. They're also often cheaper, since they often let us reject possible approaches very early, without all the bother of a cost comparison. But if you're saying that it's utterly impossible to know whether sorting text will be cheaper or more expensive than sorting 4-byte integers, and that if a particular problem can be solved only by knowing which one is cheaper we should just give up, then I disagree. In the absence of any other information, it must be right, at the very least, to bank on varlena data types being more expensive to sort than fixed-length data types. How much more expensive is hard to know, because toasted blobs are going to be more expensive to sort than short varlenas. But even before you reach the comparison function, a pass-by-value datum has a significantly lower access cost than a pass-by-reference datum. The fact that the pass-by-reference value might be huge only compounds the problem. -- Robert Haas EDB: http://www.enterprisedb.com
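One quick (if unscientific) way to see the effect being described is to time two otherwise identical sorts by hand; a hypothetical sketch:

CREATE TABLE sort_cmp (i int, t text);
INSERT INTO sort_cmp SELECT g, repeat(md5(g::text), 30) FROM generate_series(1, 200000) g;
ANALYZE sort_cmp;
-- comparing the two run times shows the gap between a fixed-length,
-- pass-by-value sort key and a varlena, collation-aware one
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT i FROM sort_cmp ORDER BY i;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT t FROM sort_cmp ORDER BY t;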
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think it's a fool's errand to even try to separate different sort >> column orderings by cost. We simply do not have sufficiently accurate >> cost information. The previous patch in this thread got reverted because >> of that (well, also some implementation issues, but mostly that), and >> nothing has happened to make me think that another try will fare any >> better. > I'm late to the party, but I'd like to better understand what's being > argued here. What I am saying is that we don't have sufficiently accurate cost information to support the sort of logic that got committed and reverted before. I did not mean to imply that it's not possible to have such info, only that it is not present today. IOW, what I'm saying is that if you want to write code that tries to make a cost-based preference of one sorting over another, you *first* need to put in a bunch of legwork to create more accurate cost numbers. Trying to make such logic depend on the numbers we have today is just going to result in garbage in, garbage out. Sadly, that's not a small task: * We'd need to put effort into assigning more realistic procost values --- preferably across the board, not just comparison functions. As long as all the comparison functions have procost 1.0, you're just flying blind. * As you mentioned, there'd need to be some accounting for the likely size of varlena inputs, and especially whether they might be toasted. * cost_sort knows nothing of the low-level sort algorithm improvements we've made in recent years, such as abbreviated keys. That's a lot of work, and I think it has to be done before we try to build infrastructure on top, not afterwards. regards, tom lane
On Fri, Jan 26, 2024 at 10:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sadly, that's not a small task: > > * We'd need to put effort into assigning more realistic procost > values --- preferably across the board, not just comparison functions. > As long as all the comparison functions have procost 1.0, you're > just flying blind. > > * As you mentioned, there'd need to be some accounting for the > likely size of varlena inputs, and especially whether they might > be toasted. > > * cost_sort knows nothing of the low-level sort algorithm improvements > we've made in recent years, such as abbreviated keys. > > That's a lot of work, and I think it has to be done before we try > to build infrastructure on top, not afterwards. OK, that makes sense to me. Thanks. -- Robert Haas EDB: http://www.enterprisedb.com
Alexander Korotkov <aekorotkov@gmail.com> writes: > I'm going to push this if there are no objections. One of the test cases added by this commit has not been very stable in the buildfarm. Latest example is here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 and I've seen similar failures intermittently on other machines. I'd suggest building this test atop a table that is more stable than pg_class. You're just waving a red flag in front of a bull if you expect stable statistics from that during a regression run. Nor do I see any particular reason for pg_class to be especially suited to the test. regards, tom lane
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I'm going to push this if there are no objections.
One of the test cases added by this commit has not been very
stable in the buildfarm. Latest example is here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
and I've seen similar failures intermittently on other machines.
I'd suggest building this test atop a table that is more stable
than pg_class. You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.
Yeah, it's not a good practice to use pg_class in this place. While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great. Such as
* The table 'btg' is inserted with 10000 tuples, which seems a bit
expensive for a test. I don't think we need such a big table to test
what we want.
* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.
* I think we'd better write the tests with the keywords being all upper
or all lower. A mixed use of upper and lower is not great. Such as in
explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
* Some comments for the test queries are not easy to read.
* For this statement
CREATE INDEX idx_y_x_z ON btg(y,x,w);
I think the index name would cause confusion. It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.
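A name that spells out the actual columns would avoid that confusion, e.g. something like:

CREATE INDEX btg_y_x_w_idx ON btg(y,x,w);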
I'd like to write a draft patch for the fixes.
Thanks
Richard
On 2/2/2024 09:02, Tom Lane wrote: > Alexander Korotkov <aekorotkov@gmail.com> writes: >> I'm going to push this if there are no objections. > > One of the test cases added by this commit has not been very > stable in the buildfarm. Latest example is here: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 > > and I've seen similar failures intermittently on other machines. > > I'd suggest building this test atop a table that is more stable > than pg_class. You're just waving a red flag in front of a bull > if you expect stable statistics from that during a regression run. > Nor do I see any particular reason for pg_class to be especially > suited to the test. Yeah, it is my fault. Please see the attached patch fixing that. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:One of the test cases added by this commit has not been very
stable in the buildfarm. Latest example is here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
and I've seen similar failures intermittently on other machines.
I'd suggest building this test atop a table that is more stable
than pg_class. You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.Yeah, it's not a good practice to use pg_class in this place. While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great. Such as
* The table 'btg' is inserted with 10000 tuples, which seems a bit
expensive for a test. I don't think we need such a big table to test
what we want.
* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.
* I think we'd better write the tests with the keywords being all upper
or all lower. A mixed use of upper and lower is not great. Such as in
explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
* Some comments for the test queries are not easy to read.
* For this statement
CREATE INDEX idx_y_x_z ON btg(y,x,w);
I think the index name would cause confusion. It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.
I'd like to write a draft patch for the fixes.
Here is the draft patch that fixes the issues I complained about in
upthread.
Thanks
Richard
Attachment
On 2/2/2024 11:06, Richard Guo wrote: > > On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com > <mailto:guofenglinux@gmail.com>> wrote: > > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > One of the test cases added by this commit has not been very > stable in the buildfarm. Latest example is here: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04> > > and I've seen similar failures intermittently on other machines. > > I'd suggest building this test atop a table that is more stable > than pg_class. You're just waving a red flag in front of a bull > if you expect stable statistics from that during a regression run. > Nor do I see any particular reason for pg_class to be especially > suited to the test. > > > Yeah, it's not a good practice to use pg_class in this place. While > looking through the test cases added by this commit, I noticed some > other minor issues that are not great. Such as > > * The table 'btg' is inserted with 10000 tuples, which seems a bit > expensive for a test. I don't think we need such a big table to test > what we want. > > * I don't see why we need to manipulate GUC max_parallel_workers and > max_parallel_workers_per_gather. > > * I think we'd better write the tests with the keywords being all upper > or all lower. A mixed use of upper and lower is not great. Such as in > > explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w; > > * Some comments for the test queries are not easy to read. > > * For this statement > > CREATE INDEX idx_y_x_z ON btg(y,x,w); > > I think the index name would cause confusion. It creates an index on > columns y, x and w, but the name indicates an index on y, x and z. > > I'd like to write a draft patch for the fixes. > > > Here is the draft patch that fixes the issues I complained about in > upthread. I passed through the patch. Looks like it doesn't break anything. Why do you prefer to use count(*) in EXPLAIN instead of plain targetlist, like "SELECT x,y,..."? Also, according to the test mentioned by Tom: 1. I see, PG uses IndexScan on (x,y). So, column x will be already sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w) instead of full sort? 2. For memo, IMO, this test shows us the future near perspective of this feature: It is cheaper to use grouping order (w,z) instead of (z,w). -- regards, Andrei Lepikhov Postgres Professional
On Fri, Feb 2, 2024 at 12:40 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 2/2/2024 11:06, Richard Guo wrote:
>
> On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com
> <mailto:guofenglinux@gmail.com>> wrote:
>
> On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
> One of the test cases added by this commit has not been very
> stable in the buildfarm. Latest example is here:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04>
>
> and I've seen similar failures intermittently on other machines.
>
> I'd suggest building this test atop a table that is more stable
> than pg_class. You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.
>
>
> Yeah, it's not a good practice to use pg_class in this place. While
> looking through the test cases added by this commit, I noticed some
> other minor issues that are not great. Such as
>
> * The table 'btg' is inserted with 10000 tuples, which seems a bit
> expensive for a test. I don't think we need such a big table to test
> what we want.
>
> * I don't see why we need to manipulate GUC max_parallel_workers and
> max_parallel_workers_per_gather.
>
> * I think we'd better write the tests with the keywords being all upper
> or all lower. A mixed use of upper and lower is not great. Such as in
>
> explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
>
> * Some comments for the test queries are not easy to read.
>
> * For this statement
>
> CREATE INDEX idx_y_x_z ON btg(y,x,w);
>
> I think the index name would cause confusion. It creates an index on
> columns y, x and w, but the name indicates an index on y, x and z.
>
> I'd like to write a draft patch for the fixes.
>
>
> Here is the draft patch that fixes the issues I complained about in
> upthread.
I passed through the patch. Looks like it doesn't break anything. Why do
you prefer to use count(*) in EXPLAIN instead of plain targetlist, like
"SELECT x,y,..."?
Nothing special. Just making the test cases consistent as much as
possible.
Also, according to the test mentioned by Tom:
1. I see, PG uses IndexScan on (x,y). So, column x will be already
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)
instead of full sort?
I think that's because the planner chooses to use (z, w, x) to perform
the mergejoin. I did not delve into the details, but I guess the cost
estimation decides this is cheaper.
Hi Alexander,
What do you think about the revisions for the test cases?
Thanks
Richard
Hi, Richard! > What do you think about the revisions for the test cases? I've rebased your patch upthread. Did some minor beautifications. > * The table 'btg' is inserted with 10000 tuples, which seems a bit > expensive for a test. I don't think we need such a big table to test > what we want. Your patch reduces the number of rows to 1000 tuples. I found it possible to further reduce it to 100 tuples. That also allowed me to save the plan in the test case introduced by e1b7fde418. Please check if you're OK with the patch attached. ------ Regards, Alexander Korotkov
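Roughly, the reduced setup in the attached patch looks like this (a sketch; see the
patch for the exact details):
CREATE TABLE btg AS SELECT
  i % 10 AS x,
  i % 10 AS y,
  'abc' || i % 10 AS z,
  i AS w
FROM generate_series(1, 100) AS i;
CREATE INDEX btg_x_y_idx ON btg(x, y);
ANALYZE btg;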
Attachment
Hi!
Another issue with the test introduced in 0452b461bc405: I think it may be unstable in some circumstances.
For example, if we try to use a different BLCKSZ. See, I've made a little change in the number of tuples to be inserted:
$ git diff
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index d6ed5d0eff..414078d4ec 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
i % 100 AS y,
'abc' || i % 10 AS z,
i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
CREATE INDEX btg_x_y_idx ON btg(x,y);
ANALYZE btg;
And the bulk extension is kicked in, so we get zeroed pages in the relation. The plan also changes,
switching to a seq scan from an index scan:
@@ -2734,7 +2734,7 @@
i % 100 AS y,
'abc' || i % 10 AS z,
i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
CREATE INDEX btg_x_y_idx ON btg(x,y);
ANALYZE btg;
-- GROUP BY optimization by reorder columns by frequency
@@ -2760,62 +2760,57 @@
-- Engage incremental sort
explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
- QUERY PLAN
--------------------------------------------------
+ QUERY PLAN
+------------------------------
Group
Group Key: x, y, z, w
- -> Incremental Sort
+ -> Sort
Sort Key: x, y, z, w
- Presorted Key: x, y
- -> Index Scan using btg_x_y_idx on btg
-(6 rows)
+ -> Seq Scan on btg
+(5 rows)
... and so on.
So, my proposal is simple. I think we need not just "ANALYZE btg", but
"VACUUM ANALYZE btg", to get rid of zeroed pages in this particular
case. PFA corresponding patch.
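The change itself is tiny, roughly (the attached patch is authoritative):
-ANALYZE btg;
+VACUUM ANALYZE btg;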
--
Best regards,
Maxim Orlov.
Attachment
On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Richard!
> What do you think about the revisions for the test cases?
I've rebased your patch upthread. Did some minor beautifications.
> * The table 'btg' is inserted with 10000 tuples, which seems a bit
> expensive for a test. I don't think we need such a big table to test
> what we want.
Your patch reduces the number of rows to 1000 tuples. I found it
possible to further reduce it to 100 tuples. That also allowed me to
save the plan in the test case introduced by e1b7fde418.
Please check if you're OK with the patch attached.
I looked through the v2 patch and have two comments.
* The test case under "Check we don't pick aggregate path key instead of
grouping path key" does not have EXPLAIN to show the plan. So how can
we ensure we do not mistakenly select the aggregate pathkeys instead of
the grouping pathkeys?
* I don't think the test case introduced by e1b7fde418 is still needed,
because we already have one under "Utilize the ordering of merge join to
avoid a full Sort operation". This kind of test case is just to ensure
that we are able to utilize the ordering of the subplans underneath. So
it should be parallel to the test cases for utilize the ordering of
index scan and subquery scan.
See the attached v3 patch. I also made cosmetic tweaks to the comments,
and simplified a test case query.
Thanks
Richard
Attachment
On 22/2/2024 09:09, Richard Guo wrote: > > On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorotkov@gmail.com > <mailto:aekorotkov@gmail.com>> wrote: > > Hi, Richard! > > > What do you think about the revisions for the test cases? > > I've rebased your patch upthread. Did some minor beautifications. > > > * The table 'btg' is inserted with 10000 tuples, which seems a bit > > expensive for a test. I don't think we need such a big table to test > > what we want. > > Your patch reduces the number of rows to 1000 tuples. I found it > possible to further reduce it to 100 tuples. That also allowed me to > save the plan in the test case introduced by e1b7fde418. > > Please check if you're OK with the patch attached. > > > I looked through the v2 patch and have two comments. > > * The test case under "Check we don't pick aggregate path key instead of > grouping path key" does not have EXPLAIN to show the plan. So how can > we ensure we do not mistakenly select the aggregate pathkeys instead of > the grouping pathkeys? I confirm it works correctly. I am not sure about the stability of the zeros number in the output on different platforms here: avg -------------------- 4.0000000000000000 5.0000000000000000 It was why I'd used the format() function before. So, may we elaborate on the test and restrict the output? > > * I don't think the test case introduced by e1b7fde418 is still needed, > because we already have one under "Utilize the ordering of merge join to > avoid a full Sort operation". This kind of test case is just to ensure > that we are able to utilize the ordering of the subplans underneath. So > it should be parallel to the test cases for utilize the ordering of > index scan and subquery scan. I confirm, this version also checks ec_sortref initialization in the case when ec are contructed from WHERE clause. Generally, I like more one test for one issue instead of one test for all at once. But it works and I don't have any reason to dispute it. Also, I'm unsure about removing the disabling of the max_parallel_workers_per_gather parameter. Have you discovered the domination of the current plan over the partial one? Do the cost fluctuations across platforms not trigger a parallel plan? What's more, I suggest to address here the complaint from [1]. As I see, cost difference between Sort and IncrementalSort strategies in that case is around 0.5. To make the test more stable I propose to change it a bit and add a limit: SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10; It makes efficacy of IncrementalSort more obvious difference around 10 cost points. [1] https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rBKXEoTwgzMcLaD5MPhfW0Lg@mail.gmail.com -- regards, Andrei Lepikhov Postgres Professional
On Thu, Feb 22, 2024 at 12:18 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 22/2/2024 09:09, Richard Guo wrote:
> I looked through the v2 patch and have two comments.
>
> * The test case under "Check we don't pick aggregate path key instead of
> grouping path key" does not have EXPLAIN to show the plan. So how can
> we ensure we do not mistakenly select the aggregate pathkeys instead of
> the grouping pathkeys?
I confirm it works correctly. I am not sure about the stability of the
zeros number in the output on different platforms here:
avg
--------------------
4.0000000000000000
5.0000000000000000
It was why I'd used the format() function before. So, may we elaborate
on the test and restrict the output?
The avg() function on an integer argument is commonly used in
aggregates.sql. I don't think this is an issue. See the first test
query in aggregates.sql.
SELECT avg(four) AS avg_1 FROM onek;
avg_1
--------------------
1.5000000000000000
(1 row)
> * I don't think the test case introduced by e1b7fde418 is still needed,
> because we already have one under "Utilize the ordering of merge join to
> avoid a full Sort operation". This kind of test case is just to ensure
> that we are able to utilize the ordering of the subplans underneath. So
> it should be parallel to the test cases for utilize the ordering of
> index scan and subquery scan.
Also, I'm unsure about removing the disabling of the
max_parallel_workers_per_gather parameter. Have you discovered the
domination of the current plan over the partial one? Do the cost
fluctuations across platforms not trigger a parallel plan?
The table used for testing contains only 100 tuples, which is the size
of only one page. I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
What's more, I suggest to address here the complaint from [1]. As I see,
cost difference between Sort and IncrementalSort strategies in that case
is around 0.5. To make the test more stable I propose to change it a bit
and add a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes efficacy of IncrementalSort more obvious difference around 10
cost points.
I don't think that's necessary. With Incremental Sort the final cost
is:
GroupAggregate (cost=1.66..19.00 rows=100 width=25)
while with full Sort it is:
GroupAggregate (cost=16.96..19.46 rows=100 width=25)
With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost. Not to mention that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.
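To put numbers on it: 19.00 * 1.01 = 19.19, which is still below 19.46, so the
full-sort path loses on total cost even after the fuzz factor is applied.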
Thanks
Richard
On 22/2/2024 13:35, Richard Guo wrote: > The avg() function on integer argument is commonly used in > aggregates.sql. I don't think this is an issue. See the first test > query in aggregates.sql. Make sense > > it should be parallel to the test cases for utilize the ordering of > > index scan and subquery scan. > > Also, I'm unsure about removing the disabling of the > max_parallel_workers_per_gather parameter. Have you discovered the > domination of the current plan over the partial one? Do the cost > fluctuations across platforms not trigger a parallel plan? > > > The table used for testing contains only 100 tuples, which is the size > of only one page. I don't believe it would trigger any parallel plans, > unless we manually change min_parallel_table_scan_size. I don't intend to argue it, but just for the information, I frequently reduce it to zero, allowing PostgreSQL to make a decision based on costs. It sometimes works much better, because one small table in multi join can disallow an effective parallel plan. > > What's more, I suggest to address here the complaint from [1]. As I > see, > cost difference between Sort and IncrementalSort strategies in that > case > is around 0.5. To make the test more stable I propose to change it a > bit > and add a limit: > SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10; > It makes efficacy of IncrementalSort more obvious difference around 10 > cost points. > > > I don't think that's necessary. With Incremental Sort the final cost > is: > > GroupAggregate (cost=1.66..19.00 rows=100 width=25) > > while with full Sort it is: > > GroupAggregate (cost=16.96..19.46 rows=100 width=25) > > With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path > is cheaper on total cost. Not to say that even if somehow we decide the > two paths are fuzzily the same on total cost, the first path still > dominates because its startup cost is much cheaper. As before, I won't protest here - it needs some computations about how much cost can be added by bulk extension of the relation blocks. If Maxim will answer that it's enough to resolve his issue, why not? -- regards, Andrei Lepikhov Postgres Professional
I'm late to the party, and I apologize for not having paid attention to this thread for months ... but I am wondering why 0452b461b got committed. It seems to add a substantial amount of complexity and planner cycles in return for not much. The reason why I'm dubious about it can be found in this comment that the patch deleted: - * In principle it might be interesting to consider other orderings of the - * GROUP BY elements, which could match the sort ordering of other - * possible plans (eg an indexscan) and thereby reduce cost. We don't - * bother with that, though. Hashed grouping will frequently win anyway. Indeed, if you remove the SET enable_hashagg = off; SET enable_seqscan = off; lines added to aggregates.sql, you find that every single one of the test cases added there changes plan, except for the one case that is there to prove that the planner *doesn't* pick this type of plan. That shows that the planner doesn't believe any of the plans developed here are cheaper than other alternatives it would normally consider (usually HashAgg). So it sure seems like we're spending a lot of effort on uninteresting plans. Maybe this is an artifact of testing against toy tables and a useful win could be shown on bigger tables, but I don't see any indication in the thread that anyone actually demonstrated that. I might hold still for this patch anyway if the patch were simpler and more obviously correct, but there are scary bits in it: * The very first hunk causes get_eclass_for_sort_expr to have side-effects on existing EquivalenceClass data structures. What is the argument that that's not going to cause problems? At the very least it introduces asymmetry, in that the first caller wins the chance to change the EC's ec_sortref and later callers won't be able to. * I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which I notice has already had one band-aid added to it since commit). In particular, it seems to believe that the pathkeys and clauses lists match one-for-one, but I seriously doubt that that invariant remains guaranteed after the cleanup steps /* append the remaining group pathkeys (will be treated as not sorted) */ *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys, *group_pathkeys); *group_clauses = list_concat_unique_ptr(new_group_clauses, *group_clauses); For that to be reliable, the SortGroupClauses added to new_group_clauses in the main loop have to be exactly those that are associated with the same pathkeys in the old lists. I doubt that that's necessarily true in the presence of redundant grouping clauses. (Maybe we can't get here with any redundant grouping clauses, but still, we don't really guarantee uniqueness of SortGroupClauses, much less that they are never copied which is what you need if you want to believe that pointer equality is sufficient for de-duping here. PathKeys are explicitly made to be safe to compare pointer-wise, but I know of no such guarantee for SortGroupClauses.) * It seems like root->processed_groupClause no longer has much to do with the grouping behavior, which is scary given how much code still believes that it does. I suspect there are bugs lurking there, and am not comforted by the fact that the patch changed exactly nothing in the pathnodes.h documentation of that field. This comment looks pretty silly now too: /* Preprocess regular GROUP BY clause, if any */ root->processed_groupClause = list_copy(parse->groupClause); What "preprocessing" is going on there? 
This comment was adequate before, when the code line invoked preprocess_groupclause which had a bunch of commentary; but now it has to stand on its own and it's not doing a great job of that. * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node type name, and the comments provided for it are not going to educate anybody. What is the "association" between the pathkeys and clauses? I guess the clauses are supposed to be SortGroupClauses, but not all pathkeys match up to SortGroupClauses, so what then? This is very underspecified, and fuzzy thinking about data structures tends to lead to bugs. So I'm quite afraid that there are still bugs lurking here. What's more, given that the plans this patch makes apparently seldom win when you don't put a thumb on the scales, it might take a very long time to isolate those bugs. If the patch produces a faulty plan (e.g. not sorted properly) we'd never notice if that plan isn't chosen, and even if it is chosen it probably wouldn't produce anything as obviously wrong as a crash. If this patch were producing better results I'd be more excited about putting more work into it. But on the basis of what I'm seeing right now, I think maybe we ought to give up on it. regards, tom lane
On 4/12/24 06:44, Tom Lane wrote: > If this patch were producing better results I'd be more excited > about putting more work into it. But on the basis of what I'm > seeing right now, I think maybe we ought to give up on it. First, thanks for the deep review - sometimes, only a commit gives us a chance to get such observation :))). On a broader note, introducing automatic group-by-order choosing is a step towards training the optimiser to handle poorly tuned incoming queries. While it's true that this may initially impact performance, it's crucial to weigh the potential benefits. So, beforehand, we should agree: Is it worth it? If yes, I would say I see how often hashing doesn't work in grouping. Sometimes because of estimation errors, sometimes because grouping already has sorted input, sometimes in analytical queries when planner doesn't have enough memory for hashing. In analytical cases, the only way to speed up queries sometimes is to be smart with features like IncrementalSort and this one. About low efficiency. Remember the previous version of the GROUP-BY optimisation - we disagreed on operator costs and the cost model in general. In the current version, we went the opposite - adding small features step-by-step. The current commit contains an integral part of the feature and is designed for safely testing the approach and adding more profitable parts like choosing group-by-order according to distinct values or unique indexes on grouping columns. I have passed through the code being steered by the issues explained in detail. I see seven issues. Two of them definitely should be scrutinised right now, and I'm ready to do that. -- regards, Andrei Lepikhov Postgres Professional
On 4/12/24 06:44, Tom Lane wrote: > * The very first hunk causes get_eclass_for_sort_expr to have > side-effects on existing EquivalenceClass data structures. > What is the argument that that's not going to cause problems? > At the very least it introduces asymmetry, in that the first > caller wins the chance to change the EC's ec_sortref and later > callers won't be able to. Let me resolve issues bit by bit. Addressing your first note, 'asymmetry,' I discovered this part of the code. Before the 8d83a5d, it could cause some artefacts, but since then, a kind of asymmetry has been introduced into the GROUP-BY clause list. As you can see in the attached patch, GROUP-BY reordering works even when the asymmetry of the 8d83a5d chooses different keys. At the same time, I agree that implicitly setting anything in an exporting function, which should just look up for EC, is a questionable coding style. To avoid possible usage issues (in extensions, for example), I propose setting it up afterwards, explicitly forcing this action by input parameter - see my attempt in the attachment. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On 4/12/24 06:44, Tom Lane wrote: > * I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which > I notice has already had one band-aid added to it since commit). > In particular, it seems to believe that the pathkeys and clauses > lists match one-for-one, but I seriously doubt that that invariant > remains guaranteed after the cleanup steps > > /* append the remaining group pathkeys (will be treated as not sorted) */ > *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys, > *group_pathkeys); > *group_clauses = list_concat_unique_ptr(new_group_clauses, > *group_clauses); > > For that to be reliable, the SortGroupClauses added to > new_group_clauses in the main loop have to be exactly those > that are associated with the same pathkeys in the old lists. > I doubt that that's necessarily true in the presence of redundant > grouping clauses. (Maybe we can't get here with any redundant > grouping clauses, but still, we don't really guarantee uniqueness of > SortGroupClauses, much less that they are never copied which is what > you need if you want to believe that pointer equality is sufficient > for de-duping here. PathKeys are explicitly made to be safe to compare > pointer-wise, but I know of no such guarantee for SortGroupClauses.) I spent a lot of time inventing situations with SortGroupClause duplicates. Unfortunately, it looks impossible so far. But because we really don't guarantee uniqueness, I changed the code to survive in this case. Also, I added assertion checking to be sure we don't have logical mistakes here - see attachment. About the band-aid mentioned above - as I see, 4169850 introduces the same trick in planner.c. So, it looks like result of design of the current code. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On 4/12/24 06:44, Tom Lane wrote: > * It seems like root->processed_groupClause no longer has much to do > with the grouping behavior, which is scary given how much code still > believes that it does. I suspect there are bugs lurking there, and 1. Analysing the code, processed_groupClause is used in: - adjust_foreign_grouping_path_cost - to decide, do we need to add cost of sort to the foreign grouping. - just for replacing relids during SJE process. - create_groupingsets_plan(), preprocess_grouping_sets, remove_useless_groupby_columns - we don't apply this feature in the case of grouping sets. - get_number_of_groups - estimation shouldn't depend on the order of clauses. - create_grouping_paths - to analyse hashability of grouping clause - make_group_input_target, make_partial_grouping_target - doesn't depend on the order of clauses planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in the case of hash grouping. So, the only case we can impact current behavior lies in the postgres_fdw. But here we don't really know which plan will be chosen during planning on foreign node and stay the same behavior is legal for me. > am not comforted by the fact that the patch changed exactly nothing > in the pathnodes.h documentation of that field. This comment looks > pretty silly now too: > > /* Preprocess regular GROUP BY clause, if any */ > root->processed_groupClause = list_copy(parse->groupClause); > > What "preprocessing" is going on there? This comment was adequate > before, when the code line invoked preprocess_groupclause which had > a bunch of commentary; but now it has to stand on its own and it's > not doing a great job of that. Thanks for picking this place. I fixed it. More interesting here is that we move decisions on the order of group-by clauses to the stage, where we already have underlying subtree and incoming path keys. In principle, we could return the preprocessing routine and arrange GROUP-BY with the ORDER-BY clause as it was before. But looking into the code, I found only one reason to do this: adjust_group_pathkeys_for_groupagg. Curious about how much profit we get here, I think we can discover it later with no hurry. A good outcome here will be a test case that can show the importance of arranging GROUP-BY and ORDER-BY at an early stage. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On 4/12/24 06:44, Tom Lane wrote: > * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node > type name, and the comments provided for it are not going to educate > anybody. What is the "association" between the pathkeys and clauses? > I guess the clauses are supposed to be SortGroupClauses, but not all > pathkeys match up to SortGroupClauses, so what then? This is very > underspecified, and fuzzy thinking about data structures tends to lead > to bugs. I'm not the best in English documentation and naming style. So, feel free to check my version. > > So I'm quite afraid that there are still bugs lurking here. > What's more, given that the plans this patch makes apparently > seldom win when you don't put a thumb on the scales, it might > take a very long time to isolate those bugs. If the patch > produces a faulty plan (e.g. not sorted properly) we'd never > notice if that plan isn't chosen, and even if it is chosen > it probably wouldn't produce anything as obviously wrong as > a crash. I added checkings on the proper order of pathkeys and clauses. If you really care about that, we should spend additional time and rewrite the code to generate an order of clauses somewhere in the plan creation phase. For example, during the create_group_plan call, we could use the processed_groupClause list, cycle through subpath->pathkeys, set the order, and detect whether the pathkeys list corresponds to the group-by or is enough to build a grouping plan. Anyway, make this part of code more resistant to blunders is another story. -- regards, Andrei Lepikhov Postgres Professional
Attachment
Hi, Andrei! On Thu, Apr 18, 2024 at 11:54 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 4/12/24 06:44, Tom Lane wrote: > > * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node > > type name, and the comments provided for it are not going to educate > > anybody. What is the "association" between the pathkeys and clauses? > > I guess the clauses are supposed to be SortGroupClauses, but not all > > pathkeys match up to SortGroupClauses, so what then? This is very > > underspecified, and fuzzy thinking about data structures tends to lead > > to bugs. > I'm not the best in English documentation and naming style. So, feel > free to check my version. > > > > So I'm quite afraid that there are still bugs lurking here. > > What's more, given that the plans this patch makes apparently > > seldom win when you don't put a thumb on the scales, it might > > take a very long time to isolate those bugs. If the patch > > produces a faulty plan (e.g. not sorted properly) we'd never > > notice if that plan isn't chosen, and even if it is chosen > > it probably wouldn't produce anything as obviously wrong as > > a crash. > I added checkings on the proper order of pathkeys and clauses. > If you really care about that, we should spend additional time and > rewrite the code to generate an order of clauses somewhere in the plan > creation phase. For example, during the create_group_plan call, we could > use the processed_groupClause list, cycle through subpath->pathkeys, set > the order, and detect whether the pathkeys list corresponds to the > group-by or is enough to build a grouping plan. > Anyway, make this part of code more resistant to blunders is another story. Thank you for the fixes you've proposed. I didn't look much into details yet, but I think the main concern Tom expressed in [1] is whether the feature is reasonable at all. I think at this stage the most important thing is to come up with convincing examples showing how huge performance benefits it could cause. I will return to this later today and will try to provide some convincing examples. Links 1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us ------ Regards, Alexander Korotkov
On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Thank you for the fixes you've proposed. I didn't look much into > details yet, but I think the main concern Tom expressed in [1] is > whether the feature is reasonable at all. I think at this stage the > most important thing is to come up with convincing examples showing > how huge performance benefits it could cause. I will return to this > later today and will try to provide some convincing examples. > hi. I found a case where it improved performance. +-- GROUP BY optimization by reorder columns +CREATE TABLE btg AS SELECT + i % 100 AS x, + i % 100 AS y, + 'abc' || i % 10 AS z, + i AS w +FROM generate_series(1,10000) AS i; +CREATE INDEX abc ON btg(x,y); +ANALYZE btg; + I change +FROM generate_series(1,10000) AS i; to + FROM generate_series(1, 1e6) AS i; Then I found out about these 2 queries performance improved a lot. A: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY y, x \watch i=0.1 c=10 B: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, z, y ORDER BY y, x, z, w \watch i=0.1 c=10 set (enable_seqscan,enable_hashagg) from on to off: queryA execution time from 1533.013 ms to 533.430 ms queryB execution time from 1996.817 ms to 497.020 ms
On 4/12/24 06:44, Tom Lane wrote: > If this patch were producing better results I'd be more excited > about putting more work into it. But on the basis of what I'm > seeing right now, I think maybe we ought to give up on it. Let me show current cases where users have a profit with this tiny improvement (see queries and execution results in query.sql): 1. 'Not optimised query text' — when we didn't consider group-by ordering during database development. 2. 'Accidental pathkeys' - we didn't see any explicit orderings, but accidentally, the planner used merge join that caused some orderings and we can utilise it. 3. 'Uncertain scan path' — We have options regarding which index to use, and because of that, we can't predict the optimal group-by ordering before the start of query planning. 4. 'HashAgg V/S GroupAgg' — sometimes, the GroupAgg strategy outperforms HashAgg just because we don't need any ordering procedure at all. And the last thing here — this code introduces the basics needed to add more sophisticated strategies, like ordering according to uniqueness or preferring to set constant-width columns first in grouping. -- regards, Andrei Lepikhov Postgres Professional
Attachment
On Fri, Apr 19, 2024 at 6:44 PM jian he <jian.universality@gmail.com> wrote: > > On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > > Thank you for the fixes you've proposed. I didn't look much into > > details yet, but I think the main concern Tom expressed in [1] is > > whether the feature is reasonable at all. I think at this stage the > > most important thing is to come up with convincing examples showing > > how huge performance benefits it could cause. I will return to this > > later today and will try to provide some convincing examples. > > hi. previously preprocess_groupclause will not process cases where no ORDER BY clause is specified. commit 0452b461b will reorder the GROUP BY element even though no ORDER BY clause is specified , if there are associated indexes on it. (hope I understand it correctly). for example (when enable_hashagg is false) explain(verbose) select count(*) FROM btg GROUP BY y,x; in pg16 will not reorder, it will be as is: `GROUP BY y,x` after commit 0452b461b, it will reorder to `GROUP BY x,y`. because there is an index `btree (x, y)` (only one) associated with it. if you drop the index `btree (x, y)` , it will be `GROUP BY y,x` as pg16. This reordering GROUP BY element when no ORDER BY clause is not specified is performant useful when the work_mem is small. I've attached some tests comparing master with REL_16_STABLE to demonstrate that. all the tests attached are under the condition: work_mem='64kB', buildtype=release, max_parallel_workers_per_gather=0. one example: CREATE TABLE btg5 AS SELECT i::numeric % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i % 100000 AS w FROM generate_series(1, 1e6) AS i; CREATE INDEX btg5_x_y_idx ON btg5(x, y); explain(analyze) SELECT count(*) FROM btg5 GROUP BY z, y, w, x; in pg17, the execution time is: 746.574 ms in pg16, the execution time is: 1693.483 ms if I reorder it manually as: `explain(analyze) SELECT count(*) FROM btg5 GROUP BY x, y, w, z;` then in pg16, the execution time is 630.394 ms
Attachment
Hi! On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Thank you for the fixes you've proposed. I didn't look much into > details yet, but I think the main concern Tom expressed in [1] is > whether the feature is reasonable at all. I think at this stage the > most important thing is to come up with convincing examples showing > how huge performance benefits it could cause. I will return to this > later today and will try to provide some convincing examples. I took a fresh look at 0452b461b, and have the following thoughts: 1) Previously, preprocess_groupclause() tries to reorder GROUP BY clauses to match the required ORDER BY order. It only reorders if GROUP BY pathkeys are the prefix of ORDER BY pathkeys or vice versa. So, both of them need to be satisfied by one ordering. 0452b461b also tries to match GROUP BY clauses to ORDER BY clauses, but takes into account an incremental sort. Actually, instead of removing preprocess_groupclause(), we could just teach it to take incremental sort into account. 2) The get_useful_group_keys_orderings() function takes into account 3 orderings of pathkeys and clauses: original order as written in GROUP BY, matching ORDER BY clauses as much as possible, and matching the input path as much as possible. Given that even before 0452b461b, preprocess_groupclause() could change the original order of GROUP BY clauses, so we don't need to distinguish the first two. We could just consider output of new preprocess_groupclause() taking into account an incremental sort and the ordering matching the input path as much as possible. This seems like significant simplification. Let me also describe my thoughts about the justification of the feature itself. As Tom pointed upthread, Sort + Grouping is generally unlikely faster than Hash Aggregate. The main point of this feature is being able to match the order of GROUP BY clauses to the order of the input path. That input path could be Index Scan or Subquery. Let's concentrate on Index Scan. Index Scan can give us the required order, so we can do grouping without Sort or with significantly cheaper Incremental Sort. That could be much faster than Hash Aggregate. But if we scan the full table (or its significant fraction), Index Scan is typically much more expensive than Sequential Scan because of random page access. However, there are cases when Index Scan could be cheaper. 1) If the heap row is wide and the index contains all the required columns, Index Only Scan can be cheaper than Sequential Scan because of lesser volume. 2) If the query predicate is selective enough and matches the index, Index Scan might be significantly cheaper. One may say that if the query predicate is selective enough then there are not that many matching rows, so aggregating them either way isn't a big deal anyway. However, given that data volumes are growing tremendously, it's not hard to imagine that the index selected a small fraction of table rows, but they are still enough to be costly for aggregating. Therefore, I think there are significant cases where matching GROUP BY clauses to the order of the input path could give a substantial improvement over Hash Aggregate. While there are some particular use-cases by Jian He, I hope that above could give some rationale. ------ Regards, Alexander Korotkov
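To illustrate case (1) with a rough sketch (the table here is made up just for
illustration, it is not from the regression tests):
CREATE TABLE wide_tbl AS SELECT
  i % 1000 AS a,
  i % 100 AS b,
  repeat('x', 900) AS payload
FROM generate_series(1, 1000000) AS i;
CREATE INDEX wide_tbl_a_b_idx ON wide_tbl(a, b);
VACUUM ANALYZE wide_tbl;
-- The index covers the grouping columns, so an Index Only Scan should read far
-- less data than a Seq Scan of the wide heap, and GROUP BY b, a can reuse the
-- index order for GroupAggregate instead of hashing.
EXPLAIN SELECT b, a, count(*) FROM wide_tbl GROUP BY b, a;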
hi. I found an interesting case. CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; the above part will use: -> Incremental Sort Sort Key: x, $, $, $ Presorted Key: x -> Index Scan using t1_x_y_idx on t1 EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z; these will use: -> Incremental Sort Sort Key: x, y, $, $ Presorted Key: x, y -> Index Scan using t1_x_y_idx on t1 I guess this is fine, but not optimal?
hi one more question (maybe a dumb one....) drop table if exists t1; CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY z,x,y,w order by w; QUERY PLAN ------------------------------------------------------ GroupAggregate Output: count(*), w, z, x, y Group Key: t1.w, t1.x, t1.y, t1.z -> Sort Output: w, z, x, y Sort Key: t1.w, t1.x, t1.y, t1.z -> Index Scan using t1_x_y_idx on public.t1 Output: w, z, x, y (8 rows) if you do ` Sort Key: t1.w, t1.x, t1.y, t1.z` then the output is supposed to be: Output: w, x, y, z ?
On Wed, Apr 24, 2024 at 2:25 PM jian he <jian.universality@gmail.com> wrote: > > hi. > I found an interesting case. > > CREATE TABLE t1 AS > SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS > z, i::int4 AS w > FROM generate_series(1, 100) AS i; > CREATE INDEX t1_x_y_idx ON t1 (x, y); > ANALYZE t1; > SET enable_hashagg = off; > SET enable_seqscan = off; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > the above part will use: > -> Incremental Sort > Sort Key: x, $, $, $ > Presorted Key: x > -> Index Scan using t1_x_y_idx on t1 We can make these cases also `Presorted Key: x, y`. in `if (path->pathkeys && !pathkeys_contained_in(path->pathkeys, root->group_pathkeys))` branch we can simple do - infos = lappend(infos, info); + infos = lcons(info, infos); similar to what we did at plancat.c (search lcons). get_useful_group_keys_orderings returns a list of PathKeyInfo, then the caller function just iterates each element. so for the caller, order of the returned list element from get_useful_group_keys_orderings does not matter. for path Incremental Sort: function make_ordered_path will return the same cost for different numbers of presorted keys. for example: EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; make_ordered_path cost is same for: `info->pathkeys: x,y,z,w` `info->pathkeys:x,z,y,w` if we arrange `info->pathkeys: x,y,z,w` before `info->pathkeys:x,z,y,w` in get_useful_group_keys_orderings. then with the same cost, we will choose the first one (`info->pathkeys: x,y,z,w`), if we use IncrementalSort, then we use `Presorted Key: x, y`.
On 24.04.2024 13:25, jian he wrote: > hi. > I found an interesting case. > > CREATE TABLE t1 AS > SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS > z, i::int4 AS w > FROM generate_series(1, 100) AS i; > CREATE INDEX t1_x_y_idx ON t1 (x, y); > ANALYZE t1; > SET enable_hashagg = off; > SET enable_seqscan = off; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > the above part will use: > -> Incremental Sort > Sort Key: x, $, $, $ > Presorted Key: x > -> Index Scan using t1_x_y_idx on t1 > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w; > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z; > > these will use: > -> Incremental Sort > Sort Key: x, y, $, $ > Presorted Key: x, y > -> Index Scan using t1_x_y_idx on t1 > > I guess this is fine, but not optimal? It looks like a bug right now - in current implementation we don't differentiate different orders. So: 1. Applying all the patches from the thread which I proposed as an answer to T.Lane last rebuke - does behavior still the same?. 2. Could you try to find the reason? -- regards, Andrei Lepikhov Postgres Professional
On Thu, May 16, 2024 at 3:47 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > On 24.04.2024 13:25, jian he wrote: > > hi. > > I found an interesting case. > > > > CREATE TABLE t1 AS > > SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS > > z, i::int4 AS w > > FROM generate_series(1, 100) AS i; > > CREATE INDEX t1_x_y_idx ON t1 (x, y); > > ANALYZE t1; > > SET enable_hashagg = off; > > SET enable_seqscan = off; > > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > > the above part will use: > > -> Incremental Sort > > Sort Key: x, $, $, $ > > Presorted Key: x > > -> Index Scan using t1_x_y_idx on t1 > > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z; > > > > these will use: > > -> Incremental Sort > > Sort Key: x, y, $, $ > > Presorted Key: x, y > > -> Index Scan using t1_x_y_idx on t1 > > > > I guess this is fine, but not optimal? > It looks like a bug right now - in current implementation we don't > differentiate different orders. So: > 1. Applying all the patches from the thread which I proposed as an > answer to T.Lane last rebuke - does behavior still the same?. I've applied these 4 patches in this thread. final_improvements.patch minor_comment.patch get_useful_group_keys_orderings.patch group_by_asymmetry.diff The behavior is still the same as the master. meaning that below quoted queries are still using "Presorted Key: x". > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > > the above part will use: > > -> Incremental Sort > > Sort Key: x, $, $, $ > > Presorted Key: x > > -> Index Scan using t1_x_y_idx on t1 > 2. Could you try to find the reason? the following are my understanding, it may be wrong... function get_useful_group_keys_orderings, may generate alternative pathkeys and group clauses. for different PathKeyInfo sub functions within add_paths_to_grouping_rel: make_ordered_path, create_agg_path will yield the same cost. function add_path (within add_paths_to_grouping_rel) will add these paths (different PathKeyInfos) in a *sequential* order. then later in the function set_cheapest. `foreach(p, parent_rel->pathlist)` will iterate these different paths in a sequential order. so in set_cheapest if 2 path costs are the same, then the first path residing in parent_rel->pathlist wins. fo a concrete example: CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::float4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY x,z,y,w; add_paths_to_grouping_rel will do the following in a sequential order. A. generate and add original PathKeyInfo(groupby x,z,y,w) B. 
generate and add alternative PathKeyInfo(groupby x,y,z,w). (because of the index path) C. can_hash is true, generate a HashAgg Path (because of enable_hashagg is set to off, so this path the cost is very high) As mentioned previously, both A and B can use Incremental Sort, set_cheapest will choose the A instead of B, because A step generated path **first** satisfies increment sort.
On 20/5/2024 15:54, jian he wrote: > As mentioned previously, > both A and B can use Incremental Sort, set_cheapest will choose the A > instead of B, > because A step generated path **first** satisfies increment sort. Yeah, I agree with your analysis. Looking into the cost_incremental_sort, I see that we have ten group_tuples. This value doesn't depend on how many presorted keys (1 or 2) we use. This is caused by default estimation. Given the current circumstances, it seems unlikely that we can make any significant changes without introducing a new sort cost model that accounts for the number of sorted columns. -- regards, Andrei Lepikhov Postgres Professional
On Mon, May 20, 2024 at 4:54 PM jian he <jian.universality@gmail.com> wrote: > > > The behavior is still the same as the master. > meaning that below quoted queries are still using "Presorted Key: x". > > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > > > the above part will use: > > > -> Incremental Sort > > > Sort Key: x, $, $, $ > > > Presorted Key: x > > > -> Index Scan using t1_x_y_idx on t1 > > > > 2. Could you try to find the reason? > the following are my understanding, it may be wrong... > I think my previous thread only explained that two paths have the same cost, the planner chooses the one that was first added into the pathlist. but didn't explain why Incremental Sort path, presorted by one key and presorted by two keys yield the same cost. -------------------------------------------- if (rel->tuples > 0) { /* * Clamp to size of rel, or size of rel / 10 if multiple Vars. The * fudge factor is because the Vars are probably correlated but we * don't know by how much. We should never clamp to less than the * largest ndistinct value for any of the Vars, though, since * there will surely be at least that many groups. */ double clamp = rel->tuples; if (relvarcount > 1) { clamp *= 0.1; if (clamp < relmaxndistinct) { clamp = relmaxndistinct; /* for sanity in case some ndistinct is too large: */ if (clamp > rel->tuples) clamp = rel->tuples; } } if (reldistinct > clamp) reldistinct = clamp; ... } i think, the above code[0] snippet within function estimate_num_groups makes Incremental Sort Path not pickup the optimal presorted keys. see original problem thread: [1] ---------------------------------------------------------------------------------------- CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; will generate 2 Incremental Sort path, one: "Presorted Key: x", another one: "Presorted Key: x,y". The first Incremental Sort path is added first. The function estimate_num_groups returned value (input_groups) is the main key to calculate the cost of Incremental Sort path! But here, the estimate_num_groups function returned the same value: 10 for "Presorted Key: x" and "Presorted Key: x,y". then later cost_incremental_sort returns the same cost for "Presorted Key: x" and "Presorted Key: x,y". (line refers to src/backend/utils/adt/selfuncs line number). why "Presorted Key: x,y" return 10 in estimate_num_groups: line 3667 assign clamp to 100.0, because rel->tuples is 100. line 3671 clamp *= 0.1; make clamp because 10. line 3680, 3681 `if (reldistinct > clamp) branch` make reldistinct from 100 to 10. line 3733 make `numdistinct *= reldistinct;` makes numdistinct because 10. If I change the total number of rows in a relation, or change the distinct number of y values then the planner will use "Incremental Sort Presorted Key: x,y". for example: CREATE TABLE t10 AS SELECT (i % 10)::numeric AS x,(i % 11)::int8 AS y,'abc' || i % 10 AS z, i::float4 AS w FROM generate_series(1, 1E2) AS i; CREATE INDEX t10_x_y_idx ON t10 (x, y); ANALYZE t10; The above setup will make the following query using "Incremental Sort Presorted Key: x,y". 
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,z,y,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,w,y,z; EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,z,w,y; EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,w,z,y; summary: * the regression test setup at [2] can make some cases not pickup the best optimal Incremental Sort path. we use this test in src/test/regress/sql/aggregates.sql -- Engage incremental sort EXPLAIN (COSTS OFF) SELECT count(*) FROM btg GROUP BY z, y, w, x; but if we use: EXPLAIN (COSTS OFF) SELECT count(*) FROM btg GROUP BY x, z, w, y; then the plan is not the best. * The regression test setup makes estimate_num_groups code logic return the same value. therefore make Incremental Sort presort by one key, two keys yield the same cost. the cost_incremental_sort will return the same cost. * https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/aggregates.sql#n1194 maybe we can change: CREATE TABLE btg AS SELECT i % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i AS w FROM generate_series(1, 100) AS i; to CREATE TABLE btg AS SELECT i % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i AS w FROM generate_series(1, 1000) AS i; [0] https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/utils/adt/selfuncs.c#n3669 [1] https://www.postgresql.org/message-id/CACJufxGt99nZ%2Bnir%2BaB6pFQ%3DK8oNiHAQ3OELqSbGMqNxok8nxA%40mail.gmail.com [2] https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/aggregates.sql#n1194
Hi! On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Thank you for the fixes you've proposed. I didn't look much into > > details yet, but I think the main concern Tom expressed in [1] is > > whether the feature is reasonable at all. I think at this stage the > > most important thing is to come up with convincing examples showing > > how huge performance benefits it could cause. I will return to this > > later today and will try to provide some convincing examples. > > I took a fresh look at 0452b461b, and have the following thoughts: > 1) Previously, preprocess_groupclause() tries to reorder GROUP BY > clauses to match the required ORDER BY order. It only reorders if > GROUP BY pathkeys are the prefix of ORDER BY pathkeys or vice versa. > So, both of them need to be satisfied by one ordering. 0452b461b also > tries to match GROUP BY clauses to ORDER BY clauses, but takes into > account an incremental sort. Actually, instead of removing > preprocess_groupclause(), we could just teach it to take incremental > sort into account. > 2) The get_useful_group_keys_orderings() function takes into account 3 > orderings of pathkeys and clauses: original order as written in GROUP > BY, matching ORDER BY clauses as much as possible, and matching the > input path as much as possible. Given that even before 0452b461b, > preprocess_groupclause() could change the original order of GROUP BY > clauses, so we don't need to distinguish the first two. We could just > consider output of new preprocess_groupclause() taking into account an > incremental sort and the ordering matching the input path as much as > possible. This seems like significant simplification. > > Let me also describe my thoughts about the justification of the > feature itself. As Tom pointed upthread, Sort + Grouping is generally > unlikely faster than Hash Aggregate. The main point of this feature > is being able to match the order of GROUP BY clauses to the order of > the input path. That input path could be Index Scan or Subquery. > Let's concentrate on Index Scan. Index Scan can give us the required > order, so we can do grouping without Sort or with significantly > cheaper Incremental Sort. That could be much faster than Hash > Aggregate. But if we scan the full table (or its significant > fraction), Index Scan is typically much more expensive than Sequential > Scan because of random page access. However, there are cases when > Index Scan could be cheaper. > 1) If the heap row is wide and the index contains all the required > columns, Index Only Scan can be cheaper than Sequential Scan because > of lesser volume. > 2) If the query predicate is selective enough and matches the index, > Index Scan might be significantly cheaper. One may say that if the > query predicate is selective enough then there are not that many > matching rows, so aggregating them either way isn't a big deal anyway. > However, given that data volumes are growing tremendously, it's not > hard to imagine that the index selected a small fraction of table > rows, but they are still enough to be costly for aggregating. > Therefore, I think there are significant cases where matching GROUP BY > clauses to the order of the input path could give a substantial > improvement over Hash Aggregate. > > While there are some particular use-cases by Jian He, I hope that > above could give some rationale. I've assembled patches in this thread into one patchset. 
0001 The patch fixing the asymmetry in setting EquivalenceClass.ec_sortref, by Andrei [1]. I've revised the comments and written the commit message.
0002 The patch for handling duplicates of SortGroupClause. I didn't get the sense of Andrei's implementation. It seems to care about duplicate pointers in the group clauses list, but the real question is equal SortGroupClauses represented by different pointers. I think we should group duplicate SortGroupClauses together, as preprocess_groupclause() used to do. Reimplemented the patch to do so.
0003 Rename PathKeyInfo to GroupByOrdering, by Andres [3]. I only revised the comments and wrote the commit message.
0004 Turn back preprocess_groupclause() for the reason I described upthread [4].
Any thoughts?
Links.
1. https://www.postgresql.org/message-id/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru
2. https://www.postgresql.org/message-id/a663f0f6-cbf6-49aa-af2e-234dc6768a07%40postgrespro.ru
3. https://www.postgresql.org/message-id/db0fc3a4-966c-4cec-a136-94024d39212d%40postgrespro.ru
4. https://www.postgresql.org/message-id/CAPpHfdvyWLMGwvxaf%3D7KAp-z-4mxbSH8ti2f6mNOQv5metZFzg%40mail.gmail.com
------ Regards, Alexander Korotkov Supabase
Attachment
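As a rough, self-contained illustration of the "wide heap row, covering index" case described in the message above (the table, index, and data here are invented for the sketch; whether the planner actually prefers the index path depends on costing):

CREATE TABLE wide_t (a int, b int, payload text);   -- payload makes heap rows wide
INSERT INTO wide_t
  SELECT i % 100, i % 1000, repeat('x', 1000)
  FROM generate_series(1, 100000) AS i;
CREATE INDEX wide_t_a_b_idx ON wide_t (a, b);       -- covers all grouped columns
VACUUM ANALYZE wide_t;

-- An Index Only Scan feeding a sorted GroupAggregate reads far less data than
-- a Seq Scan over the wide heap, so it can beat Seq Scan + HashAggregate here.
EXPLAIN SELECT a, b, count(*) FROM wide_t GROUP BY a, b;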
On 5/27/24 19:41, Alexander Korotkov wrote: > On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> While there are some particular use-cases by Jian He, I hope that >> above could give some rationale. > > I've assembled patches in this thread into one patchset. > 0001 The patch fixing asymmetry in setting EquivalenceClass.ec_sortref > by Andrei [1]. I've revised comments and wrote the commit message. > 0002 The patch for handling duplicates of SortGroupClause. I didn't > get the sense of Andrei implementation. It seems to care about > duplicate pointers in group clauses list. But the question is the > equal SortGroupClause's comprising different pointers. I think we > should group duplicate SortGroupClause's together as > preprocess_groupclause() used to do. Reimplemented patch to do so. > 0003 Rename PathKeyInfo to GroupByOrdering by Andres [3]. I only > revised comments and wrote the commit message. > 0004 Turn back preprocess_groupclause() for the reason I described upthread [4]. > > Any thoughts? About 0001: Having overviewed it, I don't see any issues (but I'm the author), except grammatical ones - but I'm not a native to judge it. Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear to me. It may be better to write something like: 'building pathkeys by the list of grouping clauses'. 0002: The part under USE_ASSERT_CHECKING looks good to me. But the code in group_keys_reorder_by_pathkeys looks suspicious: of course, we do some doubtful work without any possible way to reproduce, but if we envision some duplicated elements in the group_clauses, we should avoid usage of the list_concat_unique_ptr. What's more, why do you not exit from foreach_ptr immediately after SortGroupClause has been found? I think the new_group_clauses should be consistent with the new_group_pathkeys. 0003: Looks good 0004: I was also thinking about reintroducing the preprocess_groupclause because with the re-arrangement of GROUP-BY clauses according to incoming pathkeys, it doesn't make sense to have a user-defined order—at least while cost_sort doesn't differ costs for alternative column orderings. So, I'm okay with the code. But why don't you use the same approach with foreach_ptr as before? -- regards, Andrei Lepikhov Postgres Professional
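To picture the re-arrangement of GROUP-BY clauses according to incoming pathkeys mentioned above, a small sketch (hypothetical table and index; the chosen plan still depends on costs):

CREATE TABLE grp_t (a int, b int, c int);
INSERT INTO grp_t SELECT i % 100, i % 10, i FROM generate_series(1, 100000) AS i;
CREATE INDEX grp_t_a_b_idx ON grp_t (a, b);
VACUUM ANALYZE grp_t;

-- The query writes GROUP BY b, a, but the planner may reorder the grouping
-- keys to (a, b) so that the index scan's sort order is reused and no extra
-- Sort is needed.  This is why a user-specified column order carries little
-- weight while cost_sort does not distinguish alternative column orderings.
EXPLAIN (COSTS OFF) SELECT b, a, count(*) FROM grp_t GROUP BY b, a;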
Hi, Andrei! Thank you for your feedback. On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 5/27/24 19:41, Alexander Korotkov wrote: > > On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > >> While there are some particular use-cases by Jian He, I hope that > >> above could give some rationale. > > > > I've assembled patches in this thread into one patchset. > > 0001 The patch fixing asymmetry in setting EquivalenceClass.ec_sortref > > by Andrei [1]. I've revised comments and wrote the commit message. > > 0002 The patch for handling duplicates of SortGroupClause. I didn't > > get the sense of Andrei implementation. It seems to care about > > duplicate pointers in group clauses list. But the question is the > > equal SortGroupClause's comprising different pointers. I think we > > should group duplicate SortGroupClause's together as > > preprocess_groupclause() used to do. Reimplemented patch to do so. > > 0003 Rename PathKeyInfo to GroupByOrdering by Andres [3]. I only > > revised comments and wrote the commit message. > > 0004 Turn back preprocess_groupclause() for the reason I described upthread [4]. > > > > Any thoughts? > About 0001: > Having overviewed it, I don't see any issues (but I'm the author), > except grammatical ones - but I'm not a native to judge it. > Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear > to me. It may be better to write something like: 'building pathkeys by > the list of grouping clauses'. OK, thank you. I'll run once again for the grammar issues. > 0002: > The part under USE_ASSERT_CHECKING looks good to me. But the code in > group_keys_reorder_by_pathkeys looks suspicious: of course, we do some > doubtful work without any possible way to reproduce, but if we envision > some duplicated elements in the group_clauses, we should avoid usage of > the list_concat_unique_ptr. As I understand Tom, there is a risk that clauses list may contain multiple instances of equivalent SortGroupClause, not duplicate pointers. > What's more, why do you not exit from > foreach_ptr immediately after SortGroupClause has been found? I think > the new_group_clauses should be consistent with the new_group_pathkeys. I wanted this to be consistent with preprocess_groupclause(), where duplicate SortGroupClause'es are grouped together. Otherwise, we could just delete redundant SortGroupClause'es. > 0003: > Looks good > > 0004: > I was also thinking about reintroducing the preprocess_groupclause > because with the re-arrangement of GROUP-BY clauses according to > incoming pathkeys, it doesn't make sense to have a user-defined order—at > least while cost_sort doesn't differ costs for alternative column orderings. > So, I'm okay with the code. But why don't you use the same approach with > foreach_ptr as before? I restored the function as it was before 0452b461bc with minimal edits to support the incremental sort. I think it would be more valuable to keep the difference with pg16 code small rather than refactor to simplify existing code. ------ Regards, Alexander Korotkov
On 5/29/24 19:53, Alexander Korotkov wrote: > Hi, Andrei! > > Thank you for your feedback. > > On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> On 5/27/24 19:41, Alexander Korotkov wrote: >>> Any thoughts? >> About 0001: >> Having overviewed it, I don't see any issues (but I'm the author), >> except grammatical ones - but I'm not a native to judge it. >> Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear >> to me. It may be better to write something like: 'building pathkeys by >> the list of grouping clauses'. > > OK, thank you. I'll run once again for the grammar issues. > >> 0002: >> The part under USE_ASSERT_CHECKING looks good to me. But the code in >> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some >> doubtful work without any possible way to reproduce, but if we envision >> some duplicated elements in the group_clauses, we should avoid usage of >> the list_concat_unique_ptr. > > As I understand Tom, there is a risk that clauses list may contain > multiple instances of equivalent SortGroupClause, not duplicate > pointers. Maybe. I just implemented the worst-case scenario with the intention of maximum safety. So, it's up to you. > >> What's more, why do you not exit from >> foreach_ptr immediately after SortGroupClause has been found? I think >> the new_group_clauses should be consistent with the new_group_pathkeys. > > I wanted this to be consistent with preprocess_groupclause(), where > duplicate SortGroupClause'es are grouped together. Otherwise, we > could just delete redundant SortGroupClause'es. Hm, preprocess_groupclause is called before the standard_qp_callback forms the 'canonical form' of group_pathkeys and such behaviour needed. But the code that chooses the reordering strategy uses already processed grouping clauses, where we don't have duplicates in the first num_groupby_pathkeys elements, do we? >> 0004: >> I was also thinking about reintroducing the preprocess_groupclause >> because with the re-arrangement of GROUP-BY clauses according to >> incoming pathkeys, it doesn't make sense to have a user-defined order—at >> least while cost_sort doesn't differ costs for alternative column orderings. >> So, I'm okay with the code. But why don't you use the same approach with >> foreach_ptr as before? > > I restored the function as it was before 0452b461bc with minimal edits > to support the incremental sort. I think it would be more valuable to > keep the difference with pg16 code small rather than refactor to > simplify existing code. Got it -- regards, Andrei Lepikhov Postgres Professional
Hi! On Thu, May 30, 2024 at 7:22 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 5/29/24 19:53, Alexander Korotkov wrote: > > Thank you for your feedback. > > > > On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov > > <a.lepikhov@postgrespro.ru> wrote: > >> On 5/27/24 19:41, Alexander Korotkov wrote: > >>> Any thoughts? > >> About 0001: > >> Having overviewed it, I don't see any issues (but I'm the author), > >> except grammatical ones - but I'm not a native to judge it. > >> Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear > >> to me. It may be better to write something like: 'building pathkeys by > >> the list of grouping clauses'. > > > > OK, thank you. I'll run once again for the grammar issues. I've revised some grammar including the sentence you've proposed. > >> 0002: > >> The part under USE_ASSERT_CHECKING looks good to me. But the code in > >> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some > >> doubtful work without any possible way to reproduce, but if we envision > >> some duplicated elements in the group_clauses, we should avoid usage of > >> the list_concat_unique_ptr. > > > > As I understand Tom, there is a risk that clauses list may contain > > multiple instances of equivalent SortGroupClause, not duplicate > > pointers. > Maybe. I just implemented the worst-case scenario with the intention of > maximum safety. So, it's up to you. > > > >> What's more, why do you not exit from > >> foreach_ptr immediately after SortGroupClause has been found? I think > >> the new_group_clauses should be consistent with the new_group_pathkeys. > > > > I wanted this to be consistent with preprocess_groupclause(), where > > duplicate SortGroupClause'es are grouped together. Otherwise, we > > could just delete redundant SortGroupClause'es. > Hm, preprocess_groupclause is called before the standard_qp_callback > forms the 'canonical form' of group_pathkeys and such behaviour needed. > But the code that chooses the reordering strategy uses already processed > grouping clauses, where we don't have duplicates in the first > num_groupby_pathkeys elements, do we? Yep, it seems that we work with group clauses which were processed by standard_qp_callback(). In turn, standard_qp_callback() called make_pathkeys_for_sortclauses_extended() with remove_redundant == true. So, there shouldn't be duplicates. And it's unclear what we should be protected from. I left 0002 with just asserts. And extracted questionable changes into 0005. ------ Regards, Alexander Korotkov Supabase
Attachment
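As a small aside on why duplicates are not expected at that stage: redundant grouping keys are already removed when the canonical group pathkeys are built. A sketch (hypothetical table, shown only for illustration):

CREATE TABLE dup_t (a int, b int, c int);
INSERT INTO dup_t SELECT i % 100, i % 100, 1 FROM generate_series(1, 10000) AS i;
ANALYZE dup_t;

-- WHERE a = b puts both columns into one equivalence class and c = 1 makes c
-- a constant, so after redundant pathkeys are removed only a single grouping
-- pathkey should remain even though three columns appear in the GROUP BY.
EXPLAIN (COSTS OFF)
  SELECT count(*) FROM dup_t WHERE a = b AND c = 1 GROUP BY a, b, c;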
On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > I've revised some grammar including the sentence you've proposed. >

-static List *groupclause_apply_groupingset(PlannerInfo *root, List *force);
+static List *preprocess_groupclause(PlannerInfo *root, List *force);

Changing preprocess_groupclause's second argument from "force" to "gset" would be more intuitive, I think.

`elog(ERROR, "Order of group-by clauses doesn't correspond incoming sort order");`

I think this error message makes people wonder what "incoming sort order" is. BTW, for "correspond", people generally say "correspond to".

I made some minor cosmetic changes, mainly changing foreach to foreach_node. Please check the attachment.
Attachment
Hi! On Sun, Jun 2, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote: > > On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > > I've revised some grammar including the sentence you've proposed. > > > > -static List *groupclause_apply_groupingset(PlannerInfo *root, List *force); > +static List *preprocess_groupclause(PlannerInfo *root, List *force); > > changing preprocess_groupclause the second argument > from "force" to "gset" would be more intuitive, I think. Probably, but my intention is to restore preprocess_groupclause() as it was before 0452b461bc with minimal edits to support incremental sort. I'd rather avoid refactoring of this area for now. > `elog(ERROR, "Order of group-by clauses doesn't correspond incoming > sort order");` > > I think this error message makes people wonder what "incoming sort order" is. > BTW, "correspond", generally people use "correspond to". Thank you. On second thought, I think it would be better to turn this into an assertion like the checks before. > I did some minor cosmetic changes, mainly changing foreach to foreach_node. > Please check the attachment. I would avoid refactoring of preprocess_groupclause() for the reason described above. But I picked the grammar fix for PlannerInfo's comment. ------ Regards, Alexander Korotkov Supabase
Attachment
Hi, Alexander!
On Sun, 2 Jun 2024 at 17:18, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi!
On Sun, Jun 2, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote:
>
> On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > I've revised some grammar including the sentence you've proposed.
> >
>
> -static List *groupclause_apply_groupingset(PlannerInfo *root, List *force);
> +static List *preprocess_groupclause(PlannerInfo *root, List *force);
>
> changing preprocess_groupclause the second argument
> from "force" to "gset" would be more intuitive, I think.
Probably, but my intention is to restore preprocess_groupclause() as
it was before 0452b461bc with minimal edits to support incremental
sort. I'd rather avoid refactoring if this area for now.
> `elog(ERROR, "Order of group-by clauses doesn't correspond incoming
> sort order");`
>
> I think this error message makes people wonder what "incoming sort order" is.
> BTW, "correspond", generally people use "correspond to".
Thank you. On the second thought, I think it would be better to turn
this into an assertion like the checks before.
> I did some minor cosmetic changes, mainly changing foreach to foreach_node.
> Please check the attachment.
I would avoid refactoring of preprocess_groupclause() for the reason
described above. But I picked the grammar fix for PlannerInfo's
comment.
Thank you for working on this patchset.
0001: Patch looks right
Comments:
Covert -> Convert
sets the uninitialized value of ec_sortref of the pathkey's EquivalenceClass -> sets the value of the pathkey's EquivalenceClass unless it's already initialized
wasn't set yet -> hasn't been set yet
0002: additional assert checking only. Looks right.
0003: pure renaming, looks good.
0004: Restores pre 0452b461bc state to preprocess_groupclause with removed partial_match fallback. Looks right. I haven't checked the comments, assuming they are restored from the pre 0452b461bc state.
0005: Looks right.
Regards,
Pavel Borisov
Supabase
On Mon, Jun 3, 2024 at 6:55 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > On Sun, 2 Jun 2024 at 17:18, Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> Hi! >> >> On Sun, Jun 2, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote: >> > >> > On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> > > >> > > I've revised some grammar including the sentence you've proposed. >> > > >> > >> > -static List *groupclause_apply_groupingset(PlannerInfo *root, List *force); >> > +static List *preprocess_groupclause(PlannerInfo *root, List *force); >> > >> > changing preprocess_groupclause's second argument >> > from "force" to "gset" would be more intuitive, I think. >> >> Probably, but my intention is to restore preprocess_groupclause() as >> it was before 0452b461bc with minimal edits to support incremental >> sort. I'd rather avoid refactoring of this area for now. >> >> > `elog(ERROR, "Order of group-by clauses doesn't correspond incoming >> > sort order");` >> > >> > I think this error message makes people wonder what "incoming sort order" is. >> > BTW, "correspond", generally people use "correspond to". >> >> Thank you. On second thought, I think it would be better to turn >> this into an assertion like the checks before. >> >> > I did some minor cosmetic changes, mainly changing foreach to foreach_node. >> > Please check the attachment. >> >> I would avoid refactoring of preprocess_groupclause() for the reason >> described above. But I picked the grammar fix for PlannerInfo's >> comment. > > > Thank you for working on this patchset. > > 0001: Patch looks right > > Comments: > > Covert -> Convert > sets the uninitialized value of ec_sortref of the pathkey's EquivalenceClass -> sets the value of the pathkey's EquivalenceClass unless it's already initialized > wasn't set yet -> hasn't been set yet > > 0002: additional assert checking only. Looks right. > > 0003: pure renaming, looks good. > > 0004: Restores pre 0452b461bc state to preprocess_groupclause with removed partial_match fallback. Looks right. I haven't checked the comments provided they are restored from pre 0452b461bc state. > > 0005: Looks right.

Thank you. Revised according to your comments. I think 0001-0004 comprise a good refactoring addressing the concerns from Tom [1].
0001 Removes from get_eclass_for_sort_expr() the unnatural responsibility of setting EquivalenceClass.ec_sortref. Instead, this field is set in make_pathkeys_for_sortclauses_extended() while processing group clauses.
0002 Provides additional asserts. That should help to defend against lurking bugs.
0003 Fixes the unsuitable name of a data structure.
0004 Restores the pre-0452b461bc state of preprocess_groupclause() and lowers the number of paths to consider.
It would be good to hear explicitly from Tom about this. But I'm quite sure these patches make the situation better, not worse. I'm going to push them if there are no objections.
I'm putting 0005 aside. It's unclear why and how there could be duplicate SortGroupClauses at that stage. Also, it's unclear why the existing code handles them badly. So, this should wait for a comment from Tom.
Links.
1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us
------ Regards, Alexander Korotkov Supabase
Attachment
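For context on what the restored preprocess_groupclause() buys, here is a sketch of its ORDER BY matching behaviour (hypothetical table; the reordering is a heuristic and the final plan depends on costs):

CREATE TABLE ord_t (a int, b int, c int);
INSERT INTO ord_t SELECT i % 100, i % 50, i FROM generate_series(1, 100000) AS i;
ANALYZE ord_t;

-- The GROUP BY is written as (b, a) but the ORDER BY asks for (a, b).
-- Rewriting the grouping clause into ORDER BY order lets a single sort serve
-- both the sorted aggregation and the final ordering, instead of sorting twice.
EXPLAIN (COSTS OFF)
  SELECT a, b, count(*) FROM ord_t GROUP BY b, a ORDER BY a, b;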
On Tue, Jun 4, 2024 at 1:47 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Mon, Jun 3, 2024 at 6:55 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > On Sun, 2 Jun 2024 at 17:18, Alexander Korotkov <aekorotkov@gmail.com> wrote: > >> On Sun, Jun 2, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote: > >> > > >> > On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > >> > > > >> > > I've revised some grammar including the sentence you've proposed. > >> > > > >> > > >> > -static List *groupclause_apply_groupingset(PlannerInfo *root, List *force); > >> > +static List *preprocess_groupclause(PlannerInfo *root, List *force); > >> > > >> > changing preprocess_groupclause the second argument > >> > from "force" to "gset" would be more intuitive, I think. > >> > >> Probably, but my intention is to restore preprocess_groupclause() as > >> it was before 0452b461bc with minimal edits to support incremental > >> sort. I'd rather avoid refactoring if this area for now. > >> > >> > `elog(ERROR, "Order of group-by clauses doesn't correspond incoming > >> > sort order");` > >> > > >> > I think this error message makes people wonder what "incoming sort order" is. > >> > BTW, "correspond", generally people use "correspond to". > >> > >> Thank you. On the second thought, I think it would be better to turn > >> this into an assertion like the checks before. > >> > >> > I did some minor cosmetic changes, mainly changing foreach to foreach_node. > >> > Please check the attachment. > >> > >> I would avoid refactoring of preprocess_groupclause() for the reason > >> described above. But I picked the grammar fix for PlannerInfo's > >> comment. > > > > > > Thank you for working on this patchset. > > > > 0001: Patch looks right > > > > Comments: > > > > Covert -> Convert > > sets the uninitialized value of ec_sortref of the pathkey's EquivalenceClass -> sets the value of the pathkey's EquivalenceClassunless it's already initialized > > wasn't set yet -> hasn't been set yet > > > > 0002: additional assert checking only. Looks right. > > > > 0003: pure renaming, looks good. > > > > 0004: Restores pre 0452b461bc state to preprocess_groupclause with removed partial_match fallback. Looks right. I haven'tchecked the comments provided they are restored from pre 0452b461bc state. > > > > 0005: Looks right. > > Thank you. Revised according to your comments. I think 0001-0004 > comprising a good refactoring addressing concerns from Tom [1]. > 0001 Removes from get_eclass_for_sort_expr() unnatural responsibility > of setting EquivalenceClass.ec_sortref. Instead this field is set in > make_pathkeys_for_sortclauses_extended() while processing of group > clauses. > 0002 Provides additional asserts. That should helping to defend > against lurking bugs. > 0003 Fixes unsuitable name of data structure. > 0004 Restores pre 0452b461bc state to preprocess_groupclause() and > lowers the number of paths to consider. > It would be good to explicitly hear Tom about this. But I'm quite > sure these patches makes situation better not worse. I'm going to > push them if no objections. > > I'm putting 0005 aside. It's unclear why and how there could be > duplicate SortGroupClauses at that stage. Also, it's unclear why > existing code handles them bad. So, this should wait a comment from > Tom. > > Links. > 1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us Ops... I found I've published an old version of patchset. The relevant version is attached. 
------ Regards, Alexander Korotkov Supabase