Re: [PATCH] Use optimized single-datum tuplesort in ExecSort - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: [PATCH] Use optimized single-datum tuplesort in ExecSort |
Date | |
Msg-id | CAAaqYe_FToSveTW_O-4fT+av+Y4nxnhCwJgj-39N8RnV7jm9bA@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Use optimized single-datum tuplesort in ExecSort (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
|
List | pgsql-hackers |
On Tue, Jul 20, 2021 at 4:35 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 20 Jul 2021 at 01:10, James Coleman <jtc331@gmail.com> wrote: > > To be clear up front: I'm in favor of the patch, and I don't want to > > put unnecessary stumbling blocks up for it getting committed. So if we > > decide to proceed as is, that's fine with me. > > Thanks for making that clear. > > > But I'm not sure that the "cost model, unfortunately, does not account > > for that" is entirely accurate. The end of cost_tuplesort contains a > > cost "per extracted tuple". It does, however, note that it doesn't > > charge cpu_tuple_cost, which maybe is what you'd want to fully > > incorporate this into the model. But given this run_cost isn't about > > accounting for comparison cost (that's been done earlier) which is the > > part that'd be the same between tuple and datum sort, it seems to me > > that we could lower the cpu_operator_cost here by something like 10% > > if it's byref and 30% if it's byval? > > I failed to notice that last part that adds the additional cpu_operator_cost. > > The default cpu_operator_cost is 0.0025, so with the 10k tuple > benchmark, that adds an additional charge of 25 to the total cost. > > If we take test 1 from my results on v5 as an example: > > > Test1 446.1 657.3 147.32% > > Looking at explain for that query: > > regression=# explain select two from tenk1 order by two offset 1000000; > QUERY PLAN > ---------------------------------------------------------------------- > Limit (cost=1133.95..1133.95 rows=1 width=4) > -> Sort (cost=1108.97..1133.95 rows=9995 width=4) > Sort Key: two > -> Seq Scan on tenk1 (cost=0.00..444.95 rows=9995 width=4) > (4 rows) > > If we want the costs to reflect reality again here then we'd have > reduce 1133.95 by something like 147.32% (the performance difference). > That would bring the cost down to 769.72, which is way more than we > have to play with than the 25 that the cpu_operator_cost * tuples > gives us. > > If we reduced the 25 by 30% in this case, we'd get 17.5 and the total > cost would become 1126.45. That's not great considering the actual > performance indicates that 769.72 would be a better number. > > If we look at John's result for test 1: He saw 588 tps on master and > 998 on v8. 1133.95 / 998.0 * 588.0 = 668.09, so we'd need even more > to get close to reality on that machine. > > My thoughts are that the small surcharge added at the end of > cost_tuplesort() is just not enough for us to play with. I think to > get us closer to fixing this correctly would require a redesign of the > tuplesort costing entirely. I think that would be about an order of > magnitude more effort than this patch was, so I really feel like I > don't want to do this. > > I kinda feel that since the comparison_cost is always just 2.0 * > cpu_operator_cost regardless of the number of columns in the sort, > then if we add too many new smarts to try and properly adjust for this > new optimization, unless we do a completly new cost modal for this, > then we might as well be putting lipstick on a pig. > > It sounds like James mostly just mentioned the sorting just to ensure > it was properly considered and does not really feel strongly that it > needs to be adjusted. Does anyone else feel that we should be > adjusting it? Thanks for doing the math measuring how much we could impact things. I'm +lots on getting this committed as is. Thanks all for your work on the improvement! James
pgsql-hackers by date: