Re: [PATCH] Use optimized single-datum tuplesort in ExecSort - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date
Msg-id CAEudQAq1t4CnqQZK+kh2MF0iwX1MEs+uHk1wTksn=QO0xFKokg@mail.gmail.com
In response to Re: [PATCH] Use optimized single-datum tuplesort in ExecSort  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Tue, 20 Jul 2021 at 05:35, David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 20 Jul 2021 at 01:10, James Coleman <jtc331@gmail.com> wrote:
> To be clear up front: I'm in favor of the patch, and I don't want to
> put unnecessary stumbling blocks up for it getting committed. So if we
> decide to proceed as is, that's fine with me.

Thanks for making that clear.

> But I'm not sure that the "cost model, unfortunately, does not account
> for that" is entirely accurate. The end of cost_tuplesort contains a
> cost "per extracted tuple". It does, however, note that it doesn't
> charge cpu_tuple_cost, which maybe is what you'd want to fully
> incorporate this into the model. But given this run_cost isn't about
> accounting for comparison cost (that's been done earlier) which is the
> part that'd be the same between tuple and datum sort, it seems to me
> that we could lower the cpu_operator_cost here by something like 10%
> if it's byref and 30% if it's byval?

I failed to notice that last part that adds the additional cpu_operator_cost.

The default cpu_operator_cost is 0.0025, so with the 10k tuple
benchmark, that adds an additional charge of 25 to the total cost.

If we take test 1 from my results on v5 as an example:

> Test1: 446.1 tps (master) vs 657.3 tps (v5) = 147.32%

Looking at explain for that query:

regression=# explain select two from tenk1 order by two offset 1000000;
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=1133.95..1133.95 rows=1 width=4)
   ->  Sort  (cost=1108.97..1133.95 rows=9995 width=4)
         Sort Key: two
         ->  Seq Scan on tenk1  (cost=0.00..444.95 rows=9995 width=4)
(4 rows)

If we want the costs to reflect reality again here then we'd have to
scale 1133.95 down by something like the 147.32% performance difference.
That would bring the cost down to 769.72, a far bigger reduction than
the 25 that the cpu_operator_cost * tuples charge gives us to play
with.

If we reduced the 25 by 30% in this case, we'd get 17.5 and the total
cost would become 1126.45.  That's not great considering the actual
performance indicates that 769.72 would be a better number.

If we look at John's result for test 1: He saw 588 tps on master and
998 on v8.  1133.95 / 998.0 * 588.0 = 668.09, so we'd need even more
to get close to reality on that machine.

My thoughts are that the small surcharge added at the end of
cost_tuplesort() is just not enough for us to play with.  I think to
get us closer to fixing this correctly would require a redesign of the
tuplesort costing entirely. I think that would be about an order of
magnitude more effort than this patch was, so I really feel like I
don't want to do this.
I understand that a redesign would require a lot of work,
but why not do it step by step?


I kinda feel that since the comparison_cost is always just 2.0 *
cpu_operator_cost regardless of the number of columns in the sort,
then unless we do a completely new cost model for this, adding too
many new smarts to try to properly adjust for this new optimization
would just be putting lipstick on a pig.
I think a first step could be to name this 2.0.
Doesn't this magic number deserve a proper name?
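
Just to make that concrete, a minimal sketch of what naming it could
look like (the macro name is invented here purely for illustration;
nothing like it exists in costsize.c today):

/*
 * Cost charged per tuple comparison, as a multiple of cpu_operator_cost.
 * (Hypothetical name; cost_tuplesort() currently writes the bare 2.0.)
 */
#define DEFAULT_COMPARISON_COST_MULT	2.0

	/* Include the default cost-per-comparison */
	comparison_cost += DEFAULT_COMPARISON_COST_MULT * cpu_operator_cost;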
 

It sounds like James mostly mentioned the sorting just to ensure
it was properly considered and does not really feel strongly that it
needs to be adjusted.  Does anyone else feel that we should be
adjusting it?
I took a look at cost_tuplesort() and I think that some small adjustments could be made as part of that improvement process.
A patch is attached; a rough illustrative sketch also follows the list below.
1. long is a very problematic type; wouldn't int64 be better?
2. Couldn't the 1024 be an int rather than a long?
3. Shouldn't the bare 2 be written as 2.0 (a double)?
4. If a disk-based sort is not needed, IMO couldn't we avoid calling relation_byte_size()?
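
To illustrate items 1-3, here is a small standalone sketch (this is not
the attached patch and not the real cost_tuplesort(); it only mirrors
the in-memory branch of the cost arithmetic, with the LOG2 definition
from costsize.c, and assumes the default 4MB work_mem):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* same definition costsize.c uses */
#define LOG2(x)  (log(x) / 0.693147180559945)

int
main(void)
{
	double		cpu_operator_cost = 0.0025;	/* default GUC value */
	double		tuples = 9995.0;	/* rows from the EXPLAIN above */
	int			sort_mem = 4096;	/* work_mem in kB; assumed default */

	/* items 1 and 2: int64 instead of long, 1024 as a plain int */
	int64_t		sort_mem_bytes = (int64_t) sort_mem * 1024;

	/* item 3: write the constant explicitly as a double */
	double		comparison_cost = 2.0 * cpu_operator_cost;

	/* in-memory quicksort path: about N * log2(N) comparisons */
	double		startup_cost = comparison_cost * tuples * LOG2(tuples);

	/* small per-extracted-tuple surcharge at the end of cost_tuplesort() */
	double		run_cost = cpu_operator_cost * tuples;

	printf("sort_mem_bytes = %lld\n", (long long) sort_mem_bytes);
	printf("sort startup = %.2f, run = %.2f\n", startup_cost, run_cost);

	/* adding the Seq Scan total (444.95) lands close to the 1133.95 above */
	printf("total = %.2f\n", 444.95 + startup_cost + run_cost);
	return 0;
}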

Finally, it would be nice to at least document (add comments about) these conclusions,
wouldn't it? A possible wording is sketched below.
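
For example, something along these lines near the per-extracted-tuple
surcharge at the end of cost_tuplesort() (the wording is only my
suggestion, not committed text):

	/*
	 * Note: we make no attempt to model the speedup of the single-Datum
	 * tuplesort path here.  The small per-extracted-tuple surcharge below
	 * gives us too little room to reflect it accurately; doing it properly
	 * would require redesigning the tuplesort costing.
	 */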

regards,
Ranier Vilela
