cost_sort() may need to be updated - Mailing list pgsql-hackers

From Peter Geoghegan
Subject cost_sort() may need to be updated
Date
Msg-id CAM3SWZQLP6e=1si1NcQjYft7R-VYpprrf_i59tZOZX5m7VFK-w@mail.gmail.com
Whole thread Raw
Responses Re: cost_sort() may need to be updated  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
For external sorts, cost_sort() assumes the following:
* The disk traffic is assumed to be 3/4ths sequential and 1/4th random* accesses (XXX can't we refine that guess?)

...       /* Assume 3/4ths of accesses are sequential, 1/4th are not */       startup_cost += npageaccesses *
(seq_page_cost* 0.75 + random_page_cost * 0.25);
 

I think that we *can* refine this guess, and should, because random
I/O is really quite unlikely to be a large cost these days (I/O in
general often isn't a large cost, actually). More fundamentally, I
think it's a problem that cost_sort() thinks that external sorts are
far more expensive than internal sorts in general. There is good
reason to think that that does not reflect the reality. I think we can
expect external sorts to be *faster* than internal sorts with
increasing regularity in Postgres 10.

In one sense, things got worse here in 9.6 when replacement selection
was all but killed, since runs became on average half their size,
which cost_sort() was taught about at the time. But, cost_sort()
didn't care about cases where only one pass is predicted (the great
majority of external sorts), so that didn't really matter. cost_sort()
doesn't seem to care about the size of runs to be merged at all, which
seems limiting. It also seems limiting that cost_sort() doesn't
differentiate between internal and external sort *startup* costs at
all. External sorts can often start returning tuples sooner, due to
the final on-the-fly merge mechanism.

It's pretty common to see cases where merging 4 runs takes only
slightly less time than an internal sort, and yet are thought to cost
more than twice as much. I realize that costs are always a bit fudged,
but I am fairly suspicious of how big the differences between external
and internal sorts regularly are. I suggest that this be revisited
soon.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: kqueue
Next
From: Vitaly Burovoy
Date:
Subject: Re: identity columns