On Mon, Nov 20, 2017 at 12:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Antonin Houska <ah@cybertec.at> writes:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> These two phases overlap, though. I believe progress reporting for
>>> sorts is really hard.
>
>> Whatever complexity is hidden in the sort, cost_sort() should have taken it
>> into consideration when called via plan_cluster_use_sort(). Thus I think that
>> once we have both startup and total cost, the current progress of the sort
>> stage can be estimated from the current number of input and output
>> rows. Please remind me if my proposal appears to be too simplistic.
>
> Well, even if you assume that the planner's cost model omits nothing
> (which I wouldn't bet on), its result is only going to be as good as the
> planner's estimate of the number of rows to be sorted. And, in cases
> where people actually care about progress monitoring, it's likely that
> the planner got that wrong, maybe horribly so. I think it's a bad idea
> for progress monitoring to depend on the planner's estimates in any way
> whatsoever.
I agree.
I have been of the opinion all along that progress monitoring needs to
report facts, not theories. The number of tuples read thus far is a
fact, and is fine to report for whatever value it may have to someone.
The number of tuples that will be read in the future is a theory, and
as you say, progress monitoring is most likely to be used in cases
where theory and practice ended up being very different.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company