Thread: postgres_fdw: estimate_path_cost_size fails to re-use cached costs

postgres_fdw: estimate_path_cost_size fails to re-use cached costs

From
Etsuro Fujita
Date:
Hi,

I noticed yet another thing while updating the patch for pushing down
ORDER BY LIMIT.  Let me explain.  When costing foreign paths on the
basis of local statistics, we calculate/cache the costs of an unsorted
foreign path, and re-use them to estimate the costs of presorted foreign
paths, as shown below.  BUT: we fail to re-use them for some typical
queries, such as "select * from ft1 order by a", due to
fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
repeatedly.

        /*
         * We will come here again and again with different set of pathkeys
         * that caller wants to cost. We don't need to calculate the cost of
         * bare scan each time. Instead, use the costs if we have cached
them
         * already.
         */
        if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
        {
            startup_cost = fpinfo->rel_startup_cost;
            run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
        }

I think we should use "fpinfo->rel_startup_cost >= 0" here, not
"fpinfo->rel_startup_cost > 0".  Also, it would be possible that the
total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
cpu_tuple_cost=0 for the example), so I think we should change the total
cost part as well.  Attached is a patch for that.

Best regards,
Etsuro Fujita

Attachment

Re: postgres_fdw: estimate_path_cost_size fails to re-use cachedcosts

From
Etsuro Fujita
Date:
(2019/01/25 20:33), Etsuro Fujita wrote:
> I noticed yet another thing while updating the patch for pushing down
> ORDER BY LIMIT.  Let me explain.  When costing foreign paths on the
> basis of local statistics, we calculate/cache the costs of an unsorted
> foreign path, and re-use them to estimate the costs of presorted foreign
> paths, as shown below.  BUT: we fail to re-use them for some typical
> queries, such as "select * from ft1 order by a", due to
> fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
> repeatedly.
> 
>          /*
>           * We will come here again and again with different set of pathkeys
>           * that caller wants to cost. We don't need to calculate the cost of
>           * bare scan each time. Instead, use the costs if we have cached
> them
>           * already.
>           */
>          if (fpinfo->rel_startup_cost>  0&&  fpinfo->rel_total_cost>  0)
>          {
>              startup_cost = fpinfo->rel_startup_cost;
>              run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
>          }
> 
> I think we should use "fpinfo->rel_startup_cost>= 0" here, not
> "fpinfo->rel_startup_cost>  0".  Also, it would be possible that the
> total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
> cpu_tuple_cost=0 for the example), so I think we should change the total
> cost part as well.  Attached is a patch for that.

I added the commit message.  Updated patch attached.  If no objections,
I'll apply that to HEAD only as there are no reports of actual trouble
from this, as far as I know.

Best regards,
Etsuro Fujita

Attachment

Re: postgres_fdw: estimate_path_cost_size fails to re-use cachedcosts

From
Etsuro Fujita
Date:
(2019/01/28 19:37), Etsuro Fujita wrote:
> (2019/01/25 20:33), Etsuro Fujita wrote:
>> I noticed yet another thing while updating the patch for pushing down
>> ORDER BY LIMIT.  Let me explain.  When costing foreign paths on the
>> basis of local statistics, we calculate/cache the costs of an unsorted
>> foreign path, and re-use them to estimate the costs of presorted foreign
>> paths, as shown below.  BUT: we fail to re-use them for some typical
>> queries, such as "select * from ft1 order by a", due to
>> fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
>> repeatedly.
>>
>>           /*
>>            * We will come here again and again with different set of pathkeys
>>            * that caller wants to cost. We don't need to calculate the cost of
>>            * bare scan each time. Instead, use the costs if we have cached
>> them
>>            * already.
>>            */
>>           if (fpinfo->rel_startup_cost>   0&&   fpinfo->rel_total_cost>   0)
>>           {
>>               startup_cost = fpinfo->rel_startup_cost;
>>               run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
>>           }
>>
>> I think we should use "fpinfo->rel_startup_cost>= 0" here, not
>> "fpinfo->rel_startup_cost>   0".  Also, it would be possible that the
>> total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
>> cpu_tuple_cost=0 for the example), so I think we should change the total
>> cost part as well.  Attached is a patch for that.
> 
> I added the commit message.  Updated patch attached.  If no objections,
> I'll apply that to HEAD only as there are no reports of actual trouble
> from this, as far as I know.

Pushed.

Best regards,
Etsuro Fujita