Re: Performance Evaluation of Result Cache by using TPC-DS - Mailing list pgsql-hackers
From | Yuya Watari |
---|---|
Subject | Re: Performance Evaluation of Result Cache by using TPC-DS |
Date | |
Msg-id | CAJ2pMkbEHFPs7XDSaau5Thvvp7FtCcTMzWhLCt0C1eOBHzBOWA@mail.gmail.com Whole thread Raw |
In response to | Re: Performance Evaluation of Result Cache by using TPC-DS (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Performance Evaluation of Result Cache by using TPC-DS
|
List | pgsql-hackers |
Hello David, Thank you for running experiments on your machine and I really appreciate your deep analysis. Your results are very interesting. In 5 queries, the result cache is cheaper but slower. Especially, in query 88, although the cost with result cache is cheaper, it has 34.23% degradation in query execution time. This is big regression. > What can be done? > =============== > > I'm not quite sure. The biggest problem is add_path's fuzziness. I > could go and add some penalty cost to Result Cache paths so that > they're picked less often. If I make that penalty more than 1% of the > cost, then that should get around add_path rejecting the other join > method that is not fuzzily good enough. Adding some sort of penalty > might also help the 5 of 23 queries that were cheaper and slower than > the alternative. Based on your idea, I have implemented a penalty for the cost of the result cache. I attached the patch to this e-mail. Please be noted that this patch is experimental, so it lacks comments, documents, tests, etc. This patch adds a new GUC, resultcache_cost_factor. The cost of the result cache is multiplied by this factor. If the factor is greater than 1, we impose a penalty on the result cache. The cost calculation has been modified as follows. ===== @@ -2541,6 +2542,13 @@ cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath, */ startup_cost += cpu_tuple_cost; + /* + * We multiply the costs by resultcache_cost_factor to control the + * aggressiveness of result cache. + */ + startup_cost *= resultcache_cost_factor; + total_cost *= resultcache_cost_factor; ===== @@ -1618,9 +1618,14 @@ create_resultcache_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, * Add a small additional charge for caching the first entry. All the * harder calculations for rescans are performed in * cost_resultcache_rescan(). + * + * We multiply the costs by resultcache_cost_factor to control the + * aggressiveness of result cache. */ - pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost; - pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost; + pathnode->path.startup_cost = + (subpath->startup_cost + cpu_tuple_cost) * resultcache_cost_factor; + pathnode->path.total_cost = + (subpath->total_cost + cpu_tuple_cost) * resultcache_cost_factor; pathnode->path.rows = subpath->rows; return pathnode; ===== As this factor increases, the result cache becomes less and less likely to be adopted. I conducted an experiment to clarify the threshold of the factor. I ran EXPLAIN (not EXPLAIN ANALYZE) command with different factors. The threshold is defined as the factor at which the result cache no longer appears in the query plan. The factor more than the threshold indicates the planner does not select the result cache. This experiment was conducted on my machine, so the results may differ from those on your machine. I attached the thresholds as Excel and PDF files. The thresholds vary from 1.1 to 9.6. The threshold of 9.6 indicates that a penalty of 860% must be imposed to avoid the result cache. The Excel and PDF files also contain the chart showing the relationship between speedup ratio and threshold. Unfortunately, there is no clear correlation. If we set the factor to 5, we can avoid 11% degradation of query 62 because the threshold of the query is 4.7. However, we cannot gain a 62% speedup of query 61 with this factor. Therefore, this factor does not work well and should be reconsidered. In this patch, I impose a penalty on the result cache node. An alternative way is to increase the cost of a nested loop that contains a result cache. If so, there is no need to impose a penalty of 860%, but a penalty of about 1% is enough. This approach of introducing resultcache_cost_factor is not an essential solution. However, it is reasonable to offer a way of controlling the aggressiveness of the result cache. Repeatedly, this patch is experimental, so it needs feedback and modifications. Best regards, Yuya Watari
Attachment
pgsql-hackers by date: