Re: Performance Evaluation of Result Cache by using TPC-DS - Mailing list pgsql-hackers
From: David Rowley
Subject: Re: Performance Evaluation of Result Cache by using TPC-DS
Date:
Msg-id: CAApHDvo2SyPuFQobnjE06eA1WfvCRHi9O2EbpxXW_BwYVu-B+Q@mail.gmail.com
In response to: Re: Performance Evaluation of Result Cache by using TPC-DS (Yuya Watari <watari.yuya@gmail.com>)
Responses: Re: Performance Evaluation of Result Cache by using TPC-DS
List: pgsql-hackers
On Tue, 20 Apr 2021 at 16:43, Yuya Watari <watari.yuya@gmail.com> wrote:
> I listed all indexes on my machine by executing your query. I attached
> the result to this e-mail. I hope it will help you.

Thanks for sending that.

I've now run some benchmarks of TPC-DS both with enable_resultcache on and off. I think I've used the same scale of test as you did. -SCALE 10.

tpcds=# \l+ tpcds
                                                  List of databases
 Name  |  Owner  | Encoding |   Collate   |    Ctype    | Access privileges | Size  | Tablespace | Description
-------+---------+----------+-------------+-------------+-------------------+-------+------------+-------------
 tpcds | drowley | UTF8     | en_NZ.UTF-8 | en_NZ.UTF-8 |                   | 28 GB | pg_default |
(1 row)

The following settings were non-standard:

tpcds=# select name,setting from pg_Settings where setting <> boot_val;
               name               |      setting
----------------------------------+--------------------
 application_name                 | psql
 archive_command                  | (disabled)
 client_encoding                  | UTF8
 data_directory_mode              | 0700
 DateStyle                        | ISO, DMY
 default_text_search_config       | pg_catalog.english
 enable_resultcache               | off
 fsync                            | off
 jit                              | off
 lc_collate                       | en_NZ.UTF-8
 lc_ctype                         | en_NZ.UTF-8
 lc_messages                      | en_NZ.UTF-8
 lc_monetary                      | en_NZ.UTF-8
 lc_numeric                       | en_NZ.UTF-8
 lc_time                          | en_NZ.UTF-8
 log_file_mode                    | 0600
 log_timezone                     | Pacific/Auckland
 max_parallel_maintenance_workers | 10
 max_parallel_workers_per_gather  | 0
 max_stack_depth                  | 2048
 server_encoding                  | UTF8
 shared_buffers                   | 2621440
 TimeZone                         | Pacific/Auckland
 unix_socket_permissions          | 0777
 wal_buffers                      | 2048
 work_mem                         | 262144
(26 rows)

This is an AMD 3990x CPU with 64GB of RAM.

I didn't run all of the queries. To reduce the benchmark times and to make the analysis easier, I just ran the queries where EXPLAIN shows at least 1 Result Cache node. The queries in question are: 1 2 6 7 15 16 21 23 24 27 34 43 44 45 66 69 73 79 88 89 91 94 99.

The one exception here is query 58. It did use a Result Cache node when enable_resultcache=on, but the query took more than 6 hours to run. This slowness is not due to Result Cache. It's due to the following correlated subquery:

  and i.i_current_price > 1.2 *
    (select avg(j.i_current_price) from item j where j.i_category = i.i_category)

That results in:

  SubPlan 2
    ->  Aggregate  (cost=8264.44..8264.45 rows=1 width=32) (actual time=87.592..87.592 rows=1 loops=255774)

87.592 * 255774 is 6.22 hours. So 6.22 hours of executing that subplan. The query took 6.23 hours in total. (A Result Cache on the subplan would help here! :-) There are only 10 distinct categories.)
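For illustration only, here's roughly what caching buys us on that subplan. The rewrite below is untested and is not part of query 58 (everything else is omitted); it just precomputes the per-category average once, which is effectively what a Result Cache keyed on i_category would do, i.e. run the aggregate about 10 times instead of 255774 times:

  -- Sketch only: compute avg(i_current_price) once per category and join,
  -- instead of re-running the correlated subquery for every outer row.
  select i.*
  from item i
  join (select j.i_category,
               avg(j.i_current_price) as avg_price
        from item j
        group by j.i_category) cat_avg
    on cat_avg.i_category = i.i_category
  where i.i_current_price > 1.2 * cat_avg.avg_price;

Of course we can't expect people to hand-rewrite queries like that, which is exactly where a Result Cache over the parameterized subplan would do the same job automatically.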
Results
======

Out of the 23 queries that used Result Cache, only 7 executed more quickly than with enable_resultcache = off.

However, for 15 of the 23 queries, the Result Cache plan was not cheaper. This means the planner rejected some other join method that would have made a cheaper plan in 15 out of 23 queries. This is likely due to the add_path() fuzziness not keeping the cheaper plan.

In only 5 of 23 queries, the Result Cache plan was both cheaper and slower to execute. These are queries 1, 6, 27, 88 and 99. These cost 0.55%, 0.04%, 0.25%, 0.25% and 0.01% less than the plan that was picked when enable_resultcache=off. None of those costs seem significantly cheaper than the alternative plan.

So, in summary, I'd say there are two separate problems here:

1. The planner does not always pick the cheapest plan due to add_path fuzziness. (15 of 23 queries have this problem, however, 4 of these 15 queries were faster with result cache, despite costing more)

2. Sometimes the Result Cache plan is cheaper and slower than the plan that is picked with enable_resultcache = off. (5 of 23 queries have this problem)

Overall, with result cache enabled, the benchmark ran 1.15% faster. This is mostly due to query 69, which ran over 40 seconds more quickly with result cache enabled. Unfortunately, 16 of the 23 queries became slower due to result cache, with only the remaining 7 becoming faster. That's not a good track record.

I never expected that we'd use a Result Cache node correctly in every planning problem we ever try to solve, but only getting that right 30.4% of the time is not quite as close to that 100% mark as I'd have liked. However, maybe that's overly harsh on the Result Cache code, as it was only 5 queries that we costed cheaper and were slower. So 18 of 23 seem to have more realistic costs, which is 78% of queries.

What can be done?
===============

I'm not quite sure. The biggest problem is add_path's fuzziness. I could go and add some penalty cost to Result Cache paths so that they're picked less often. If I make that penalty more than 1% of the cost, then that should get around add_path rejecting the other join method that is not fuzzily good enough. Adding some sort of penalty might also help the 5 of 23 queries that were cheaper and slower than the alternative.
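To make the penalty idea concrete, here's a worked example with made-up costs (only the ~1% fuzz threshold comes from add_path; the numbers are invented):

  Result Cache join path:   total_cost = 1000.0
  Alternative join path:    total_cost =  995.0   (0.5% cheaper)

  995.0 is within 1% of 1000.0, so the alternative is not "fuzzily"
  cheaper and add_path can throw it away, leaving the costlier plan.

  With, say, a 1.5% penalty on the Result Cache path:

  Result Cache join path:   total_cost = 1015.0

  1015.0 / 995.0 is about 1.02, i.e. the alternative is now ~2% cheaper,
  which clears the 1% fuzz, so it survives and gets picked.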
I've attached a spreadsheet with all of the results, and also the EXPLAIN / EXPLAIN ANALYZE output and times from both runs. The query times in the spreadsheet are from running each query once with pgbench (i.e. -t 1), not from EXPLAIN ANALYZE. I've also zipped the entire benchmark results and attached them as results.tar.bz2.

David