[PATCH] Caching for stable expressions with constant arguments v3 - Mailing list pgsql-hackers
From | Marti Raudsepp |
---|---|
Subject | [PATCH] Caching for stable expressions with constant arguments v3 |
Date | |
Msg-id | CABRT9RAsQbPuSmTAXAo77a=UduXd36nCFPQRpS9r+3hqYFCfjw@mail.gmail.com Whole thread Raw |
Responses |
Re: [PATCH] Caching for stable expressions with constant arguments
v3
Re: [PATCH] Caching for stable expressions with constant arguments v3 Re: [PATCH] Caching for stable expressions with constant arguments v3 |
List | pgsql-hackers |
Hi list! This is the third version of my CacheExpr patch. The patch is now feature-complete -- meaning that it's now capable of caching all expression types that are currently constant-foldable. Simple queries don't get CacheExprs added at all now. Code comments should also be sufficient for review. New regression tests about caching behavior are included. I'm now at a point where I can't think of anything significant to improve about the patch so feedback would be very welcome. For explanation about design decisions, please read these earlier messages: http://archives.postgresql.org/pgsql-hackers/2011-09/msg00579.php http://archives.postgresql.org/pgsql-hackers/2011-09/msg00812.php http://archives.postgresql.org/pgsql-hackers/2011-09/msg00833.php Should I remove the enable_cacheexpr GUC? The overhead of this doesn't seem big enough to warrant people turning it off. I'm not happy with some of the duplication added to const_expressions_mutator, particularly CaseExpr. However, moving that code to another function or macro would probably make it harder to understand (every parameter would only be used once). Thoughts? Performance ----------- There's no question that this patch improves performance for large sequential scan filters, so I'm mostly exploring how much overhead is added in the worst cases here. For benchmarking, I built and ran two parallel PostgreSQL installations on different ports, one with my latest patche and another with the base git revision where I branched off my development Built with GCC 4.6.1, configured with --disable-cassert --disable-debug Default PostgreSQL configuration running on Phenom II X4. CPU frequency scaling is disabled during test and PostgreSQL is isolated to a certain CPU core to reduce variance. For testing I used 'pgbench -T 10 -n -f $SCRIPT -p $PORT' This command was ran 20 times (10 for patched, 10 for unpatched) and the runs were interleaved. All result differences are statistically significant according to t-test with p<0.05 First of all, some very simple queries to get some idea of the overhead of the patch. ---- select now() base: avg=22088.9 stdev=180.0 patch: avg=22233.8 stdev=134.1 In this case the patch turns off CacheExpr insertion in order not to break PL/pgSQL simple expressions. Why performance improves by 0.6%, I don't know; it may be due to the code rearranging in simplify_function ---- select * from now() base: avg=17097.1 stdev=48.2 patch: avg=16929.2 stdev=47.8 Caching is on here, and hurts the patch because it's only evaluated once. Performance loss of 1.0% ---- Now we'll try some more complicated expressions on a table with two rows: create table two (ts timestamptz); insert into two values (timestamptz '2011-09-24 20:07:56.641322+03'), ('2011-09-24 20:07:56.642743+03'); ---- two rows with cachable WHERE select * from two where ts >= to_date(now()::date::text, 'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') + interval '1 year') base: avg=5054.2 stdev=47.1 patch: avg=4900.6 stdev=24.4 Clearly for two rows, caching the expression doesn't pay off. Performance loss of 3.1% ---- two rows with uncachable WHERE select * from two where ts >= to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') and ts < (to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') + interval '1 year') base: avg=6236.6 stdev=88.3 patched: avg=6118.7 stdev=50.1 No part of this expression is cachable because clock_timestamp() is a volatile function. Performance loss is 1.9% in the patched version, probably due to the overhead added by searching for cachable expressions. ---- Now repeating the last two tests with a 50-row table: create table fifty as select generate_series(timestamptz '2001-01-01', timestamptz '2001-01-01' + '49 days', '1 day') ts; ---- 50 rows with cachable WHERE select * from fifty where ts >= to_date(now()::date::text, 'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') + interval '1 year') base: avg=3136.6 stdev=22.3 patch: avg=4397.3 stdev=35.5 As expected, performance is much better with more rows due to caching; improvement of 28.7% ---- 50 rows with uncachable WHERE select * from fifty where ts >= to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') and ts < (to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') + interval '1 year') base: avg=3514.0 stdev=9.8 patch: avg=3500.1 stdev=18.4 Since planning overhead is less significant with more rows, the regression is just 0.4% here. ---- As always, my latest progress can be seen in my GitHub 'cahce' branch: https://github.com/intgr/postgres/commits/cache Regards, Marti
Attachment
pgsql-hackers by date: