[PATCH] Caching for stable expressions with constant arguments v3 - Mailing list pgsql-hackers

From Marti Raudsepp
Subject [PATCH] Caching for stable expressions with constant arguments v3
Date
Msg-id CABRT9RAsQbPuSmTAXAo77a=UduXd36nCFPQRpS9r+3hqYFCfjw@mail.gmail.com
Whole thread Raw
Responses Re: [PATCH] Caching for stable expressions with constant arguments v3
Re: [PATCH] Caching for stable expressions with constant arguments v3
Re: [PATCH] Caching for stable expressions with constant arguments v3
List pgsql-hackers
Hi list!

This is the third version of my CacheExpr patch. The patch is now
feature-complete -- meaning that it's now capable of caching all
expression types that are currently constant-foldable. Simple queries
don't get CacheExprs added at all now. Code comments should also be
sufficient for review. New regression tests about caching behavior are
included.

I'm now at a point where I can't think of anything significant to
improve about the patch so feedback would be very welcome.

For explanation about design decisions, please read these earlier messages:
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00579.php
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00812.php
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00833.php

Should I remove the enable_cacheexpr GUC? The overhead of this doesn't
seem big enough to warrant people turning it off.

I'm not happy with some of the duplication added to
const_expressions_mutator, particularly CaseExpr. However, moving that
code to another function or macro would probably make it harder to
understand (every parameter would only be used once). Thoughts?

Performance
-----------
There's no question that this patch improves performance for large
sequential scan filters, so I'm mostly exploring how much overhead is
added in the worst cases here.

For benchmarking, I built and ran two parallel PostgreSQL
installations on different ports, one with my latest patche and
another with the base git revision where I branched off my development

Built with GCC 4.6.1, configured with --disable-cassert --disable-debug
Default PostgreSQL configuration running on Phenom II X4. CPU
frequency scaling is disabled during test and PostgreSQL is isolated
to a certain CPU core to reduce variance.

For testing I used 'pgbench -T 10 -n -f $SCRIPT -p $PORT'
This command was ran 20 times (10 for patched, 10 for unpatched) and
the runs were interleaved.

All result differences are statistically significant according to
t-test with p<0.05

First of all, some very simple queries to get some idea of the
overhead of the patch.
----
select now()

base: avg=22088.9 stdev=180.0
patch: avg=22233.8 stdev=134.1

In this case the patch turns off CacheExpr insertion in order not to
break PL/pgSQL simple expressions. Why performance improves by 0.6%, I
don't know; it may be due to the code rearranging in simplify_function

----
select * from now()

base: avg=17097.1 stdev=48.2
patch: avg=16929.2 stdev=47.8

Caching is on here, and hurts the patch because it's only evaluated
once. Performance loss of 1.0%

----
Now we'll try some more complicated expressions on a table with two rows:
create table two (ts timestamptz);
insert into two values (timestamptz '2011-09-24 20:07:56.641322+03'),
('2011-09-24 20:07:56.642743+03');

----
two rows with cachable WHERE
select * from two where ts >= to_date(now()::date::text, 'YYYY-MM-DD')
and ts < (to_date(now()::date::text, 'YYYY-MM-DD') + interval '1
year')

base: avg=5054.2 stdev=47.1
patch: avg=4900.6 stdev=24.4

Clearly for two rows, caching the expression doesn't pay off.
Performance loss of 3.1%

----
two rows with uncachable WHERE
select * from two where ts >= to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') and ts < (to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') + interval '1 year')

base: avg=6236.6 stdev=88.3
patched: avg=6118.7 stdev=50.1

No part of this expression is cachable because clock_timestamp() is a
volatile function. Performance loss is 1.9% in the patched version,
probably due to the overhead added by searching for cachable
expressions.

----
Now repeating the last two tests with a 50-row table:
create table fifty as select generate_series(timestamptz '2001-01-01',
timestamptz '2001-01-01' + '49 days', '1 day') ts;

----
50 rows with cachable WHERE
select * from fifty where ts >= to_date(now()::date::text,
'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') +
interval '1 year')

base: avg=3136.6 stdev=22.3
patch: avg=4397.3 stdev=35.5

As expected, performance is much better with more rows due to caching;
improvement of 28.7%

----
50 rows with uncachable WHERE
select * from fifty where ts >= to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') and ts < (to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') + interval '1 year')

base: avg=3514.0 stdev=9.8
patch: avg=3500.1 stdev=18.4

Since planning overhead is less significant with more rows, the
regression is just 0.4% here.

----
As always, my latest progress can be seen in my GitHub 'cahce' branch:
https://github.com/intgr/postgres/commits/cache

Regards,
Marti

Attachment

pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: posix_fadvsise in base backups
Next
From: Kohei KaiGai
Date:
Subject: Re: Inserting heap tuples in bulk in COPY