Re: Repeated execution of identical subqueries - Mailing list pgsql-performance

From Craig James
Subject Re: Repeated execution of identical subqueries
Date
Msg-id 47D934CB.6090606@emolecules.com
Whole thread Raw
In response to Re: Repeated execution of identical subqueries  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-performance
Craig Ringer wrote:
> Tom Lane wrote:
>> No, not at the moment.  In principle the planner could look for such
>> duplicates, but it'd be wasted cycles so much of the time that I'd be
>> loath to do it.
>>
> Good point - there are better places to spend time, and I imagine it'd
> be an expensive thing to check too.

The one very simple case that gets me every time is when a user-defined function is called.

test=> explain analyze select chm_mf(isosmiles) from version where chm_mf(isosmiles) like '%C20%' or chm_mf(isosmiles)
like'%C21%'; 
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on version  (cost=0.00..18.57 rows=29 width=43) (actual time=48.798..1180.538 rows=50 loops=1)
   Filter: ((chm_mf(isosmiles) ~~ '%C20%'::text) OR (chm_mf(isosmiles) ~~ '%C21%'::text))
 Total runtime: 1180.683 ms
(3 rows)

This table only has 375 rows TOTAL, yet it takes over a second to answer this query: "Find me molecules with either 20
or21 carbon atoms in the molecular formula".  This is a somewhat contrived example, but we have others that really do
getus, and we go to great lengths to avoid them.  It requires us to avoid "natural" queries like the one above, and
insteadcreate convoluted application logic using temporary tables to hold the results of a function call, which we can
thenuse in a query that uses the values more than once.  Something like this: 

create temporary table foo(mf text);
explain analyze insert into foo (select chm_mf(isosmiles) from version);
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on version  (cost=0.00..15.69 rows=375 width=43) (actual time=1.829..566.233 rows=375 loops=1)
 Total runtime: 568.470 ms

Now we can use this table in a complex query where we need the MF twice, and discard it, and it's STILL faster than the
original"natural" SQL.  Notice that it takes just half the time as the first query, which tells me chm_mf() was being
calledtwice in the first example.  This function is defined as: 

  CREATE OR REPLACE FUNCTION chm_mf(text) RETURNS text
  AS '/usr/local/pgsql/lib/libchm.so', 'chm_mf'
  LANGUAGE 'C' STRICT IMMUTABLE;

I can understand how in the general case, it is very hard to identify repeated subqueries.  But it seems like an
IMMUTABLEfunction shouldn't be called twice on the same column -- isn't that the whole point of IMMUTABLE? 

Craig

pgsql-performance by date:

Previous
From: "Justin Graf"
Date:
Subject: Re: Benchmark: Dell/Perc 6, 8 disk RAID 10
Next
From: Craig James
Date:
Subject: Re: Benchmark: Dell/Perc 6, 8 disk RAID 10