Thread: Repeated execution of identical subqueries

Repeated execution of identical subqueries

From

Craig Ringer

Date:

13 March 2008, 00:38:13

Hi

I've been wondering about postgresql's handling of repeated subqueries
in statements for a while, and thought I'd ask here.

If the exact same subquery appears in multiple places in a complex
query, it seems to be executed separately each time it appears. I'm
wondering if there's any way, without moving over to PL/PgSQL, to
"collapse" such multiple executions down to a single one.

Consider this simplistic example, which while useless demonstrates the
issue in a self contained way:

explain analyze select (select count(id) from booking) as x
where (select count(id) from booking) > 100;

                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Result  (cost=37023.85..37023.86 rows=1 width=0)
          (actual time=668.922..668.922 rows=1 loops=1)
    One-Time Filter: ($1 > 100)
    InitPlan
      ->  Aggregate  (cost=18511.92..18511.92 rows=1 width=4)
                     (actual time=335.150..335.150 rows=1 loops=1)
            ->  Seq Scan on booking  (cost=0.00..17627.13 rows=353913
                width=4) (actual time=0.128..200.147 rows=353913 loops=1)
      ->  Aggregate  (cost=18511.92..18511.92 rows=1 width=4)
                     (actual time=333.756..333.756 rows=1 loops=1)
            ->  Seq Scan on booking  (cost=0.00..17627.13 rows=353913
                width=4) (actual time=0.133..198.261 rows=353913 loops=1)
  Total runtime: 668.993 ms
(8 rows)


The query:

(select count(id) from booking)

is executed twice, even though it's guaranteed by MVCC that the result
will be the same in both subqueries.

Ideally you'd be able to say something like:

select (select count(id) from booking) as x
where x > 100;

I realize that in this case the query can be rewritten as:

select x.c from (select count(id) AS c from booking) as x
where x.c > 100;


but in more complex queries introducing an additional FROM clause for a
  single value can be undesirable and/or ugly.

Is there any way to get postgresql to detect such repeated query parts
and evaluate them only once?

--
Craig Ringer

Re: Repeated execution of identical subqueries

From

Tom Lane

Date:

13 March 2008, 02:13:30

Craig Ringer <craig@postnewspapers.com.au> writes:
> Is there any way to get postgresql to detect such repeated query parts
> and evaluate them only once?

No, not at the moment.  In principle the planner could look for such
duplicates, but it'd be wasted cycles so much of the time that I'd be
loath to do it.

There is work afoot to implement the SQL:2003 "WITH" syntax, which
I think would offer a syntactic solution to your problem.

            regards, tom lane

Re: Repeated execution of identical subqueries

From

Craig Ringer

Date:

13 March 2008, 03:35:18

Tom Lane wrote:
> No, not at the moment.  In principle the planner could look for such
> duplicates, but it'd be wasted cycles so much of the time that I'd be
> loath to do it.
>
Good point - there are better places to spend time, and I imagine it'd
be an expensive thing to check too.
> There is work afoot to implement the SQL:2003 "WITH" syntax, which
> I think would offer a syntactic solution to your problem.
Yes, it would. In fact, I was thinking about the syntax seen in some
functional languages - like Haskell's `where' clause - that defines a
subexpression available to all parts of the expression. If the SQL:2003
WITH expression is anything like that it'd be very handy indeed.

--
Craig Ringer

Re: Repeated execution of identical subqueries

From

Craig James

Date:

13 March 2008, 11:06:14

Craig Ringer wrote:
> Tom Lane wrote:
>> No, not at the moment.  In principle the planner could look for such
>> duplicates, but it'd be wasted cycles so much of the time that I'd be
>> loath to do it.
>>
> Good point - there are better places to spend time, and I imagine it'd
> be an expensive thing to check too.

The one very simple case that gets me every time is when a user-defined function is called.

test=> explain analyze select chm_mf(isosmiles) from version where chm_mf(isosmiles) like '%C20%' or chm_mf(isosmiles)
like'%C21%'; 
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on version  (cost=0.00..18.57 rows=29 width=43) (actual time=48.798..1180.538 rows=50 loops=1)
   Filter: ((chm_mf(isosmiles) ~~ '%C20%'::text) OR (chm_mf(isosmiles) ~~ '%C21%'::text))
 Total runtime: 1180.683 ms
(3 rows)

This table only has 375 rows TOTAL, yet it takes over a second to answer this query: "Find me molecules with either 20
or21 carbon atoms in the molecular formula".  This is a somewhat contrived example, but we have others that really do
getus, and we go to great lengths to avoid them.  It requires us to avoid "natural" queries like the one above, and
insteadcreate convoluted application logic using temporary tables to hold the results of a function call, which we can
thenuse in a query that uses the values more than once.  Something like this: 

create temporary table foo(mf text);
explain analyze insert into foo (select chm_mf(isosmiles) from version);
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on version  (cost=0.00..15.69 rows=375 width=43) (actual time=1.829..566.233 rows=375 loops=1)
 Total runtime: 568.470 ms

Now we can use this table in a complex query where we need the MF twice, and discard it, and it's STILL faster than the
original"natural" SQL.  Notice that it takes just half the time as the first query, which tells me chm_mf() was being
calledtwice in the first example.  This function is defined as: 

  CREATE OR REPLACE FUNCTION chm_mf(text) RETURNS text
  AS '/usr/local/pgsql/lib/libchm.so', 'chm_mf'
  LANGUAGE 'C' STRICT IMMUTABLE;

I can understand how in the general case, it is very hard to identify repeated subqueries.  But it seems like an
IMMUTABLEfunction shouldn't be called twice on the same column -- isn't that the whole point of IMMUTABLE? 

Craig