Re: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows) - Mailing list pgsql-performance

From Tom Lane
Subject Re: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)
Date
Msg-id 23876.1147472378@sss.pgh.pa.us
Whole thread Raw
In response to Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)  ("andremachado" <andremachado@techforce.com.br>)
List pgsql-performance
"andremachado" <andremachado@techforce.com.br> writes:
> continuing the saga,
> http://archives.postgresql.org/pgsql-performance/2006-04/msg00558.php ,
> my coleague created a test database with fake data (see below).

Thanks.  I played around with this a bit, and got results like these:
original query, 8.1 branch from a couple weeks back: 945 sec
original query, 8.1 branch tip: 184 sec
modified query, 8.1 branch tip: 15 sec

The first differential is because of this patch:
http://archives.postgresql.org/pgsql-committers/2006-04/msg00355.php
viz
    Remove the restriction originally coded into
    optimize_minmax_aggregates() that MIN/MAX not be converted to
    use an index if the query WHERE clause contains any volatile
    functions or subplans.

Allowing the max(DEC2.AM_REFERENCIA) subquery to be converted to an
indexscan makes for about a 5X reduction in the number of times the
EXISTS sub-subquery is executed.  But the real problem is that Postgres
isn't excessively smart about EXISTS subqueries.  I manually changed it
into an IN to get the 15-second runtime: instead of

      (select max(DEC2.AM_REFERENCIA) from DECLARACAO DEC2
        where DEC2.IN_FOI_RETIFICADA=0 and
      exists (select CAD3.ID_CADASTRO from CADASTRO CAD3 where
          CAD3.ID_DECLARACAO=DEC2.ID_DECLARACAO and
          CAD3.ID_EMPRESA=CADASTRO.ID_EMPRESA ) )

write

      (select max(DEC2.AM_REFERENCIA) from DECLARACAO DEC2
        where DEC2.IN_FOI_RETIFICADA=0 and DEC2.ID_DECLARACAO in
             (select CAD3.ID_DECLARACAO from CADASTRO CAD3 where
          CAD3.ID_EMPRESA=CADASTRO.ID_EMPRESA ) )

I'm not clear on how Firebird is managing to do this query in under
a second --- I can believe that they know how to do EXISTS as a join
but it still seems like the subqueries need to be done many thousand
times.  I thought maybe they were caching the results of the overall
subquery for specific values of CADASTRO.ID_EMPRESA, but now that I
see your test data, there are several thousand distinct values of
that, so there's not a lot of traction to be gained that way.

            regards, tom lane

pgsql-performance by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)
Next
From: André Felipe Machado
Date:
Subject: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)