Re: Access to postgresql query optimizer output - Mailing list pgsql-general

From Craig Ringer
Subject Re: Access to postgresql query optimizer output
Date
Msg-id 508F3701.8000508@ringerc.id.au
Whole thread Raw
In response to Re: Access to postgresql query optimizer output  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-general
On 10/30/2012 12:33 AM, Peter Geoghegan wrote:
> On 29 October 2012 16:00, Jeff Janes <jeff.janes@gmail.com> wrote:
>> The re-writing does not occur at the level of SQL, but rather at the
>> level of an internal representation "parse tree".
>
> Right. The post parse-analysis tree (i.e. the Query struct) is
> potentially rewritten by an explicit rewrite stage after
> parse-analysis but immediately before planning. No rewriting of plans
> actually occurs. This is part of the rules subsystem, which is
> generally considered to be a foot-gun. It *is* mostly just as if the
> query had been silently rewritten, and had its SQL magically changed.
>
> Craig mentioned that two distinct queries can produce the same query
> plan. That's true, if a little academic. I guess the best example of
> that is with join syntax. Technically, these could be two distinct
> queries, in that the query trees would be substantively different
> (according to pg_stat_statements own definition of that, which is
> essentially that they're not bitwise identical in respect of their
> nodes' substantive fields):
>
> select * from foo f join bar b on f.bid = b.bid;
>
> select * from foo f, bar b where f.bid = b.bid;
>
> However, after planning, they could indeed have identical plans.

I'm talking about more substantial transformations too. For example,
given sample data:



CREATE TABLE a ( id integer primary key, somedata text not null );
INSERT INTO a(id, somedata) VALUES
(1,'ham'),(2,'eggs'),(3,'spam'),(4,'putty');
CREATE TABLE b ( a_id integer not null references a(id));
INSERT INTO b VALUES (1),(3);



regress=> EXPLAIN SELECT somedata FROM a
          WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=44.50..73.41 rows=615 width=32)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
   ->  Hash  (cost=42.00..42.00 rows=200 width=4)
         ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4)
               ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(6 rows)

regress=> EXPLAIN SELECT somedata FROM a
          WHERE id IN (SELECT a_id FROM b);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=44.50..73.41 rows=615 width=32)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
   ->  Hash  (cost=42.00..42.00 rows=200 width=4)
         ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4)
               ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(6 rows)




or:

ALTER TABLE b ADD UNIQUE(a_id);

regress=> EXPLAIN SELECT somedata FROM a INNER JOIN b ON (a.id = b.a_id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE id IN (SELECT a_id FROM b);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE EXISTS (SELECT 1 FROM b
WHERE b.a_id = a.id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)



These are three very different ways to write the query, and they all
result in the same plan.

--
Craig Ringer


pgsql-general by date:

Previous
From: "Xiong He"
Date:
Subject: Re: Too much clients connected to the PostgreSQL Database
Next
From: "Kevin Grittner"
Date:
Subject: Re: Too much clients connected to the PostgreSQL Database