Thread: Access to postgresql query optimizer output

Access to postgresql query optimizer output

From
Seref Arikan
Date:
Greetings,
I keep seeing statements like "Postgresql optimizer rewrites this query as...." What I'm curious about is, is there a way to obtain SQL form of the re-written queries somewhere in the chain of query evaluation? It does not make a lot sense to generate sql again, but it would help a lot to see  the queries in their  rewritten form.

Kind regards
Seref

Re: Access to postgresql query optimizer output

From
Craig Ringer
Date:
On 10/29/2012 04:07 PM, Seref Arikan wrote:
> Greetings,
> I keep seeing statements like "Postgresql optimizer rewrites this query
> as...." What I'm curious about is, is there a way to obtain SQL form of
> the re-written queries somewhere in the chain of query evaluation? It
> does not make a lot sense to generate sql again, but it would help a lot
> to see  the queries in their  rewritten form.

I've never seen such a tool, and I'm not sure one is possible, though I
agree it'd be interesting.

The same SQL can result in many different query plans, and there are
many different SQL statements that can all result in the same query
plan. How would you unambiguously show what plan the SQL represented?

It's not so much that Pg's planner "rewrites" one SQL statement to
another. Usually, it's that two or more different SQL statements
optimize down to the same query plan.

EXPLAIN and EXPLAIN ANALYZE show the query plans, and I'm not really
sure you can go backwards from there to SQL in any consistent and
logical way.

--
Craig Ringer



Re: Access to postgresql query optimizer output

From
Seref Arikan
Date:
Thanks for the clarification Craig.

On Mon, Oct 29, 2012 at 8:46 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
On 10/29/2012 04:07 PM, Seref Arikan wrote:
> Greetings,
> I keep seeing statements like "Postgresql optimizer rewrites this query
> as...." What I'm curious about is, is there a way to obtain SQL form of
> the re-written queries somewhere in the chain of query evaluation? It
> does not make a lot sense to generate sql again, but it would help a lot
> to see  the queries in their  rewritten form.

I've never seen such a tool, and I'm not sure one is possible, though I
agree it'd be interesting.

The same SQL can result in many different query plans, and there are
many different SQL statements that can all result in the same query
plan. How would you unambiguously show what plan the SQL represented?

It's not so much that Pg's planner "rewrites" one SQL statement to
another. Usually, it's that two or more different SQL statements
optimize down to the same query plan.

EXPLAIN and EXPLAIN ANALYZE show the query plans, and I'm not really
sure you can go backwards from there to SQL in any consistent and
logical way.

--
Craig Ringer


Re: Access to postgresql query optimizer output

From
Jeff Janes
Date:
On Mon, Oct 29, 2012 at 1:07 AM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Greetings,
> I keep seeing statements like "Postgresql optimizer rewrites this query
> as...." What I'm curious about is, is there a way to obtain SQL form of the
> re-written queries somewhere in the chain of query evaluation? It does not
> make a lot sense to generate sql again, but it would help a lot to see  the
> queries in their  rewritten form.

The re-writing does not occur at the level of SQL, but rather at the
level of an internal representation "parse tree".

You can see these representations using these configuration settings;

debug_print_parse = on
debug_print_rewritten = on

I'm not aware of anything that turns those back into SQL.

Cheers,

Jeff


Re: Access to postgresql query optimizer output

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Mon, Oct 29, 2012 at 1:07 AM, Seref Arikan
> <serefarikan@kurumsalteknoloji.com> wrote:
>> I keep seeing statements like "Postgresql optimizer rewrites this query
>> as...." What I'm curious about is, is there a way to obtain SQL form of the
>> re-written queries somewhere in the chain of query evaluation? It does not
>> make a lot sense to generate sql again, but it would help a lot to see  the
>> queries in their  rewritten form.

> The re-writing does not occur at the level of SQL, but rather at the
> level of an internal representation "parse tree".

> You can see these representations using these configuration settings;

> debug_print_parse = on
> debug_print_rewritten = on

> I'm not aware of anything that turns those back into SQL.

It's not entirely clear whether the question is about the rule rewriter
or the planner.  The rule rewriter really is effectively a
source-to-source transformation; its output parse trees still correspond
1-to-1 to SQL and could be turned back into something meaningful by
feeding them to ruleutils.c.  There's no built-in support for that,
though, and it's not clear to me that it would be very interesting in
practice.

I suspect that the OP is more interested in actions taken by the
planner; but those are not expressed as parsetree transformations.
There are all sorts of practical obstacles to getting a useful view of
intermediate steps in the planning process, for instance that most
alternatives are rejected without ever forming any complete
representation of them.

            regards, tom lane


Re: Access to postgresql query optimizer output

From
Peter Geoghegan
Date:
On 29 October 2012 16:00, Jeff Janes <jeff.janes@gmail.com> wrote:
> The re-writing does not occur at the level of SQL, but rather at the
> level of an internal representation "parse tree".

Right. The post parse-analysis tree (i.e. the Query struct) is
potentially rewritten by an explicit rewrite stage after
parse-analysis but immediately before planning. No rewriting of plans
actually occurs. This is part of the rules subsystem, which is
generally considered to be a foot-gun. It *is* mostly just as if the
query had been silently rewritten, and had its SQL magically changed.

Craig mentioned that two distinct queries can produce the same query
plan. That's true, if a little academic. I guess the best example of
that is with join syntax. Technically, these could be two distinct
queries, in that the query trees would be substantively different
(according to pg_stat_statements own definition of that, which is
essentially that they're not bitwise identical in respect of their
nodes' substantive fields):

select * from foo f join bar b on f.bid = b.bid;

select * from foo f, bar b where f.bid = b.bid;

However, after planning, they could indeed have identical plans.
Still, those plans don't relate to some alternative query to the one
provided, assuming rule expansion didn't occur, and if rule expansion
did occur, I think you could just pretend that you'd written the SQL
that way in the first place and it would work just the same.

There might be some other instances where two distinct queries have
identical plan tree structures. Consider these two:

select upper(lower('text'));
select upper(upper('text'));

So what actually occurs is that the underlying pg_proc accessible
functions (upper() and lower()) are actually executed in
preprocess_expression() during planning, not execution proper. All
that appears within the plan tree is a simple Const node in each case,
so in that sense the plans are identical. Again, this is a little bit
academic for the most part.

I highly recommend pretending that the rules system doesn't exist.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Access to postgresql query optimizer output

From
Craig Ringer
Date:
On 10/30/2012 12:33 AM, Peter Geoghegan wrote:
> On 29 October 2012 16:00, Jeff Janes <jeff.janes@gmail.com> wrote:
>> The re-writing does not occur at the level of SQL, but rather at the
>> level of an internal representation "parse tree".
>
> Right. The post parse-analysis tree (i.e. the Query struct) is
> potentially rewritten by an explicit rewrite stage after
> parse-analysis but immediately before planning. No rewriting of plans
> actually occurs. This is part of the rules subsystem, which is
> generally considered to be a foot-gun. It *is* mostly just as if the
> query had been silently rewritten, and had its SQL magically changed.
>
> Craig mentioned that two distinct queries can produce the same query
> plan. That's true, if a little academic. I guess the best example of
> that is with join syntax. Technically, these could be two distinct
> queries, in that the query trees would be substantively different
> (according to pg_stat_statements own definition of that, which is
> essentially that they're not bitwise identical in respect of their
> nodes' substantive fields):
>
> select * from foo f join bar b on f.bid = b.bid;
>
> select * from foo f, bar b where f.bid = b.bid;
>
> However, after planning, they could indeed have identical plans.

I'm talking about more substantial transformations too. For example,
given sample data:



CREATE TABLE a ( id integer primary key, somedata text not null );
INSERT INTO a(id, somedata) VALUES
(1,'ham'),(2,'eggs'),(3,'spam'),(4,'putty');
CREATE TABLE b ( a_id integer not null references a(id));
INSERT INTO b VALUES (1),(3);



regress=> EXPLAIN SELECT somedata FROM a
          WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=44.50..73.41 rows=615 width=32)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
   ->  Hash  (cost=42.00..42.00 rows=200 width=4)
         ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4)
               ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(6 rows)

regress=> EXPLAIN SELECT somedata FROM a
          WHERE id IN (SELECT a_id FROM b);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=44.50..73.41 rows=615 width=32)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
   ->  Hash  (cost=42.00..42.00 rows=200 width=4)
         ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4)
               ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(6 rows)




or:

ALTER TABLE b ADD UNIQUE(a_id);

regress=> EXPLAIN SELECT somedata FROM a INNER JOIN b ON (a.id = b.a_id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE id IN (SELECT a_id FROM b);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE EXISTS (SELECT 1 FROM b
WHERE b.a_id = a.id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)



These are three very different ways to write the query, and they all
result in the same plan.

--
Craig Ringer