Thread: Access to postgresql query optimizer output
Greetings,
I keep seeing statements like "Postgresql optimizer rewrites this query as...." What I'm curious about is, is there a way to obtain SQL form of the re-written queries somewhere in the chain of query evaluation? It does not make a lot sense to generate sql again, but it would help a lot to see the queries in their rewritten form.
Kind regards
Seref
I keep seeing statements like "Postgresql optimizer rewrites this query as...." What I'm curious about is, is there a way to obtain SQL form of the re-written queries somewhere in the chain of query evaluation? It does not make a lot sense to generate sql again, but it would help a lot to see the queries in their rewritten form.
Kind regards
Seref
On 10/29/2012 04:07 PM, Seref Arikan wrote: > Greetings, > I keep seeing statements like "Postgresql optimizer rewrites this query > as...." What I'm curious about is, is there a way to obtain SQL form of > the re-written queries somewhere in the chain of query evaluation? It > does not make a lot sense to generate sql again, but it would help a lot > to see the queries in their rewritten form. I've never seen such a tool, and I'm not sure one is possible, though I agree it'd be interesting. The same SQL can result in many different query plans, and there are many different SQL statements that can all result in the same query plan. How would you unambiguously show what plan the SQL represented? It's not so much that Pg's planner "rewrites" one SQL statement to another. Usually, it's that two or more different SQL statements optimize down to the same query plan. EXPLAIN and EXPLAIN ANALYZE show the query plans, and I'm not really sure you can go backwards from there to SQL in any consistent and logical way. -- Craig Ringer
Thanks for the clarification Craig.
On Mon, Oct 29, 2012 at 8:46 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
I've never seen such a tool, and I'm not sure one is possible, though IOn 10/29/2012 04:07 PM, Seref Arikan wrote:
> Greetings,
> I keep seeing statements like "Postgresql optimizer rewrites this query
> as...." What I'm curious about is, is there a way to obtain SQL form of
> the re-written queries somewhere in the chain of query evaluation? It
> does not make a lot sense to generate sql again, but it would help a lot
> to see the queries in their rewritten form.
agree it'd be interesting.
The same SQL can result in many different query plans, and there are
many different SQL statements that can all result in the same query
plan. How would you unambiguously show what plan the SQL represented?
It's not so much that Pg's planner "rewrites" one SQL statement to
another. Usually, it's that two or more different SQL statements
optimize down to the same query plan.
EXPLAIN and EXPLAIN ANALYZE show the query plans, and I'm not really
sure you can go backwards from there to SQL in any consistent and
logical way.
--
Craig Ringer
On Mon, Oct 29, 2012 at 1:07 AM, Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote: > Greetings, > I keep seeing statements like "Postgresql optimizer rewrites this query > as...." What I'm curious about is, is there a way to obtain SQL form of the > re-written queries somewhere in the chain of query evaluation? It does not > make a lot sense to generate sql again, but it would help a lot to see the > queries in their rewritten form. The re-writing does not occur at the level of SQL, but rather at the level of an internal representation "parse tree". You can see these representations using these configuration settings; debug_print_parse = on debug_print_rewritten = on I'm not aware of anything that turns those back into SQL. Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Mon, Oct 29, 2012 at 1:07 AM, Seref Arikan > <serefarikan@kurumsalteknoloji.com> wrote: >> I keep seeing statements like "Postgresql optimizer rewrites this query >> as...." What I'm curious about is, is there a way to obtain SQL form of the >> re-written queries somewhere in the chain of query evaluation? It does not >> make a lot sense to generate sql again, but it would help a lot to see the >> queries in their rewritten form. > The re-writing does not occur at the level of SQL, but rather at the > level of an internal representation "parse tree". > You can see these representations using these configuration settings; > debug_print_parse = on > debug_print_rewritten = on > I'm not aware of anything that turns those back into SQL. It's not entirely clear whether the question is about the rule rewriter or the planner. The rule rewriter really is effectively a source-to-source transformation; its output parse trees still correspond 1-to-1 to SQL and could be turned back into something meaningful by feeding them to ruleutils.c. There's no built-in support for that, though, and it's not clear to me that it would be very interesting in practice. I suspect that the OP is more interested in actions taken by the planner; but those are not expressed as parsetree transformations. There are all sorts of practical obstacles to getting a useful view of intermediate steps in the planning process, for instance that most alternatives are rejected without ever forming any complete representation of them. regards, tom lane
On 29 October 2012 16:00, Jeff Janes <jeff.janes@gmail.com> wrote: > The re-writing does not occur at the level of SQL, but rather at the > level of an internal representation "parse tree". Right. The post parse-analysis tree (i.e. the Query struct) is potentially rewritten by an explicit rewrite stage after parse-analysis but immediately before planning. No rewriting of plans actually occurs. This is part of the rules subsystem, which is generally considered to be a foot-gun. It *is* mostly just as if the query had been silently rewritten, and had its SQL magically changed. Craig mentioned that two distinct queries can produce the same query plan. That's true, if a little academic. I guess the best example of that is with join syntax. Technically, these could be two distinct queries, in that the query trees would be substantively different (according to pg_stat_statements own definition of that, which is essentially that they're not bitwise identical in respect of their nodes' substantive fields): select * from foo f join bar b on f.bid = b.bid; select * from foo f, bar b where f.bid = b.bid; However, after planning, they could indeed have identical plans. Still, those plans don't relate to some alternative query to the one provided, assuming rule expansion didn't occur, and if rule expansion did occur, I think you could just pretend that you'd written the SQL that way in the first place and it would work just the same. There might be some other instances where two distinct queries have identical plan tree structures. Consider these two: select upper(lower('text')); select upper(upper('text')); So what actually occurs is that the underlying pg_proc accessible functions (upper() and lower()) are actually executed in preprocess_expression() during planning, not execution proper. All that appears within the plan tree is a simple Const node in each case, so in that sense the plans are identical. Again, this is a little bit academic for the most part. I highly recommend pretending that the rules system doesn't exist. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 10/30/2012 12:33 AM, Peter Geoghegan wrote: > On 29 October 2012 16:00, Jeff Janes <jeff.janes@gmail.com> wrote: >> The re-writing does not occur at the level of SQL, but rather at the >> level of an internal representation "parse tree". > > Right. The post parse-analysis tree (i.e. the Query struct) is > potentially rewritten by an explicit rewrite stage after > parse-analysis but immediately before planning. No rewriting of plans > actually occurs. This is part of the rules subsystem, which is > generally considered to be a foot-gun. It *is* mostly just as if the > query had been silently rewritten, and had its SQL magically changed. > > Craig mentioned that two distinct queries can produce the same query > plan. That's true, if a little academic. I guess the best example of > that is with join syntax. Technically, these could be two distinct > queries, in that the query trees would be substantively different > (according to pg_stat_statements own definition of that, which is > essentially that they're not bitwise identical in respect of their > nodes' substantive fields): > > select * from foo f join bar b on f.bid = b.bid; > > select * from foo f, bar b where f.bid = b.bid; > > However, after planning, they could indeed have identical plans. I'm talking about more substantial transformations too. For example, given sample data: CREATE TABLE a ( id integer primary key, somedata text not null ); INSERT INTO a(id, somedata) VALUES (1,'ham'),(2,'eggs'),(3,'spam'),(4,'putty'); CREATE TABLE b ( a_id integer not null references a(id)); INSERT INTO b VALUES (1),(3); regress=> EXPLAIN SELECT somedata FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id); QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=44.50..73.41 rows=615 width=32) Hash Cond: (a.id = b.a_id) -> Seq Scan on a (cost=0.00..22.30 rows=1230 width=36) -> Hash (cost=42.00..42.00 rows=200 width=4) -> HashAggregate (cost=40.00..42.00 rows=200 width=4) -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) (6 rows) regress=> EXPLAIN SELECT somedata FROM a WHERE id IN (SELECT a_id FROM b); QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=44.50..73.41 rows=615 width=32) Hash Cond: (a.id = b.a_id) -> Seq Scan on a (cost=0.00..22.30 rows=1230 width=36) -> Hash (cost=42.00..42.00 rows=200 width=4) -> HashAggregate (cost=40.00..42.00 rows=200 width=4) -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) (6 rows) or: ALTER TABLE b ADD UNIQUE(a_id); regress=> EXPLAIN SELECT somedata FROM a INNER JOIN b ON (a.id = b.a_id); QUERY PLAN ----------------------------------------------------------------------- Nested Loop (cost=0.00..17.58 rows=2 width=32) -> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) -> Index Scan using a_pkey on a (cost=0.00..8.27 rows=1 width=36) Index Cond: (id = b.a_id) (4 rows) regress=> EXPLAIN SELECT somedata FROM a WHERE id IN (SELECT a_id FROM b); QUERY PLAN ----------------------------------------------------------------------- Nested Loop (cost=0.00..17.58 rows=2 width=32) -> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) -> Index Scan using a_pkey on a (cost=0.00..8.27 rows=1 width=36) Index Cond: (id = b.a_id) (4 rows) regress=> EXPLAIN SELECT somedata FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id); QUERY PLAN ----------------------------------------------------------------------- Nested Loop (cost=0.00..17.58 rows=2 width=32) -> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) -> Index Scan using a_pkey on a (cost=0.00..8.27 rows=1 width=36) Index Cond: (id = b.a_id) (4 rows) These are three very different ways to write the query, and they all result in the same plan. -- Craig Ringer