Thread: generic options for explain

generic options for explain

From
Robert Haas
Date:
Well, here we are!  Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]!  I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2].  The current "EXPLAIN [ANALYZE] [VERBOSE] <query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords.  Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax.  See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

As written, this patch doesn't introduce any actual new functionality,
but I think it's pretty easy to see how we could build on the syntax
to add things like different types of output formats, different types
of instrumentation, etc.  A few other random notes:

- This currently lacks documentation.  If we have any consensus that
this is a reasonable approach, I'll add some.
- I noticed that we currently accept as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)).  But
you can't put parentheses around any other type of statement.  Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all.  But that's neither here nor there as far as
this patch is concerned, except that it required some minor grammar
hackery and a long comment explaining the hackery.

Thoughts?

...Robert

[1] http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de
[2] http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com

Attachment

Re: generic options for explain

From
Pavel Stehule
Date:
Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

regards
Pavel Stehule

2009/5/24 Robert Haas <robertmhaas@gmail.com>:
> Well, here we are!  Yet another thread about some piece of information
> that's omitted from EXPLAIN and can't easily be added because there
> are a zillion things we want to add to EXPLAIN and it's not OK to bury
> the user[1]!  I've long been of the opinion that the right way to fix
> this problem is to extend the syntax with some sort of extensible
> options syntax[2].  The current "EXPLAIN [ANALYZE] [VERBOSE] <query>"
> syntax does not scale to large numbers of options - it requires that
> the options occur in a fixed order, and that the option names all be
> keywords.  Having gotten throughly fed up with having this
> conversation for the ump-teenth time, I wrote a patch to introduce
> just such a syntax.  See attached.
>
> What I did is borrowed the generic options stuff that Peter Eisentraut
> introduced for FOREIGN DATA WRAPPER et. al, so you can write:
>
> EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
> e.g. EXPLAIN (ANALYZE "on") query
>
> As written, this patch doesn't introduce any actual new functionality,
> but I think it's pretty easy to see how we could build on the syntax
> to add things like different types of output formats, different types
> of instrumentation, etc.  A few other random notes:
>
> - This currently lacks documentation.  If we have any consensus that
> this is a reasonable approach, I'll add some.
> - I noticed that we currently accept as a top-level SQL command an
> arbitrarily parenthesized SELECT statement, like ((SELECT 3)).  But
> you can't put parentheses around any other type of statement.  Even
> more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
> me makes no sense at all.  But that's neither here nor there as far as
> this patch is concerned, except that it required some minor grammar
> hackery and a long comment explaining the hackery.
>
> Thoughts?
>
> ...Robert
>
> [1] http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de
> [2] http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: generic options for explain

From
Andres Freund
Date:
Hi Robert,

On 05/24/2009 02:47 AM, Robert Haas wrote:
> Well, here we are!  Yet another thread about some piece of information
> that's omitted from EXPLAIN and can't easily be added because there
> are a zillion things we want to add to EXPLAIN and it's not OK to bury
> the user[1]!  I've long been of the opinion that the right way to fix
> this problem is to extend the syntax with some sort of extensible
> options syntax[2].  The current "EXPLAIN [ANALYZE] [VERBOSE]<query>"
> syntax does not scale to large numbers of options - it requires that
> the options occur in a fixed order, and that the option names all be
> keywords.  Having gotten throughly fed up with having this
> conversation for the ump-teenth time, I wrote a patch to introduce
> just such a syntax.  See attached.

> What I did is borrowed the generic options stuff that Peter Eisentraut
> introduced for FOREIGN DATA WRAPPER et. al, so you can write:

> EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
> e.g. EXPLAIN (ANALYZE "on") query
Beeing the latest cause for the frustration leading to this patch I 
obviously would like something like that - and I would gladly implement 
some additional stats suggested by others(if implementable in a 
reasonable timeframe) if this approach is agreed uppon.

> - I noticed that we currently acce    pt as a top-level SQL command an
> arbitrarily parenthesized SELECT statement, like ((SELECT 3)).  But
> you can't put parentheses around any other type of statement.  Even
> more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
> me makes no sense at all.
I would guess that stems from supporting syntax like:
(SELECT 1)
UNION
(SELECT 2)
ORDER BY

and not wanting to introduce a special path for
(SELECT 1)
ORDER BY

For additional stats to be kept another discussion about appropriate, 
extensible representation suitable for different output formats probably 
would be needed - but thats a discussion for another day.

Andres


Re: generic options for explain

From
Dave Page
Date:
On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> why we develop a new syntax?
>
> we should have a secondary function explain_query(query_string,
> option) that returns setof some. Next function should be
> explain_query_xml. I thing so for typical use EXPLAIN statement is
> enough. And for machine procession some new function should be
> perfect.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: generic options for explain

From
Robert Haas
Date:
On Sun, May 24, 2009 at 8:44 AM, Dave Page <dpage@pgadmin.org> wrote:
> On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Hello
>>
>> why we develop a new syntax?
>>
>> we should have a secondary function explain_query(query_string,
>> option) that returns setof some. Next function should be
>> explain_query_xml. I thing so for typical use EXPLAIN statement is
>> enough. And for machine procession some new function should be
>> perfect.

I don't understand - why do we want to switch from a command interface
to a functional interface?  I was thinking:

EXPLAIN (format 'xml') query...
EXPLAIN (format 'json') query...

What you're proposing will certainly be harder to code as well as more
different from what we have now.  My approach has the advantage of
being able to leverage the main parser to parse both the query and the
options; with this approach, you'll need to pass the query text back
to the main parser and then write separate code of some sort to parse
the options.  Seems like extra work for no gain.

The only possible advantage I can see of a functional interface is
that the current implementation of EXPLAIN is treated as a utility
command, so you can't do something like this:

INSERT INTO foo (EXPLAIN SELECT 1);

In my mind, however, fixing that would be preferable to (and probably
easier than) changing the whole syntax.

> I agree. We shouldn't be overloading EXPLAIN with a new option every
> time someone thinks of something new they'd like to see. XML EXPLAIN
> output would obviously be extensible and machine readable. We could
> easily produce a library of XSLT stylesheets to format the output in
> different ways without cluttering the server with extra code. The
> benefits to applications that want to read the output are also pretty
> obvious.

Well, the major benefit of the existing EXPLAIN output is that you
don't need an XSLT stylesheet to read it.  You can just be sitting
there in psql and do an EXPLAIN, and look at the results, and solve
your problem.  As we add options to EXPLAIN, I would like to see us
support those in both the plaintext output and the XML output (and
hopefully the JSON output) using the same syntax.

If, for example, I need more information on what my hash joins are
doing, I'd like to be able to do:

EXPLAIN ('hash_detail', 'on') query...

...and have it tell me the # of buckets and batches for each hash join
and whatever other information might be useful in that context.  I
*don't* want to have the solution to that problem be: run
explain_xml() on your query, then install an XSLT parser, then use
this pregenerated XSLT stylesheet to extract your data from the XML
document you got back from step #1, then view the resulting HTML in
your web browser.  That may be a good solution for someone, but it
certainly doesn't sound convenient for anyone who works primarily at
the command line.

It's also worth noting that there are some kinds of statistics (for
example, I/O statistics) which can't be gathered without significantly
impacting the performance of the query.  So you can't just take the
approach of turning on every possible instrumentation when XML output
is requested.

I do agree with your point that even with an extensible options
syntax, we can't just add an option for any old  thing that someone
wants.  We should try to come up with a somewhat comprehensive list of
types of instrumentation that someone might want and then group them
into categories, with one toggle per category.

...Robert


Re: generic options for explain

From
Pavel Stehule
Date:
2009/5/24 Robert Haas <robertmhaas@gmail.com>:
> On Sun, May 24, 2009 at 8:44 AM, Dave Page <dpage@pgadmin.org> wrote:
>> On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> Hello
>>>
>>> why we develop a new syntax?
>>>
>>> we should have a secondary function explain_query(query_string,
>>> option) that returns setof some. Next function should be
>>> explain_query_xml. I thing so for typical use EXPLAIN statement is
>>> enough. And for machine procession some new function should be
>>> perfect.
>
> I don't understand - why do we want to switch from a command interface
> to a functional interface?  I was thinking:
>
> EXPLAIN (format 'xml') query...
> EXPLAIN (format 'json') query...

Personally, I thing, so this format is strange - and is some MySQLism.
There are statements like SHOW TABLES and similar. My proposal doesn't
change of current EXPLAIN. The sence is don't more complicate current
syntax and do things simply. Function has more advantage then one:

a) don't need changes in parser
b) simply add new variants
c) simply add new parameters

>
> What you're proposing will certainly be harder to code as well as more
> different from what we have now.  My approach has the advantage of
> being able to leverage the main parser to parse both the query and the
> options; with this approach, you'll need to pass the query text back
> to the main parser and then write separate code of some sort to parse
> the options.  Seems like extra work for no gain.
>
> The only possible advantage I can see of a functional interface is
> that the current implementation of EXPLAIN is treated as a utility
> command, so you can't do something like this:
>
> INSERT INTO foo (EXPLAIN SELECT 1);
>
> In my mind, however, fixing that would be preferable to (and probably
> easier than) changing the whole syntax.
>
>> I agree. We shouldn't be overloading EXPLAIN with a new option every
>> time someone thinks of something new they'd like to see. XML EXPLAIN
>> output would obviously be extensible and machine readable. We could
>> easily produce a library of XSLT stylesheets to format the output in
>> different ways without cluttering the server with extra code. The
>> benefits to applications that want to read the output are also pretty
>> obvious.
>
> Well, the major benefit of the existing EXPLAIN output is that you
> don't need an XSLT stylesheet to read it.  You can just be sitting
> there in psql and do an EXPLAIN, and look at the results, and solve
> your problem.  As we add options to EXPLAIN, I would like to see us
> support those in both the plaintext output and the XML output (and
> hopefully the JSON output) using the same syntax.
>
> If, for example, I need more information on what my hash joins are
> doing, I'd like to be able to do:
>
> EXPLAIN ('hash_detail', 'on') query...
>

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

Pavel

> ...and have it tell me the # of buckets and batches for each hash join
> and whatever other information might be useful in that context.  I
> *don't* want to have the solution to that problem be: run
> explain_xml() on your query, then install an XSLT parser, then use
> this pregenerated XSLT stylesheet to extract your data from the XML
> document you got back from step #1, then view the resulting HTML in
> your web browser.  That may be a good solution for someone, but it
> certainly doesn't sound convenient for anyone who works primarily at
> the command line.
>
> It's also worth noting that there are some kinds of statistics (for
> example, I/O statistics) which can't be gathered without significantly
> impacting the performance of the query.  So you can't just take the
> approach of turning on every possible instrumentation when XML output
> is requested.
>
> I do agree with your point that even with an extensible options
> syntax, we can't just add an option for any old  thing that someone
> wants.  We should try to come up with a somewhat comprehensive list of
> types of instrumentation that someone might want and then group them
> into categories, with one toggle per category.
>
> ...Robert
>


Re: generic options for explain

From
Robert Haas
Date:
>> EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

> I am sorry - this is really strange syntax . Who will use this syntax?
> For some parser is little bit better function call, than parametrized
> statement. Some dificulties with options should be fixed with named
> param (we are speaking about 8.5).
>
> select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

...Robert


Re: generic options for explain

From
Pavel Stehule
Date:
2009/5/24 Robert Haas <robertmhaas@gmail.com>:
>>> EXPLAIN ('hash_detail', 'on') query...
>
> Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
> follow my own syntax.
>
>> I am sorry - this is really strange syntax . Who will use this syntax?
>> For some parser is little bit better function call, than parametrized
>> statement. Some dificulties with options should be fixed with named
>> param (we are speaking about 8.5).
>>
>> select explain_xml("select ...", true as hash_detail, ...)
>
I prefere little bit different syntax for named params like param =
value, or param => value, but syntax with AS is one confirmed.


> See to me THAT is a really strange syntax, so I guess we need some more votes.

ok
>
> ...Robert
>


Re: generic options for explain

From
Andrew Dunstan
Date:

Robert Haas wrote:
>>> EXPLAIN ('hash_detail', 'on') query...
>>>       
>
> Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
> follow my own syntax.
>
>   
>> I am sorry - this is really strange syntax . Who will use this syntax?
>> For some parser is little bit better function call, than parametrized
>> statement. Some dificulties with options should be fixed with named
>> param (we are speaking about 8.5).
>>
>> select explain_xml("select ...", true as hash_detail, ...)
>>     
>
> See to me THAT is a really strange syntax, so I guess we need some more votes.
>
>
>   

Both of these seem both odd an unnecessary. Why not just have a setting 
called, say, explain_format which governs the output?
   set explain_format = 'xml, verbose';   explain select * from foo;

No new function or syntax would be required.

cheers

andrew


Re: generic options for explain

From
Robert Haas
Date:
On Sun, May 24, 2009 at 11:57 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
>> follow my own syntax.
>>> I am sorry - this is really strange syntax . Who will use this syntax?
>>> For some parser is little bit better function call, than parametrized
>>> statement. Some dificulties with options should be fixed with named
>>> param (we are speaking about 8.5).
>>> select explain_xml("select ...", true as hash_detail, ...)
>> See to me THAT is a really strange syntax, so I guess we need some more
>> votes.
> Both of these seem both odd an unnecessary. Why not just have a setting
> called, say, explain_format which governs the output?
>
>   set explain_format = 'xml, verbose';
>   explain select * from foo;
>
> No new function or syntax would be required.

Well, then you have to issue two commands to do one thing.  I don't
much like the practice of using GUCs to control behavior that you may
only want for the duration of one statement.

The fundamental problem here, at least as it seems to me, is that we
want more options for EXPLAIN, but the current syntax won't support
it, because it requires making everything a keyword (and fixing the
order).  So we can either add enough punctuation to de-confuse the
parser (which is what I tried to do) or we can switch to a totally
different method of controlling EXPLAIN behavior (as you and Pavel are
advocating).

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose.  And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it?  Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again.  Blech!

My initial thought was to just use a list of key words for the
parameters, like this:

EXPLAIN (XML, VERBOSE) query...

...but I decided that wasn't a great idea, because it means that every
parameter has to be a boolean, which is probably more limiting than we
want to be.  It also gets confusing because some parameters (like
"XML" and "JSON") are incompatible while others (like "HASH_DETAIL"
and "MERGE_DETAIL") can be used together, but there's not a lot of
clue in the syntax itself that this is the case.  On the other hand,
if you write:

EXPLAIN (FORMAT 'XML', FORMAT 'JSON') ...
EXPLAIN (HASH_DETAIL 'ON', MERGE_DETAIL 'ON') ...

...it's obvious that the first case is qualitatively different from
the second one.  I think this makes it easier to code the behavior in
a way that is consistent and correct, and also easier to document for
the benefit of our users.  Now, we can do this part of it even if we
ultimately end up with a different overall syntax, for example:

explain_query(...query..., 'xml' as format, true as hash_detail)

I do suspect this will end up being a more complicated patch, and as I
said before, I don't really see the point.

As a point of historical interest, way back when (early 1997), we used
to support this syntax:

EXPLAIN WITH [COST|PLAN|FULL] query...

...but of course that's now impossible because WITH can be the first
word of a select query.

...Robert


Re: generic options for explain

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I wouldn't mind having a GUC to set the *default* explain behavior -
> but I'd still like to be able to override it for a particular command
> if I so choose.  And that's not going to be possible with your syntax:
> if explain_format is set to 'xml, verbose' and I want plain text
> output for one command, how do I get it?  Presumably I have to change
> explain_format, run my EXPLAIN, and then change it back again.  Blech!

You know about SET LOCAL, no?  I don't think this argument is very
convincing.

On the other side of the coin, I'm strongly against inventing more than
one new output format for EXPLAIN, and so any argument that depends on
examples such as "xml vs json" is falling on deaf ears here.  I think
that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
seem untenable.  What other options than those do you really need?
Not ones to add or remove output fields; we'd expect the client to
ignore fields it doesn't care about.
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I wouldn't mind having a GUC to set the *default* explain behavior -
>> but I'd still like to be able to override it for a particular command
>> if I so choose.  And that's not going to be possible with your syntax:
>> if explain_format is set to 'xml, verbose' and I want plain text
>> output for one command, how do I get it?  Presumably I have to change
>> explain_format, run my EXPLAIN, and then change it back again.  Blech!
>
> You know about SET LOCAL, no?  I don't think this argument is very
> convincing.

I completely fail to see how that helps me.  It's not faster or fewer
commands to type:

begin transaction; set local whatever; explain blah; commit transaction;

than it is to type:

set whatever; explain blah; set oldwhatever;

> On the other side of the coin, I'm strongly against inventing more than
> one new output format for EXPLAIN, and so any argument that depends on
> examples such as "xml vs json" is falling on deaf ears here.  I think
> that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
> seem untenable.  What other options than those do you really need?
> Not ones to add or remove output fields; we'd expect the client to
> ignore fields it doesn't care about.

It's not just about me; we've had many requests for new EXPLAIN features.

Personally, I'd like to see the number of buckets and batches that a
hash join uses (predicted and actual), and maybe (with ANALYZE) the
number of hash collisions.  I'd like to see memory utilization
statistics (peak memory utilization of hash join, for example).  Other
people have requested I/O statistics (which you objected to on the
grounds that it would be too much overhead, so clearly if we're ever
going to do it it will have to be optional), and most recently number
of tuples discarded by the filter condition.  We've also had requests
to suppress some information (like costs) for planner regression
testing.

I really don't see the point in restricting the syntax of EXPLAIN in
this way.  I don't know exactly what sorts of useful options people
will come up with, but I do know that as long as we have an extremely
limiting options syntax, they can all be shot down on the grounds that
including them in the default output is too cumbersome for regular use
(which is absolutely true).  On the other hand, I think it's
incredibly naive to suppose that EXPLAIN XML is going to make anyone
very happy.  There are only two ways this can work out:

1. We'll make EXPLAIN XML output everything and the kitchen sink.  In
this case, we'll soon get complaints along the lines of: "I can't use
regular EXPLAIN because it doesn't include the field that I need, but
the output of EXPLAIN XML is so voluminous that I can't read through
it by hand."
-or-
2. We'll be very restrictive about adding fields to EXPLAIN XML just
as we are now for plain EXPLAIN, in which case we haven't solved
anything.

I think XML output format is a complete distraction from the real
issue here, which is that there are certain pieces of information that
are sometimes useful but are not useful enough to justify including
them in the EXPLAIN output 100% of the time.  By just punting all that
stuff to EXPLAIN XML, we're just saying that we're not interested in
creating a workable set of options to allow users to pick and choose
the information they care about - so instead we're going to dump a
huge chunk of unreadable XML and then make it the user's problem to
find a tool that will extract the details that they care about.  Boo,
hiss.

Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
I would like to make the EXPLAIN syntax more powerful for command-line
use, and I'd implement XML format and JSON along the way just for
completeness.  But I don't have much interest in creating an XML
output format that is the ONLY way of getting more information,
because I'm a command-line user and it does me no good at all.  :-(

...Robert


Re: generic options for explain

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You know about SET LOCAL, no?  I don't think this argument is very
>> convincing.

> I completely fail to see how that helps me.

Mainly, what it does is avoid having to know exactly what the old
setting was.
        regards, tom lane


Re: generic options for explain

From
Joshua Tolley
Date:
On Sun, May 24, 2009 at 11:57:13AM -0400, Andrew Dunstan wrote:
>
>
> Robert Haas wrote:
>>>> EXPLAIN ('hash_detail', 'on') query...
>>>>
>>
>> Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
>> follow my own syntax.
>>
>>
>>> I am sorry - this is really strange syntax . Who will use this syntax?
>>> For some parser is little bit better function call, than parametrized
>>> statement. Some dificulties with options should be fixed with named
>>> param (we are speaking about 8.5).
>>>
>>> select explain_xml("select ...", true as hash_detail, ...)
>>>
>>
>> See to me THAT is a really strange syntax, so I guess we need some more votes.
>>
>>
>>
>
> Both of these seem both odd an unnecessary. Why not just have a setting
> called, say, explain_format which governs the output?
>
>    set explain_format = 'xml, verbose';
>    explain select * from foo;
>
> No new function or syntax would be required.

A further possibility: Oracle's equivalent of EXPLAIN doesn't actually output
anything to the screen, but rather fills in a (temporary?) table somewhere with
details of the query plan. I mostly found this irritating when working with
Oracle, because each time I used it I had to look up an example query to
generate output like PostgreSQL's EXPLAIN, which is generally what I really
wanted. But since we'd still have the old EXPLAIN behavior available, perhaps
something such as an Oracle-like table filler would be useful.

Such a proposal doesn't answer the need to allow users to specify, for
performance and other reasons, the precise subset of statistics they're
interested in; for whatever it's worth, my current favorite contender in that
field is EXPLAIN (a, b, c) <query>.

- Josh / eggyknap

Re: generic options for explain

From
Greg Smith
Date:
On Sun, 24 May 2009, Pavel Stehule wrote:

> we should have a secondary function explain_query(query_string,
> option) that returns setof some.

+1.  The incremental approach here should first be adding functions that 
actually do the work required.  Then, if there's a set of those that look 
to be extremely useful, maybe at that point it's worth talking about how 
to integrate them into the parser.  Starting with the parser changes 
rather than the parts that actually do the work is backwards.  If you do 
it the other way around, at all times you have a patch that actually 
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the 
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another 
common request in this area).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: generic options for explain

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> On Sun, 24 May 2009, Pavel Stehule wrote:
>> we should have a secondary function explain_query(query_string,
>> option) that returns setof some.

> +1.  The incremental approach here should first be adding functions that 
> actually do the work required.  Then, if there's a set of those that look 
> to be extremely useful, maybe at that point it's worth talking about how 
> to integrate them into the parser.  Starting with the parser changes 
> rather than the parts that actually do the work is backwards.  If you do 
> it the other way around, at all times you have a patch that actually 
> provides immediate useful value were it to be committed.

> Something that returns a setof can also be easily used to implement the 
> "dump EXPLAIN to a table" feature Josh Tolley brought up (which is another 
> common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are.  So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information.  We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Sun, May 24, 2009 at 4:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You know about SET LOCAL, no?  I don't think this argument is very
>>> convincing.
>
>> I completely fail to see how that helps me.
>
> Mainly, what it does is avoid having to know exactly what the old
> setting was.

Ah, OK, I see what you're going for.   That's not really what I'm
complaining about with that syntax, though....

...Robert


Re: generic options for explain

From
Robert Haas
Date:
On Sun, May 24, 2009 at 6:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> On Sun, 24 May 2009, Pavel Stehule wrote:
>>> we should have a secondary function explain_query(query_string,
>>> option) that returns setof some.
>
>> +1.  The incremental approach here should first be adding functions that
>> actually do the work required.  Then, if there's a set of those that look
>> to be extremely useful, maybe at that point it's worth talking about how
>> to integrate them into the parser.  Starting with the parser changes
>> rather than the parts that actually do the work is backwards.  If you do
>> it the other way around, at all times you have a patch that actually
>> provides immediate useful value were it to be committed.
>
>> Something that returns a setof can also be easily used to implement the
>> "dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
>> common request in this area).
>
> A serious problem with EXPLAIN via a function returning set, or with
> putting the result into a table, is that set results are logically
> unordered, just as table contents are.  So from a strict point of view
> this only makes sense when the output format is designed to not depend
> on row ordering to convey information.  We could certainly invent such
> a format, but I think it's a mistake to go in this direction for
> EXPLAIN output that is similar to the current output.

The current output format basically prints out the node type and then
a collection of properties that are associated with that node, where
applicable: join type, scan direction, strategy, relation, alias,
startup cost, total cost, rows, width, loops, filter, join filter,
various types of condition (index/recheck/tid/merge/hash), sort key.
However, we tend to omit certain fields (particularly scan direction
and alias) when they contain information that isn't sufficiently
interesting.  That's probably not a good idea for anything that's
intended for machine-parsing; I think for XML or JSON or
output-to-a-table we'd want to include those fields whenever possible.

With that minor complaint (and the difficulty of figuring out how to
avoid duplicating large portions of explain.c), XML or JSON output
doesn't seem that difficult.  In JSON there aren't too many ways of
doing this that make sense.  I would guess we'd want something like
this:

{ "node" : "Hash Join", "jointype" : "Inner", "startup_cost" :
"11.49", "total_cost" : "92.59", "rows" : "1877", "width" : "325",  'outer' : { "node" : "Seq Scan", <more stuff> }
'inner': { "node" : "Hash", <more stuff> } 
}

XML, being XML, has 10 ways of doing something when 1 is more than
sufficient, so there are a couple of ways we could go.  I think this
is my favorite.

<HashJoin jointype="inner" startup_cost="11.49" total_cost="92.59"
rows="1877" width="325">  <SeqScan ... />  <Hash ...>     <SeqScan ... />  </Hash>
</HashJoin>

or you could just use one node type:

<node type="HashJoin" jointype="inner" startup_cost="11.49"
total_cost="92.59" rows="1877" width="325">  <node type="SeqScan" ... />  <node type="Hash" ...>     <node
type="SeqScan"... />  </node> 
</node>

...and then there's this style:

<HashJoin> <jointype>inner</jointype> <startup_cost>11.49</startup_cost> ... <outer/>    ... </outer> <inner> ...
</inner>
</HashJoin>

...which is incredibly long and redundant, but some people who use XML
like such things.  I'm sure there are other possibilities as well.

With respect to table output things are a little bit less
straightforward.  There are two issues.  With XML or JSON, any
properties that aren't relevant to a particular node can be omitted
altogether, whereas for a table the column list has to be consistent
throughout.  We can just leave the unused columns as NULL, but it will
be a fairly wide table.  Also, with apologies to Josh Berkus, we'll
need to introduce some sort of non-natural primary key to allow
children to be related to parents, because unlike XML and JSON,
there's no built-in way to make one object the child of another.

All that having been said, making EXPLAIN into a query (so that you
can do INSERT INTO foo (EXPLAIN SELECT query...) might be useful to
some people even without changing the output format at all.  I think
you could throw a windowing function in there to at least tag each
line with its original position in the output, and some people might
want just that.

...Robert


Re: generic options for explain

From
Robert Haas
Date:
On Sun, May 24, 2009 at 6:05 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Sun, 24 May 2009, Pavel Stehule wrote:
>
>> we should have a secondary function explain_query(query_string,
>> option) that returns setof some.
>
> +1.  The incremental approach here should first be adding functions that
> actually do the work required.  Then, if there's a set of those that look to
> be extremely useful, maybe at that point it's worth talking about how to
> integrate them into the parser.  Starting with the parser changes rather
> than the parts that actually do the work is backwards.  If you do it the
> other way around, at all times you have a patch that actually provides
> immediate useful value were it to be committed.

Well, perhaps I ought to be asking what sort of features people would
like to see, other than variant output formats?  Maybe if we can
develop some kind of wish list for EXPLAIN, it will become more
obvious what the best option syntax is.

...Robert


Re: generic options for explain

From
Greg Smith
Date:
On Sun, 24 May 2009, Tom Lane wrote:

> A serious problem with EXPLAIN via a function returning set, or with
> putting the result into a table, is that set results are logically
> unordered, just as table contents are.

Fair enough; I think Pavel and myself were presuming an implied "line 
number" in the output there that, as you point out, doesn't actually 
exist.  The actual implementation detail there doesn't really change the 
spirit of what I was trying to suggest though:  that this feature should 
get designed starting with the expected output and how to generate it, 
then work backwards towards how you pass it parameters.

On that topic, I though Tom Raney was the latest to update code for the 
XML output at least.  Code and a presentation going over everything is at 
http://web.cecs.pdx.edu/~raneyt/gsoc/ and there's a video at 
http://www.postgresqlconference.org/2008/west/talks/

A note about that got sent to this list at one point but I don't see any 
follow-up: 
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00802.php

Anybody have a better idea of what happened with that project than me?

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: generic options for explain

From
Joshua Tolley
Date:
On Sun, May 24, 2009 at 06:53:29PM -0400, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
> > On Sun, 24 May 2009, Pavel Stehule wrote:
> >> we should have a secondary function explain_query(query_string,
> >> option) that returns setof some.
>
> > +1.  The incremental approach here should first be adding functions that
> > actually do the work required.  Then, if there's a set of those that look
> > to be extremely useful, maybe at that point it's worth talking about how
> > to integrate them into the parser.  Starting with the parser changes
> > rather than the parts that actually do the work is backwards.  If you do
> > it the other way around, at all times you have a patch that actually
> > provides immediate useful value were it to be committed.
>
> > Something that returns a setof can also be easily used to implement the
> > "dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
> > common request in this area).
>
> A serious problem with EXPLAIN via a function returning set, or with
> putting the result into a table, is that set results are logically
> unordered, just as table contents are.  So from a strict point of view
> this only makes sense when the output format is designed to not depend
> on row ordering to convey information.  We could certainly invent such
> a format, but I think it's a mistake to go in this direction for
> EXPLAIN output that is similar to the current output.

The Oracle version, as it fills the table of explain results, gives each number
an id and the id of its parent row, which behavior we could presumably copy.
I'm definitely keen to keep a human-readable EXPLAIN such as we have now, to
augment the table-based proposal, but a table would provide the more flexible
output we'd need for more detailed reporting, a simple interface for
applications to consume the EXPLAIN data without human intervention, and a
convenient platform from whence the data can be transformed to XML, JSON, etc.
for those that are so inclined.

- Josh / eggyknap

Re: generic options for explain

From
Pavel Stehule
Date:
2009/5/25 Tom Lane <tgl@sss.pgh.pa.us>:
> Greg Smith <gsmith@gregsmith.com> writes:
>> On Sun, 24 May 2009, Pavel Stehule wrote:
>>> we should have a secondary function explain_query(query_string,
>>> option) that returns setof some.
>
>> +1.  The incremental approach here should first be adding functions that
>> actually do the work required.  Then, if there's a set of those that look
>> to be extremely useful, maybe at that point it's worth talking about how
>> to integrate them into the parser.  Starting with the parser changes
>> rather than the parts that actually do the work is backwards.  If you do
>> it the other way around, at all times you have a patch that actually
>> provides immediate useful value were it to be committed.
>
>> Something that returns a setof can also be easily used to implement the
>> "dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
>> common request in this area).
>
> A serious problem with EXPLAIN via a function returning set, or with
> putting the result into a table, is that set results are logically
> unordered, just as table contents are.  So from a strict point of view
> this only makes sense when the output format is designed to not depend
> on row ordering to convey information.  We could certainly invent such
> a format, but I think it's a mistake to go in this direction for
> EXPLAIN output that is similar to the current output.

I don't expect so functional EXPLAIN will be used by users directly.
It' data source for some "GUI". And currently with CTE, there are not
problem transform query to similar output like current EXPLAIN. I am
able to understand some new parameters for explain statement (when
result will be directly read by user), but some output options (or
formating options) I would to see in some other functions.

regards
Pavel Stehule

>
>                        regards, tom lane
>


Re: generic options for explain

From
Dimitri Fontaine
Date:
Hi,

After having read all the followups I already received, I prefer to
answer to this particular message.

Robert Haas <robertmhaas@gmail.com> writes:

> On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On the other side of the coin, I'm strongly against inventing more than
>> one new output format for EXPLAIN, and so any argument that depends on
>> examples such as "xml vs json" is falling on deaf ears here.  I think
>> that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
>> seem untenable.  What other options than those do you really need?
>> Not ones to add or remove output fields; we'd expect the client to
>> ignore fields it doesn't care about.
>
> It's not just about me; we've had many requests for new EXPLAIN
> features.
[...]
> I think XML output format is a complete distraction from the real
> issue here, which is that there are certain pieces of information that
> are sometimes useful but are not useful enough to justify including
> them in the EXPLAIN output 100% of the time.

I think the summary here is to say that we want two modes of operations:- the current one, which continues to get
refinements
- a new one conveying all possible information in machine readable  formats, possibly with some tools to handle it
easily:XML output and  maybe XSLT stylesheets 

> Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
> I would like to make the EXPLAIN syntax more powerful for command-line
> use, and I'd implement XML format and JSON along the way just for
> completeness.  But I don't have much interest in creating an XML
> output format that is the ONLY way of getting more information,
> because I'm a command-line user and it does me no good at all.  :-(

That's only because you seem to be thinking that having core PostgreSQL
do the first half of the work means you as a user will have to do the
second part. I assume pgadmin and phppgadmin developers will offer their
users some graphical approach to the output reading, with dynamic
filtering, eg.

I don't see anything stopping you to provide a simple way to have the
same facility into psql. You can already have the query output filtered
by any script you want this way:=# \o |my_presentation_script <style name> | <stylesheet full path>=# explain XML ...=#
\o

Now, we talked about a better filtering integration into psql more than
once in the past, so there's a path for you to have both complete
EXPLAIN and command line tool suite integration, I'd say.


Oh and by the way, even if I don't like XML that much, this battled is
lost already. The way I understand it, -core will accept only two
EXPLAIN outputs: fit on screen and complete. The second one will have to
be easy for machine consumption, and so use a format anyone can work
with. The choice has been made, it's XML.

If you want to develop analysis tools which will parse JSON rather than
XML, nothing stops you writing a stylesheet to provide JSON output from
XML. Of course then you want this to happen server-side, so you want the
EXPLAIN output to be processed from within the database, all the more if
it's XML and we have the ability to apply XSLT to XML content already in
the server.

But being able to apply SQL level function calls to EXPLAIN output seems
a different subject alltogether...

Regards,
--
dim


Re: generic options for explain

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Sent by mistake to Robert Haas only at first try. No cure for fat
fingers, I guess]

On Sun, May 24, 2009 at 04:05:18PM -0400, Robert Haas wrote:

[...]

> I think XML output format is a complete distraction from the real
> issue here, which is that there are certain pieces of information that
> are sometimes useful but are not useful enough to justify including
> them in the EXPLAIN output 100% of the time.  By just punting all that
> stuff to EXPLAIN XML, we're just saying that we're not interested in
> creating a workable set of options to allow users to pick and choose
> the information they care about - so instead we're going to dump a
> huge chunk of unreadable XML and then make it the user's problem to
> find a tool that will extract the details that they care about.  Boo,
> hiss.

+1

In my experience, this happens with "XML the data description language"
many times, but I haven' seen the problem as well-stated as in your
mail.

Hard for humans to read, hard for machines to read, and often trying to
solve a problem it can't (in this case, selecting the needed information
_beforehand_).

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKGntjBcgs9XrR2kYRAk1/AJ4rnZFnU4PFM8AJkaYYYLRInYHJDQCbBbt2
lTwxydBBnXP1MgDxz+vcpM8=
=o2qW
-----END PGP SIGNATURE-----


Re: generic options for explain

From
Robert Haas
Date:
On Mon, May 25, 2009 at 6:24 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:
> I think the summary here is to say that we want two modes of operations:
>  - the current one, which continues to get refinements
>
>  - a new one conveying all possible information in machine readable
>   formats, possibly with some tools to handle it easily: XML output and
>   maybe XSLT stylesheets

I don't agree with that summary.  Many people who responded to this
thread were fine with the idea of some sort of options syntax, but we
had at least four different proposals for how to implement it:

Robert Haas: EXPLAIN (foo 'bar', baz 'bletch', ...) query
Pavel Stehule: explain_query(query, options...) [exact format of
options not specified]
Andrew Dunstan:  SET explain_format = 'foo, baz'; EXPLAIN query
Josh Tolley: EXPLAIN (foo, baz, ...) query [also suggested by me as an
idea I rejected]

Tom Lane was the only person to suggest that we only ever need one
more option to EXPLAIN and that it should be called XML.  Even though
I prefer my format to the other options suggested (which I would
probably rank in order of descending preference Josh-Andrew-Pavel), I
am actually someone encouraged that we might have some kind of fragile
consensus that an extensible options syntax is useful (a point that
Andres Freund and Greg Smith also seemed to agree with).

>> Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
>> I would like to make the EXPLAIN syntax more powerful for command-line
>> use, and I'd implement XML format and JSON along the way just for
>> completeness.  But I don't have much interest in creating an XML
>> output format that is the ONLY way of getting more information,
>> because I'm a command-line user and it does me no good at all.  :-(
>
> That's only because you seem to be thinking that having core PostgreSQL
> do the first half of the work means you as a user will have to do the
> second part. I assume pgadmin and phppgadmin developers will offer their
> users some graphical approach to the output reading, with dynamic
> filtering, eg.
>
> I don't see anything stopping you to provide a simple way to have the
> same facility into psql. You can already have the query output filtered
> by any script you want this way:
>  =# \o |my_presentation_script <style name> | <stylesheet full path>
>  =# explain XML ...
>  =# \o

This is all much more complicated than what I proposed, and I fail to
see what it buys us.  I'd say that you're just reinforcing the point I
made upthread, which is that insisting that XML is the only way to get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

...Robert


Re: generic options for explain

From
Joshua Tolley
Date:
On Mon, May 25, 2009 at 07:14:56AM -0400, Robert Haas wrote:
> Many people who responded to this
> thread were fine with the idea of some sort of options syntax, but we
> had at least four different proposals for how to implement it:
>
> Robert Haas: EXPLAIN (foo 'bar', baz 'bletch', ...) query
> Pavel Stehule: explain_query(query, options...) [exact format of
> options not specified]
> Andrew Dunstan:  SET explain_format = 'foo, baz'; EXPLAIN query
> Josh Tolley: EXPLAIN (foo, baz, ...) query [also suggested by me as an
> idea I rejected]

I hadn't actually caught that there were two ideas on the table with syntax
similar to "EXPLAIN (...) <query>", and don't mean to champion either of the
two specifically (at least until I've read closely enough to note the
difference). I just kinda liked the "EXPLAIN (some options of some sort)
<query>" syntax better than other proposals.

That said, I think I'm changing my vote in favor of Pavel. It's my guess that
some variant of his version would be the easiest to make compliant with the
bit I'm most interested in, which is not being limited to, say,
XML/JSON/YAML/etc. in the output. Applications that use PostgreSQL will of
necessity need to know how to handle data presented in tables, so let's
present our explain results as a table. As has been said in other branches of
this thread, that way we don't force applications also to support
XML/JSON/YAML/etc. We might consider providing functions to convert the
tabular result to one or more of those formats, but at its inception, the data
should live as tuples in a relation.

In other messages, I've advocated actually inserting the data into a table. I
think that was a mistake. Who makes the table? What's it called? What schema
is it in? Who cleans it up when we're done with it? ...etc. I'd much rather
see a bunch of rows returned as a set, which I can then insert into a table,
pass into a function for reformatting, or just consume in an application.

All of which leads me to this variant of the functional approach as my answer:

SELECT * FROM pg_explain_query("<query>", <options in a still-unspecified
format>);

I could then do things like this:

CREATE TABLE explain_results AS SELECT * FROM pg_explain_query(...);

and this:

SELECT xmlify_a_record(pg_explain_query(...));

- Josh / eggyknap

Re: generic options for explain

From
Tom Lane
Date:
Joshua Tolley <eggyknap@gmail.com> writes:
> The Oracle version, as it fills the table of explain results, gives
> each number an id and the id of its parent row, which behavior we
> could presumably copy.  I'm definitely keen to keep a human-readable
> EXPLAIN such as we have now, to augment the table-based proposal, but
> a table would provide the more flexible output we'd need for more
> detailed reporting, a simple interface for applications to consume the
> EXPLAIN data without human intervention, and a convenient platform
> from whence the data can be transformed to XML, JSON, etc.  for those
> that are so inclined.

I would think a table would be considerably *less* flexible --- you
could not easily change the output column set.  Unless you're imagining
just dumping something equivalent to the current output into a text
column.  Which would be flexible, but it'd hardly have any of the
other desirable properties you list.
        regards, tom lane


Re: generic options for explain

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> This is all much more complicated than what I proposed, and I fail to
> see what it buys us.  I'd say that you're just reinforcing the point I
> made upthread, which is that insisting that XML is the only way to get
> more detailed information will just create a cottage industry of
> beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the worst
option available, except for all the others.  We need something that can
represent a fairly complex data structure, easily supports addition or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have to
write your own parser, in most languages".  How many realistic
alternatives are there?
        regards, tom lane


Re: generic options for explain

From
Joshua Tolley
Date:
On Mon, May 25, 2009 at 10:55:48AM -0400, Tom Lane wrote:
> Joshua Tolley <eggyknap@gmail.com> writes:
> > The Oracle version, as it fills the table of explain results, gives
> > each number an id and the id of its parent row, which behavior we
> > could presumably copy.  I'm definitely keen to keep a human-readable
> > EXPLAIN such as we have now, to augment the table-based proposal, but
> > a table would provide the more flexible output we'd need for more
> > detailed reporting, a simple interface for applications to consume the
> > EXPLAIN data without human intervention, and a convenient platform
> > from whence the data can be transformed to XML, JSON, etc.  for those
> > that are so inclined.
>
> I would think a table would be considerably *less* flexible --- you
> could not easily change the output column set.  Unless you're imagining
> just dumping something equivalent to the current output into a text
> column.  Which would be flexible, but it'd hardly have any of the
> other desirable properties you list.

I'm not sure I see why it would be less flexible. I'm imagining we define some
record type, and a function that returns a set of those records. The fields in
the record would include data element this version of explain could possibly
return. Then you call a function to explain a query, passing it some options
to select which of those data elements you're interested in. The function
returns the same data type at each call, filling with NULLs the fields you've
told it you're uninterested in. Changes between versions would modify this
data type, but provided consumer applications have specified the columns
they're interested in (rather than, say, SELECT *) that shouldn't bother
anyone.

The functions to change this into some other format would probably need to be
more intelligent than just that. That seems a fair price to pay.

- Josh / eggyknap

Re: generic options for explain

From
"Massa, Harald Armin"
Date:
<br /><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204);
margin:0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> The impression I have is that (to misquote Churchill) XML is the
worst<br/> option available, except for all the others.  We need something that can<br /> represent a fairly complex
datastructure, easily supports addition or<br /> removal of particular fields in the structure (including fields not<br
/>foreseen in the original design), is not hard for programs to parse,<br /> and is widely supported --- ie, "not hard"
includes"you don't have to<br /> write your own parser, in most languages".  How many realistic<br /> alternatives are
there?<br/><br /></blockquote></div>One realistic alternative may be JSON: parsers for nearly all languages are freely
available;everything web-affected speeks it perfectly; it's easier on the eye then XML, less bytes per information,
additionand removal of fields as well as complex structures are possible.<br /><br />Harald<br /><br /><br /><br
clear="all"/><br />-- <br />GHUM Harald Massa<br />persuadere et programmare<br />Harald Armin Massa<br />Spielberger
Straße49<br />70435 Stuttgart<br />0173/9409607<br />no fx, no carrier pigeon <br />-<br />LASIK good, steroids bad?<br
/>

Re: generic options for explain

From
Tom Lane
Date:
Joshua Tolley <eggyknap@gmail.com> writes:
> I'm not sure I see why it would be less flexible. I'm imagining we define some
> record type, and a function that returns a set of those records.

I'm unimpressed by the various proposals to change EXPLAIN into a
function.  Quoting the command-to-explain is going to be a pain in the
neck.  And can you really imagine using it manually, especially if it
returns so many fields that you *have to* write out the list of fields
you actually want, else the result is unreadable?  It's going to be just
as much of something you can only use through a helper application as
the XML way would be.
        regards, tom lane


Re: generic options for explain

From
David Fetter
Date:
On Mon, May 25, 2009 at 11:02:53AM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > This is all much more complicated than what I proposed, and I fail
> > to see what it buys us.  I'd say that you're just reinforcing the
> > point I made upthread, which is that insisting that XML is the
> > only way to get more detailed information will just create a
> > cottage industry of beating that XML output format into
> > submission.
> 
> The impression I have is that (to misquote Churchill) XML is the
> worst option available, except for all the others.  We need
> something that can represent a fairly complex data structure, easily
> supports addition or removal of particular fields in the structure
> (including fields not foreseen in the original design), is not hard
> for programs to parse, and is widely supported --- ie, "not hard"
> includes "you don't have to write your own parser, in most
> languages".  How many realistic alternatives are there?

JSON for one, and it's *much* lighter in just about every way.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: generic options for explain

From
Tom Raney
Date:
David Fetter wrote:
> On Mon, May 25, 2009 at 11:02:53AM -0400, Tom Lane wrote:
>   
>> Robert Haas <robertmhaas@gmail.com> writes:
>>     
>>> This is all much more complicated than what I proposed, and I fail
>>> to see what it buys us.  I'd say that you're just reinforcing the
>>> point I made upthread, which is that insisting that XML is the
>>> only way to get more detailed information will just create a
>>> cottage industry of beating that XML output format into
>>> submission.
>>>       
>> The impression I have is that (to misquote Churchill) XML is the
>> worst option available, except for all the others.  We need
>> something that can represent a fairly complex data structure, easily
>> supports addition or removal of particular fields in the structure
>> (including fields not foreseen in the original design), is not hard
>> for programs to parse, and is widely supported --- ie, "not hard"
>> includes "you don't have to write your own parser, in most
>> languages".  How many realistic alternatives are there?
>>     
>
> JSON for one, and it's *much* lighter in just about every way.
>
> Cheers,
> David.
>   

For what it's worth, if this revised output form is destined for 
consumption by a machine, it really doesn't matter what protocol is used 
and how 'readable' it is by humans, as long as the protocol can express 
all the present and anticipated variations of the data without breaking 
parsers along the way.

While building the Visual Planner tool, I selected XML output for no 
other reason than it was easy to parse on the receiving end and was 
hierarchical, making it perfect for representing a plan tree - or 
thousands.  I'm sure other alternatives would have been fine as well.  
But, once that decision was made, I never had any reason again to look 
at the XML stream. 

If we're worried about the excess 'weight' of XML, I found this to be a 
non-issue in practice.  The output generated by the 
Visual-Planner-Enabled Postgres server contains MUCH more information 
that one would typically see with standard EXPLAIN.  The tool returns 
not only the most-optimal plan, but all discarded plans as well.  A four 
way join results in output of 24k lines of XML.  While it parses nearly 
instantly, the biggest delay is in the network.  And, even this is minimal.

So, why not put ALL interesting data in the EXPLAIN XML feed?  I'm not 
suggesting for this discussion that we include discarded plans, but that 
we include every piece of data that may be of interest to folks building 
connecting tools.  The parsers can pick and choose what they use easily 
and, because the feed isn't positional, won't break when addition data 
is added.  A GUC parameter could govern the data included in this 
variant of EXPLAIN, but even that seems unnecessary.  This approach will 
allow the standard EXPLAIN to evolve in whatever way pleases the humans 
without interfering with the machines.

Regards,

Tom Raney






Re: generic options for explain

From
Michael Glaesemann
Date:
On May 25, 2009, at 0:47 , Joshua Tolley wrote:

> On Sun, May 24, 2009 at 06:53:29PM -0400, Tom Lane wrote:
>> Greg Smith <gsmith@gregsmith.com> writes:
>>> On Sun, 24 May 2009, Pavel Stehule wrote:
>>>> we should have a secondary function explain_query(query_string,
>>>> option) that returns setof some.
>>
>>> +1.  The incremental approach here should first be adding  
>>> functions that
>>> actually do the work required.  Then, if there's a set of those  
>>> that look
>>> to be extremely useful, maybe at that point it's worth talking  
>>> about how
>>> to integrate them into the parser.  Starting with the parser changes
>>> rather than the parts that actually do the work is backwards.  If  
>>> you do
>>> it the other way around, at all times you have a patch that actually
>>> provides immediate useful value were it to be committed.
>>
>>> Something that returns a setof can also be easily used to  
>>> implement the
>>> "dump EXPLAIN to a table" feature Josh Tolley brought up (which is  
>>> another
>>> common request in this area).
>>
>> A serious problem with EXPLAIN via a function returning set, or with
>> putting the result into a table, is that set results are logically
>> unordered, just as table contents are.  So from a strict point of  
>> view
>> this only makes sense when the output format is designed to not  
>> depend
>> on row ordering to convey information.  We could certainly invent  
>> such
>> a format, but I think it's a mistake to go in this direction for
>> EXPLAIN output that is similar to the current output.
>
> The Oracle version, as it fills the table of explain results, gives  
> each number
> an id and the id of its parent row, which behavior we could  
> presumably copy.

Or some other schema that allows us to preserve the tree.

Michael Glaesemann
grzm seespotcode net





Re: generic options for explain

From
Joshua Tolley
Date:
On Mon, May 25, 2009 at 11:22:24AM -0400, Tom Lane wrote:
> Joshua Tolley <eggyknap@gmail.com> writes:
> > I'm not sure I see why it would be less flexible. I'm imagining we define some
> > record type, and a function that returns a set of those records.
>
> I'm unimpressed by the various proposals to change EXPLAIN into a
> function.  Quoting the command-to-explain is going to be a pain in the
> neck.

Yeah, that's been bugging me, despite my recent support of that plan.

> And can you really imagine using it manually, especially if it
> returns so many fields that you *have to* write out the list of fields
> you actually want, else the result is unreadable?  It's going to be just
> as much of something you can only use through a helper application as
> the XML way would be.

Good point. The reason, as I remember it, that we wanted to be able to specify
what fields are returned was so that fields that are expensive to calculate
are calculated only when the user wants them. If that's the only
consideration, perhaps we should have a standard version and a "FULL" version,
e.g.

EXPLAIN [ANALYZE] [FULL] <query>

...where FULL would indicate the user wanted the all available statistics, not
just the cheap ones. Somewhere in there we'd also need an indicator to say we
wanted an output format other than the usual text version (such as the "WITH
XML" clause I think someone suggested); I maintain it's all just a table of
data, and should be represented the same way we represent any other table of
data.

- Josh / eggyknap

Re: generic options for explain

From
Andres Freund
Date:
Hi Tom,

On 05/25/2009 08:04 PM, Tom Raney wrote:
> So, why not put ALL interesting data in the EXPLAIN XML feed? I'm not
> suggesting for this discussion that we include discarded plans, but that
> we include every piece of data that may be of interest to folks building
> connecting tools. The parsers can pick and choose what they use easily
> and, because the feed isn't positional, won't break when addition data
> is added. A GUC parameter could govern the data included in this variant
> of EXPLAIN, but even that seems unnecessary. This approach will allow
> the standard EXPLAIN to evolve in whatever way pleases the humans
> without interfering with the machines.
Well, there is the problem Robert Haas described - some stats may be too 
expensive to gather (like the io-stats) for regular use, but still be 
quite usefull.

Andres


Re: generic options for explain

From
Robert Haas
Date:
On Mon, May 25, 2009 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joshua Tolley <eggyknap@gmail.com> writes:
>> I'm not sure I see why it would be less flexible. I'm imagining we define some
>> record type, and a function that returns a set of those records.
>
> I'm unimpressed by the various proposals to change EXPLAIN into a
> function.  Quoting the command-to-explain is going to be a pain in the
> neck.

I agree with this, but there is a lot of sentiment (which I share)
that it should be possible to capture EXPLAIN output using subselect
or CTAS syntax, regardless of exactly what that output ends up being.
That seems to require that EXPLAIN be a fully-reserved keyword, so I
wonder what we think about that.

(The way I tested this quickly is to make '(' ExplainStmt ')' a third
production for select_with_parens.  I'm not 100% sure that's the right
place for it, but a couple of other reasonable-looking places produced
non-obvious parsing conflicts.)

> And can you really imagine using it manually, especially if it
> returns so many fields that you *have to* write out the list of fields
> you actually want, else the result is unreadable?  It's going to be just
> as much of something you can only use through a helper application as
> the XML way would be.

Nothing could possibly be as bad as XML.  I'm with Josh: if we produce
table-formatted output, someone can always turn it into XML or JSON or
whatever they want.  The reverse figures to be a whole lot more
difficult.

...Robert


Re: generic options for explain

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I agree with this, but there is a lot of sentiment (which I share)
> that it should be possible to capture EXPLAIN output using subselect
> or CTAS syntax, regardless of exactly what that output ends up being.

Well, it should be possible to capture the output, but let's not
prejudge the syntax.

> That seems to require that EXPLAIN be a fully-reserved keyword, so I
> wonder what we think about that.

Nonstarter, especially when it's so obviously possible to do it without
that.  The threshold for reserving words that aren't reserved by SQL
spec has to be really high, because you will break innocent applications
that way.

Before anyone gets overly excited about having special syntax for this,
I should point out that you can do it today, for instance like so:

regression=# create function expl(q text) returns setof text as $$
regression$# declare r record;
regression$# begin
regression$#   for r in execute 'explain ' || q loop
regression$#     return next r."QUERY PLAN";
regression$#   end loop;
regression$# end$$ language plpgsql;
CREATE FUNCTION

regression=# select * from expl('select * from tenk1');                           expl                             
-------------------------------------------------------------Seq Scan on tenk1  (cost=0.00..458.00 rows=10000
width=244)
(1 row)

Admittedly this is a bit inconvenient, but the point is that the
functionality does exist.  There is no need to have a built-in
version of this function unless we get significant advantages
from having it built-in, and right now I'm not seeing those.
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Mon, May 25, 2009 at 6:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I agree with this, but there is a lot of sentiment (which I share)
>> that it should be possible to capture EXPLAIN output using subselect
>> or CTAS syntax, regardless of exactly what that output ends up being.
>
> Well, it should be possible to capture the output, but let's not
> prejudge the syntax.
>
>> That seems to require that EXPLAIN be a fully-reserved keyword, so I
>> wonder what we think about that.
>
> Nonstarter, especially when it's so obviously possible to do it without
> that.  The threshold for reserving words that aren't reserved by SQL
> spec has to be really high, because you will break innocent applications
> that way.
>
> Before anyone gets overly excited about having special syntax for this,
> I should point out that you can do it today, for instance like so:
>
> regression=# create function expl(q text) returns setof text as $$
> regression$# declare r record;
> regression$# begin
> regression$#   for r in execute 'explain ' || q loop
> regression$#     return next r."QUERY PLAN";
> regression$#   end loop;
> regression$# end$$ language plpgsql;
> CREATE FUNCTION
>
> regression=# select * from expl('select * from tenk1');
>                            expl
> -------------------------------------------------------------
>  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
> (1 row)
>
> Admittedly this is a bit inconvenient, but the point is that the
> functionality does exist.  There is no need to have a built-in
> version of this function unless we get significant advantages
> from having it built-in, and right now I'm not seeing those.

The only problem I see with this is that there's no convenient way of
specifying the options you want.  Granted, it wouldn't be all that
difficult to add a couple of boolean options to specify the state of
EXPLAIN and ANALYZE, but if we have more options, it starts to get a
bit complicated, especially if they do things like change the set of
output columns.  I'm still liking that idea, but even if we don't end
up implementing that particular thing, I'm really doubtful that the
need to make EXPLAIN do more things is likely to go away.

...Robert


Re: generic options for explain

From
Greg Stark
Date:
On Mon, May 25, 2009 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Admittedly this is a bit inconvenient, but the point is that the
> functionality does exist.  There is no need to have a built-in
> version of this function unless we get significant advantages
> from having it built-in, and right now I'm not seeing those.

I assume people don't want the *text* of the current output format but
the actual values in separate columns. So you could do things like
accumulate the data in a table and later use sql to search for queries
using specific indexes or where estimates are off etc.

--
greg


Re: generic options for explain

From
Tom Lane
Date:
Greg Stark <stark@enterprisedb.com> writes:
> On Mon, May 25, 2009 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Admittedly this is a bit inconvenient, but the point is that the
>> functionality does exist. �There is no need to have a built-in
>> version of this function unless we get significant advantages
>> from having it built-in, and right now I'm not seeing those.

> I assume people don't want the *text* of the current output format but
> the actual values in separate columns.

Well, I notice that everyone is carefully dodging the subject of exactly
what columns they want, but my example would clearly scale easily to any
specific set of output columns that EXPLAIN might return instead of one
text column.  Since we were previously told that any particular release
of PG need only offer one set of possible output columns, I figured the
problem was solved ;-)
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Mon, May 25, 2009 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@enterprisedb.com> writes:
>> On Mon, May 25, 2009 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Admittedly this is a bit inconvenient, but the point is that the
>>> functionality does exist.  There is no need to have a built-in
>>> version of this function unless we get significant advantages
>>> from having it built-in, and right now I'm not seeing those.
>
>> I assume people don't want the *text* of the current output format but
>> the actual values in separate columns.
>
> Well, I notice that everyone is carefully dodging the subject of exactly
> what columns they want,

I had a try at this upthread, actually, but it's not a real easy problem.

> but my example would clearly scale easily to any
> specific set of output columns that EXPLAIN might return instead of one
> text column.  Since we were previously told that any particular release
> of PG need only offer one set of possible output columns, I figured the
> problem was solved ;-)

I was totally unconvinced by that argument.

I actually think that the best data structure for this would be
something like hstore.  It would sure be nice to be able to manipulate
this data using SQL: I am sure there are people on this mailing list
who hate XML and maybe a few who hate JSON, but if they hate SQL then
they're off my list of people I care about making happy.  :-)   At the
same time, the variable number of output columns is problematic for a
flat table representation.  It may not be so problematic that we can't
work around it, but it's definitely not great.

It's really the pits to think that our data model is so impoverished
that it can't in a reasonable way handle the output of our EXPLAIN
command.  It would be awfully sweet to be able to do things like: show
me all the plan nodes where the expected and actual row counts
differed by more than a factor of 10.  And why should I need an
external XML/JSON parser to do that, rather than just a WHERE clause?

...Robert


Re: generic options for explain

From
Peter Eisentraut
Date:
On Monday 25 May 2009 18:02:53 Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > This is all much more complicated than what I proposed, and I fail to
> > see what it buys us.  I'd say that you're just reinforcing the point I
> > made upthread, which is that insisting that XML is the only way to get
> > more detailed information will just create a cottage industry of
> > beating that XML output format into submission.
>
> The impression I have is that (to misquote Churchill) XML is the worst
> option available, except for all the others.  We need something that can
> represent a fairly complex data structure, easily supports addition or
> removal of particular fields in the structure (including fields not
> foreseen in the original design), is not hard for programs to parse,
> and is widely supported --- ie, "not hard" includes "you don't have to
> write your own parser, in most languages".  How many realistic
> alternatives are there?

I think we are going in the wrong direction.  No one has said that they want a 
machine-readable EXPLAIN format.  OK, there are historically about three 
people that want one, but they have already solved the problem of parsing the 
current format.  And without having writtens such a parser myself I think that 
the current format is not inherently hard to parse.

What people really want is optional additional information in the human-
readable format.  Giving them a machine readable format does not solve the 
problem.  Giving them a machine readable format with all-or-none of the 
optional information and saying "figure it out yourself" does not solve 
anything either.  The same people who currently complain will continue to 
complain.


Re: generic options for explain

From
Magnus Hagander
Date:
Peter Eisentraut wrote:
> On Monday 25 May 2009 18:02:53 Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> This is all much more complicated than what I proposed, and I fail to
>>> see what it buys us.  I'd say that you're just reinforcing the point I
>>> made upthread, which is that insisting that XML is the only way to get
>>> more detailed information will just create a cottage industry of
>>> beating that XML output format into submission.
>> The impression I have is that (to misquote Churchill) XML is the worst
>> option available, except for all the others.  We need something that can
>> represent a fairly complex data structure, easily supports addition or
>> removal of particular fields in the structure (including fields not
>> foreseen in the original design), is not hard for programs to parse,
>> and is widely supported --- ie, "not hard" includes "you don't have to
>> write your own parser, in most languages".  How many realistic
>> alternatives are there?
> 
> I think we are going in the wrong direction.  No one has said that they want a 
> machine-readable EXPLAIN format. 

That is not true. Tool developers like pgAdmin (I know that one for
sure), phpPgAdmin (I think they have said it too) and third party tools
have asked for this.

Right now we parse the EXPLAIN output. Which doesn't get easier with
each new thing we add to it :-)

It would be very nice to have it tool parseable.

I'm also fairly certain that people using auto_explain would have use
for a format that's easier to parse.


> What people really want is optional additional information in the human-
> readable format.  Giving them a machine readable format does not solve the 
> problem.  Giving them a machine readable format with all-or-none of the 
> optional information and saying "figure it out yourself" does not solve 
> anything either.  The same people who currently complain will continue to 
> complain.

I agree that this is a separate issue. But that doesn't mean they don't
both exist.

//Magnus



Re: generic options for explain

From
Dave Page
Date:
On Tue, May 26, 2009 at 8:15 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

> I think we are going in the wrong direction.  No one has said that they want a
> machine-readable EXPLAIN format.  OK, there are historically about three
> people that want one, but they have already solved the problem of parsing the
> current format.

Pretty sure I've said I want one. And whilst it's true, we already
parse the current output in pgAdmin, it's a PITA whenever the format
changes. I also want a format in which Tom is not going to refuse to
include additional data (such as the schema a relation is in) because
it clutters the output. A machine readable format would seem to the
idea way to include all data we may need, without making
human-readable output an unreadable mess.

--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: generic options for explain

From
Robert Haas
Date:
On May 26, 2009, at 8:15 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

> On Monday 25 May 2009 18:02:53 Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> This is all much more complicated than what I proposed, and I fail  
>>> to
>>> see what it buys us.  I'd say that you're just reinforcing the  
>>> point I
>>> made upthread, which is that insisting that XML is the only way to  
>>> get
>>> more detailed information will just create a cottage industry of
>>> beating that XML output format into submission.
>>
>> The impression I have is that (to misquote Churchill) XML is the  
>> worst
>> option available, except for all the others.  We need something  
>> that can
>> represent a fairly complex data structure, easily supports addition  
>> or
>> removal of particular fields in the structure (including fields not
>> foreseen in the original design), is not hard for programs to parse,
>> and is widely supported --- ie, "not hard" includes "you don't have  
>> to
>> write your own parser, in most languages".  How many realistic
>> alternatives are there?
>
> I think we are going in the wrong direction.  No one has said that  
> they want a
> machine-readable EXPLAIN format.  OK, there are historically about  
> three
> people that want one, but they have already solved the problem of  
> parsing the
> current format.  And without having writtens such a parser myself I  
> think that
> the current format is not inherently hard to parse.
>
> What people really want is optional additional information in the  
> human-
> readable format.  Giving them a machine readable format does not  
> solve the
> problem.  Giving them a machine readable format with all-or-none of  
> the
> optional information and saying "figure it out yourself" does not  
> solve
> anything either.  The same people who currently complain will  
> continue to
> complain.

Peter,

The check is in the mail. :-)

In all seriousness, I have no problem at all with providing machine- 
readable formats, but the problem you're describing here is definitely  
my primary pain point.

...Robert


Re: generic options for explain

From
Greg Stark
Date:
Well I want an SQL query-able format.  I also want a way to retrieve  
the data for a query run from within an application without disturbing  
the application i.e. while still returning the regular result set.

But I also like being able to conveniently run explain and get the  
results formatted to fit on the screen in a single step. I don't see  
anything wrong with Robert's direction to pass options to explain. It  
doesn't solve every problem but it doesn't make any of the other  
things we need harder either.

On a bike-shedding note I would rather have the rhs of the option be  
optional and default to true for boolean options.

Actually if we make a set of explain_* guc options we could make the  
options just locally set those options.

-- 
Greg


On 26 May 2009, at 13:15, Peter Eisentraut <peter_e@gmx.net> wrote:

> On Monday 25 May 2009 18:02:53 Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> This is all much more complicated than what I proposed, and I fail  
>>> to
>>> see what it buys us.  I'd say that you're just reinforcing the  
>>> point I
>>> made upthread, which is that insisting that XML is the only way to  
>>> get
>>> more detailed information will just create a cottage industry of
>>> beating that XML output format into submission.
>>
>> The impression I have is that (to misquote Churchill) XML is the  
>> worst
>> option available, except for all the others.  We need something  
>> that can
>> represent a fairly complex data structure, easily supports addition  
>> or
>> removal of particular fields in the structure (including fields not
>> foreseen in the original design), is not hard for programs to parse,
>> and is widely supported --- ie, "not hard" includes "you don't have  
>> to
>> write your own parser, in most languages".  How many realistic
>> alternatives are there?
>
> I think we are going in the wrong direction.  No one has said that  
> they want a
> machine-readable EXPLAIN format.  OK, there are historically about  
> three
> people that want one, but they have already solved the problem of  
> parsing the
> current format.  And without having writtens such a parser myself I  
> think that
> the current format is not inherently hard to parse.
>
> What people really want is optional additional information in the  
> human-
> readable format.  Giving them a machine readable format does not  
> solve the
> problem.  Giving them a machine readable format with all-or-none of  
> the
> optional information and saying "figure it out yourself" does not  
> solve
> anything either.  The same people who currently complain will  
> continue to
> complain.
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: generic options for explain

From
Robert Haas
Date:
On May 26, 2009, at 8:46 AM, Greg Stark <greg.stark@enterprisedb.com>  
wrote:

> Well I want an SQL query-able format.  I also want a way to retrieve  
> the data for a query run from within an application without  
> disturbing the application i.e. while still returning the regular  
> result set.
>
> But I also like being able to conveniently run explain and get the  
> results formatted to fit on the screen in a single step. I don't see  
> anything wrong with Robert's direction to pass options to explain.  
> It doesn't solve every problem but it doesn't make any of the other  
> things we need harder either.

Your check is in the mail, too.

> On a bike-shedding note I would rather have the rhs of the option be  
> optional and default to true for boolean options.

I was thinking about that, too, so +1.

> Actually if we make a set of explain_* guc options we could make the  
> options just locally set those options.

I think that's probably over-complicated, but that's just MHO.

...Robert

>
>
> -- 
> Greg
>
>
> On 26 May 2009, at 13:15, Peter Eisentraut <peter_e@gmx.net> wrote:
>
>> On Monday 25 May 2009 18:02:53 Tom Lane wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> This is all much more complicated than what I proposed, and I  
>>>> fail to
>>>> see what it buys us.  I'd say that you're just reinforcing the  
>>>> point I
>>>> made upthread, which is that insisting that XML is the only way  
>>>> to get
>>>> more detailed information will just create a cottage industry of
>>>> beating that XML output format into submission.
>>>
>>> The impression I have is that (to misquote Churchill) XML is the  
>>> worst
>>> option available, except for all the others.  We need something  
>>> that can
>>> represent a fairly complex data structure, easily supports  
>>> addition or
>>> removal of particular fields in the structure (including fields not
>>> foreseen in the original design), is not hard for programs to parse,
>>> and is widely supported --- ie, "not hard" includes "you don't  
>>> have to
>>> write your own parser, in most languages".  How many realistic
>>> alternatives are there?
>>
>> I think we are going in the wrong direction.  No one has said that  
>> they want a
>> machine-readable EXPLAIN format.  OK, there are historically about  
>> three
>> people that want one, but they have already solved the problem of  
>> parsing the
>> current format.  And without having writtens such a parser myself I  
>> think that
>> the current format is not inherently hard to parse.
>>
>> What people really want is optional additional information in the  
>> human-
>> readable format.  Giving them a machine readable format does not  
>> solve the
>> problem.  Giving them a machine readable format with all-or-none of  
>> the
>> optional information and saying "figure it out yourself" does not  
>> solve
>> anything either.  The same people who currently complain will  
>> continue to
>> complain.
>>
>> -- 
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers


Re: generic options for explain

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I think we are going in the wrong direction.  No one has said that
> they want a machine-readable EXPLAIN format.  OK, there are
> historically about three people that want one, but they have already
> solved the problem of parsing the current format.

Well, obviously the set of tool designers is smaller than the set of
casual users of EXPLAIN, but their problems are none the less real and
very important.

> What people really want is optional additional information in the human-
> readable format.  Giving them a machine readable format does not solve the 
> problem.

Actually, the exact problem is this: those two goals are in conflict.
There'd be little objection to adding any random set of optional stuff
to EXPLAIN's textual output, if it weren't for the fact that it would
make machine parsing of that output even harder than it is already.

So my feeling is that we need a machine-readable format containing all
the data in order to satisfy the needs of tool designers.  Once they
are freed from having to parse EXPLAIN's textual output, we can whack
the textual output around all we want.  (Which kills my previous
argument that we only need one new option, but such is life.)

Now there is a third set of desires having to do with being able to do
simple SQL-based analysis of EXPLAIN output.  That's the piece I think
we don't have a good handle on.  In particular, it's not clear whether
a SQL-friendly output format can be the same as either of the other
two.  (I don't personally find this goal very compelling --- there is
no natural law saying that SQL is a good tool for analyzing EXPLAIN
output --- but I'm willing to look at it to see if it's feasible.)
        regards, tom lane


Re: generic options for explain

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Now there is a third set of desires having to do with being able to do
> simple SQL-based analysis of EXPLAIN output.  That's the piece I think
> we don't have a good handle on.  In particular, it's not clear whether
> a SQL-friendly output format can be the same as either of the other
> two.  (I don't personally find this goal very compelling --- there is
> no natural law saying that SQL is a good tool for analyzing EXPLAIN
> output --- but I'm willing to look at it to see if it's feasible.)
>
>             
>   

In libxml-enabled builds at least, this could presumably be done fairly 
easily via the XML functions, especially if we get XSLT processing into 
the core XML functionality as I hope we can do this release. In fact, 
the ability to leverage existing XML functionality to munge the output 
is the thing that swings me in favor of XML as the machine readable 
output format instead of JSON, since we don't have and aren't terribly 
likely to get an inbuilt JSON parser. It means we wouldn't need some 
external tool at all.

cheers

andrew


Re: generic options for explain

From
Dave Page
Date:
On Tue, May 26, 2009 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> In libxml-enabled builds at least, this could presumably be done fairly
> easily via the XML functions, especially if we get XSLT processing into the
> core XML functionality as I hope we can do this release. In fact, the
> ability to leverage existing XML functionality to munge the output is the
> thing that swings me in favor of XML as the machine readable output format
> instead of JSON, since we don't have and aren't terribly likely to get an
> inbuilt JSON parser. It means we wouldn't need some external tool at all.

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: generic options for explain

From
Magnus Hagander
Date:
Dave Page wrote:
> On Tue, May 26, 2009 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> In libxml-enabled builds at least, this could presumably be done fairly
>> easily via the XML functions, especially if we get XSLT processing into the
>> core XML functionality as I hope we can do this release. In fact, the
>> ability to leverage existing XML functionality to munge the output is the
>> thing that swings me in favor of XML as the machine readable output format
>> instead of JSON, since we don't have and aren't terribly likely to get an
>> inbuilt JSON parser. It means we wouldn't need some external tool at all.

Actually, I think a number of users would be *very* happy if we had a
builtin JSON parser. I'm unsure on how feasible that is though.


> I was thinking something similar, but from the pgAdmin perspective. We
> already use libxml2, but JSON would introduce another dependency for
> us.

Yeah, but probably not a huge one. There is one for wx, but I don't
think it's included by default.

-- Magnus HaganderSelf: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: generic options for explain

From
Peter Eisentraut
Date:
On Tuesday 26 May 2009 16:55:55 Dave Page wrote:
> I was thinking something similar, but from the pgAdmin perspective. We
> already use libxml2, but JSON would introduce another dependency for
> us.

I was actually looking for a C library for JSON (json type for PostgreSQL; you 
know it is coming :-) ), but only found a library tied to glib, which, 
considering the experience with libxml, did not excite me.  If someone knows 
of a different, small, and independent JSON library for C, I would like to 
hear about it.


Re: generic options for explain

From
Magnus Hagander
Date:
Peter Eisentraut wrote:
> On Tuesday 26 May 2009 16:55:55 Dave Page wrote:
>> I was thinking something similar, but from the pgAdmin perspective. We
>> already use libxml2, but JSON would introduce another dependency for
>> us.
> 
> I was actually looking for a C library for JSON (json type for PostgreSQL; you 
> know it is coming :-) ), but only found a library tied to glib, which, 
> considering the experience with libxml, did not excite me.  If someone knows 
> of a different, small, and independent JSON library for C, I would like to 
> hear about it.

The JSon page (http://json.org/) lists for example
http://fara.cs.uni-potsdam.de/~jsg/json_parser/ which appears to not
need it. But it seems very simple - though I haven't actually looked
into the details.


-- Magnus HaganderSelf: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: generic options for explain

From
Andrew Dunstan
Date:

Peter Eisentraut wrote:
> On Tuesday 26 May 2009 16:55:55 Dave Page wrote:
>   
>> I was thinking something similar, but from the pgAdmin perspective. We
>> already use libxml2, but JSON would introduce another dependency for
>> us.
>>     
>
> I was actually looking for a C library for JSON (json type for PostgreSQL; you 
> know it is coming :-) ), but only found a library tied to glib, which, 
> considering the experience with libxml, did not excite me.  If someone knows 
> of a different, small, and independent JSON library for C, I would like to 
> hear about it.
>
>   

There are several listed at <http://www.json.org/>

cheers

andrew


Re: generic options for explain

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, May 26, 2009 at 04:36:56PM +0200, Magnus Hagander wrote:

> > I was thinking something similar, but from the pgAdmin perspective. We
> > already use libxml2, but JSON would introduce another dependency for
> > us.
> 
> Yeah, but probably not a huge one. There is one for wx, but I don't
> think it's included by default.

...and to put things into perspective:

tomas@floh:~$ apt-cache show libxml2 libjson-glib-1.0-0 | grep "^Size"
Size: 814356
Size: 33538

(not that I would recommend this one, since that's the one tied to glib,
but seems that XML parsing is nearly one and a half orders of magnitude
more complex than JSON).

- -- tomás  who thinks that XML-as-a-data-description-language is a denial of  service attack on us all
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKHATbBcgs9XrR2kYRArJmAJ4wJlvbnuWKYTvIDrSoLJccCyMTLwCbBM39
NCVSrDaEVad3NfogJrwRtiY=
=Volp
-----END PGP SIGNATURE-----


Re: generic options for explain

From
Aidan Van Dyk
Date:
* tomas@tuxteam.de <tomas@tuxteam.de> [090526 11:03]:
> ...and to put things into perspective:
> 
> tomas@floh:~$ apt-cache show libxml2 libjson-glib-1.0-0 | grep "^Size"
> Size: 814356
> Size: 33538

And including glib, which does all the work for libjson-glib:

mountie@pumpkin:~/projects/postgresql/PostgreSQL$ apt-cache show libxml2
libjson-glib-1.0-0 libglib2.0-0 | grep ^Size
Size: 870188
Size: 36132
Size: 845166

glib also pulls in libpcre:
Size: 214650

So:- XML: 870188(libxml) + 76038 (zlib1g) = 946226- JSON: 36132 (json) + 845166 (glib) + 214650 (pcre) = 1095948

;-)


-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: generic options for explain

From
Dimitri Fontaine
Date:
Hi,

Peter Eisentraut <peter_e@gmx.net> writes:
> I was actually looking for a C library for JSON (json type for PostgreSQL; you 
> know it is coming :-) ), but only found a library tied to glib, which, 
> considering the experience with libxml, did not excite me.  If someone knows 
> of a different, small, and independent JSON library for C, I would like to 
> hear about it.

Looking at http://json.org/, it seems this particular project could fit:
 http://lloyd.github.com/yajl/
 Yet Another JSON Library. YAJL is a small event-driven (SAX-style) JSON parser written in ANSI C, and a small
validatingJSON generator. YAJL is released under the BSD license.
 
 ...
 It's all ANSI C. It's been successfully compiled on debian linux, OSX 10.4 i386 & ppc, OSX 10.5 i386, winXP, FreeBSD
4.10,FreeBSD 6.1 amd64, FreeBSD 7 i386, and windows vista. More platforms and binaries as time permits. 
 
 ...
 A second motivation for writing YAJL, was that many available free JSON parsers fall over on large or complex inputs.
YAJLis careful to minimize memory copying and input re-scanning when possible. The result is a parser that should be
fastenough for most applications or tunable for any application. On my mac pro (2.66 ghz) it takes 1s to verify a 60meg
jsonfile. Minimizing that same file with json_reformat takes 4s. 
 
 Largely because YAJL deals with streams, it's possible to parse JSON in low memory environments. Oftentimes with other
parsersan application must hold both the input text and the memory representation of the tree in memory at one time.
WithYAJL you can incrementally read the input stream and hold only the in memory representation. Or for filtering or
validationtasks, it's not required to hold the entire input text in memory. 
 


Hope this helps, regards,
-- 
dim


Re: generic options for explain

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, May 26, 2009 at 11:15:21AM -0400, Aidan Van Dyk wrote:
> * tomas@tuxteam.de <tomas@tuxteam.de> [090526 11:03]:
>  
> > ...and to put things into perspective:
> > 
> > tomas@floh:~$ apt-cache show libxml2 libjson-glib-1.0-0 | grep "^Size"
> > Size: 814356
> > Size: 33538
> 
> And including glib, which does all the work for libjson-glib:
> 
> mountie@pumpkin:~/projects/postgresql/PostgreSQL$ apt-cache show libxml2
> libjson-glib-1.0-0 libglib2.0-0 | grep ^Size
> Size: 870188
> Size: 36132
> Size: 845166
> 
> glib also pulls in libpcre:
> Size: 214650
> 
> So:
>  - XML: 870188(libxml) + 76038 (zlib1g) = 946226
>  - JSON: 36132 (json) + 845166 (glib) + 214650 (pcre) = 1095948
> 
> ;-)

OK, OK, you win (darn: should have known those bloatophile gnomies.
Surprise that they don't pull in Mono :-(

But json-c (just downloaded & compiled) is more in the ballpark of 100K,
if I count all produced *.o

And it's BSD.

Regards
- -- tomás

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKHAvaBcgs9XrR2kYRAkasAJwPzzw3Os8e7QA2HvMSkQ0iRGWz+ACfYlp+
Y/v3EO+8sRiPzJNumADatdM=
=EjCU
-----END PGP SIGNATURE-----


Re: generic options for explain

From
Joshua Tolley
Date:
On Tue, May 26, 2009 at 09:55:55AM -0400, Dave Page wrote:
> from the pgAdmin perspective. We
> already use libxml2, but JSON would introduce another dependency for
> us.

...and using XML introduces a dependency for those that apps that don't already
use some XML parser. I realize that since the pool of apps that care to
mechanically parse EXPLAIN output is small, it wouldn't necessarily be a big
deal to hand each of them a new dependency in the form of a parser for XML,
JSON, etc. But we know the least common denominator is to return a set of
tuples; let's make sure that really is unworkable before forcing even that
dependency.

- Josh / eggyknap

Re: generic options for explain

From
Robert Haas
Date:
On Tue, May 26, 2009 at 10:36 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Dave Page wrote:
>> On Tue, May 26, 2009 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> In libxml-enabled builds at least, this could presumably be done fairly
>>> easily via the XML functions, especially if we get XSLT processing into the
>>> core XML functionality as I hope we can do this release. In fact, the
>>> ability to leverage existing XML functionality to munge the output is the
>>> thing that swings me in favor of XML as the machine readable output format
>>> instead of JSON, since we don't have and aren't terribly likely to get an
>>> inbuilt JSON parser. It means we wouldn't need some external tool at all.
>
> Actually, I think a number of users would be *very* happy if we had a
> builtin JSON parser. I'm unsure on how feasible that is though.

I think it's likely that with proper design the amount of extra code
that is required to support both XML and JSON is likely to be very
small.  I don't think we're going to get away without supporting XML
because there are so many people already using XML-based tools, and I
find Andrew's argument that we already have some built-in XML support
that could possibly be used to smooth the road here as well pretty
compelling.

On the other hand, XML can be a really difficult technology to work
with because it doesn't map cleanly to the data structures that most
modern scripting languages (Perl, Python, Ruby, and probably Java and
others) use.  As a simple example, if you have a hash like { a => 1, b
=> 2 } (using the Perl syntax) you can map it to
<hash><a>1</a><b>2</b></hash>.  That's easy to generate, but the
reverse transformation is full of error-handling cases, like
<hash><a>1</a><b>2<c/></b></hash> and <hash><a>1</a><a>2</a></hash>.
I'm sure experienced XML hackers have ways to work around these
problems, but the XML libraries I've worked with basically don't even
try to turn the thing into any sort of general-purpose data structure.They just let you ask questions like "What is the
rootelement?  OK,
 
now what elements does it contain?  OK, there's an <a> tag there, what
does that have inside it?  Any more-deeply-nested tags?".  On the
other hand, JSON is explicitly designed to serialize and deserialize
data structures of this type, and it pretty much just works, even
between completely different programming languages.

So to summarize that - if we're only going to support one
machine-readable output format, it's probably got to be XML.  But if
the additional effort to also support JSON is small, which I believe
to be the case, then I think it's worth doing because it's actually
better technology for this type of application.  Maybe someone will
feel inspired to work up a contrib/json.

...Robert


Re: generic options for explain

From
Andrew Dunstan
Date:

Robert Haas wrote:
> On the other hand, XML can be a really difficult technology to work
> with because it doesn't map cleanly to the data structures that most
> modern scripting languages (Perl, Python, Ruby, and probably Java and
> others) use.  As a simple example, if you have a hash like { a => 1, b
> => 2 } (using the Perl syntax) you can map it to
> <hash><a>1</a><b>2</b></hash>.  That's easy to generate, but the
> reverse transformation is full of error-handling cases, like
> <hash><a>1</a><b>2<c/></b></hash> and <hash><a>1</a><a>2</a></hash>.
> I'm sure experienced XML hackers have ways to work around these
> problems, but the XML libraries I've worked with basically don't even
> try to turn the thing into any sort of general-purpose data structure.
>  They just let you ask questions like "What is the root element?  OK,
> now what elements does it contain?  OK, there's an <a> tag there, what
> does that have inside it?  Any more-deeply-nested tags?".  On the
> other hand, JSON is explicitly designed to serialize and deserialize
> data structures of this type, and it pretty much just works, even
> between completely different programming languages.
>
>
>   

Since we will be controlling the XML output, we can restrict it to a 
form that is equivalent to what JSON and similar serialisation languages 
use. We can even produce an XSD schema specifying what is allowed, if 
anyone is so minded, and a validating parser could be told to validate 
the XML against that schema. And XSLT processing is a very powerful 
transformation tool. We could even provide a stylesheet that would turn 
the XML into JSON. :-)

Anyway, I think we're getting closer to consensus here.

I think there's a good case for being able to stash the EXPLAIN output 
in a table as XML - that way we could slice and dice it several ways 
without having to rerun the EXPLAIN.


cheers

andrew


Re: generic options for explain

From
Robert Haas
Date:
On Tue, May 26, 2009 at 1:48 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Robert Haas wrote:
>> On the other hand, XML can be a really difficult technology to work
>> with because it doesn't map cleanly to the data structures that most
>> modern scripting languages (Perl, Python, Ruby, and probably Java and
>> others) use.  As a simple example, if you have a hash like { a => 1, b
>> => 2 } (using the Perl syntax) you can map it to
>> <hash><a>1</a><b>2</b></hash>.  That's easy to generate, but the
>> reverse transformation is full of error-handling cases, like
>> <hash><a>1</a><b>2<c/></b></hash> and <hash><a>1</a><a>2</a></hash>.
>> I'm sure experienced XML hackers have ways to work around these
>> problems, but the XML libraries I've worked with basically don't even
>> try to turn the thing into any sort of general-purpose data structure.
>>  They just let you ask questions like "What is the root element?  OK,
>> now what elements does it contain?  OK, there's an <a> tag there, what
>> does that have inside it?  Any more-deeply-nested tags?".  On the
>> other hand, JSON is explicitly designed to serialize and deserialize
>> data structures of this type, and it pretty much just works, even
>> between completely different programming languages.
>
> Since we will be controlling the XML output, we can restrict it to a form
> that is equivalent to what JSON and similar serialisation languages use. We
> can even produce an XSD schema specifying what is allowed, if anyone is so
> minded, and a validating parser could be told to validate the XML against
> that schema. And XSLT processing is a very powerful transformation tool. We
> could even provide a stylesheet that would turn the XML into JSON. :-)

Yeah, that's fine.  I think we should target 4/1/2010 as the
submission date for that stylesheet.  :-)

> Anyway, I think we're getting closer to consensus here.
>
> I think there's a good case for being able to stash the EXPLAIN output in a
> table as XML - that way we could slice and dice it several ways without
> having to rerun the EXPLAIN.

Yes, I think there is an excellent case for being able to stash any
output format into a table.

...Robert


Re: generic options for explain

From
Greg Stark
Date:
(sorry for top posting - stupid apple)

So the real elephant in the room is that the existing explain code is  
not really designed to be extensible, configurable, or to be printed  
in different formats.


The current code is basically just gobs of text printed by different  
routines all over the code base. There are no data structures which  
represent what explain prints. The closest thing is the  
instrumentation objects which obtain the timing and counts but not the  
planner expectations or any associated data.

If we're going to support multiple output formats or options to turn  
off and on sections I think we need to build a data structure  
independent of the format, have code to include or exclude stats as  
requested and then pass that to the requested formatter.

-- 
Greg


On 26 May 2009, at 18:53, Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, May 26, 2009 at 1:48 PM, Andrew Dunstan  
> <andrew@dunslane.net> wrote:
>> Robert Haas wrote:
>>> On the other hand, XML can be a really difficult technology to work
>>> with because it doesn't map cleanly to the data structures that most
>>> modern scripting languages (Perl, Python, Ruby, and probably Java  
>>> and
>>> others) use.  As a simple example, if you have a hash like { a =>  
>>> 1, b
>>> => 2 } (using the Perl syntax) you can map it to
>>> <hash><a>1</a><b>2</b></hash>.  That's easy to generate, but the
>>> reverse transformation is full of error-handling cases, like
>>> <hash><a>1</a><b>2<c/></b></hash> and <hash><a>1</a><a>2</a></hash>.
>>> I'm sure experienced XML hackers have ways to work around these
>>> problems, but the XML libraries I've worked with basically don't  
>>> even
>>> try to turn the thing into any sort of general-purpose data  
>>> structure.
>>>  They just let you ask questions like "What is the root element?   
>>> OK,
>>> now what elements does it contain?  OK, there's an <a> tag there,  
>>> what
>>> does that have inside it?  Any more-deeply-nested tags?".  On the
>>> other hand, JSON is explicitly designed to serialize and deserialize
>>> data structures of this type, and it pretty much just works, even
>>> between completely different programming languages.
>>
>> Since we will be controlling the XML output, we can restrict it to  
>> a form
>> that is equivalent to what JSON and similar serialisation languages  
>> use. We
>> can even produce an XSD schema specifying what is allowed, if  
>> anyone is so
>> minded, and a validating parser could be told to validate the XML  
>> against
>> that schema. And XSLT processing is a very powerful transformation  
>> tool. We
>> could even provide a stylesheet that would turn the XML into  
>> JSON. :-)
>
> Yeah, that's fine.  I think we should target 4/1/2010 as the
> submission date for that stylesheet.  :-)
>
>> Anyway, I think we're getting closer to consensus here.
>>
>> I think there's a good case for being able to stash the EXPLAIN  
>> output in a
>> table as XML - that way we could slice and dice it several ways  
>> without
>> having to rerun the EXPLAIN.
>
> Yes, I think there is an excellent case for being able to stash any
> output format into a table.
>
> ...Robert
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: generic options for explain

From
Tom Lane
Date:
Greg Stark <greg.stark@enterprisedb.com> writes:
> So the real elephant in the room is that the existing explain code is  
> not really designed to be extensible, configurable, or to be printed  
> in different formats.

These are implementation details ;-).  Let's get a definition that
everyone can sign off on, and then worry about what has to be done
to the code to make it happen.  Even if we end up throwing away and
rewriting all of explain.c, that's not *that* much code.
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Tue, May 26, 2009 at 3:04 PM, Greg Stark <greg.stark@enterprisedb.com> wrote:
> (sorry for top posting - stupid apple)
>
> So the real elephant in the room is that the existing explain code is not
> really designed to be extensible, configurable, or to be printed in
> different formats.
>
> The current code is basically just gobs of text printed by different
> routines all over the code base. There are no data structures which

All over the code base?  It looks to me like most of it is in
explain.c, specifically explain_outNode().

(On an unrelated point, it's difficult to imagine why someone thought
that was a good way of capitalizing & punctuating that function name.)

> represent what explain prints. The closest thing is the instrumentation
> objects which obtain the timing and counts but not the planner expectations
> or any associated data.
>
> If we're going to support multiple output formats or options to turn off and
> on sections I think we need to build a data structure independent of the
> format, have code to include or exclude stats as requested and then pass
> that to the requested formatter.

That sounds about right to me.  I think that representation can be
pretty thin, though, maybe just a big struct with all the attributes
that are applicable to any node type and pointers to its left and
right children.

...Robert


Re: generic options for explain

From
Robert Haas
Date:
On Tue, May 26, 2009 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <greg.stark@enterprisedb.com> writes:
>> So the real elephant in the room is that the existing explain code is
>> not really designed to be extensible, configurable, or to be printed
>> in different formats.
>
> These are implementation details ;-).  Let's get a definition that
> everyone can sign off on, and then worry about what has to be done
> to the code to make it happen.  Even if we end up throwing away and
> rewriting all of explain.c, that's not *that* much code.

I'm actually not sure there's a whole lot to hash out... I was going
to take a crack at writing some code.

...Robert


Re: generic options for explain

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 26, 2009 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> These are implementation details ;-). �Let's get a definition that
>> everyone can sign off on, and then worry about what has to be done
>> to the code to make it happen.

> I'm actually not sure there's a whole lot to hash out... I was going
> to take a crack at writing some code.

I still haven't seen anything but formless handwaving as far as the "SQL
table" output format goes.  For that matter, there's not much more than
handwaving behind the "XML" meme either.  Show us a spec for the output
format, then think about code.  (This was somewhere around slide ten
here:  http://momjian.us/main/writings/pgsql/patch.pdf  ;-))
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Tue, May 26, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, May 26, 2009 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> These are implementation details ;-).  Let's get a definition that
>>> everyone can sign off on, and then worry about what has to be done
>>> to the code to make it happen.
>
>> I'm actually not sure there's a whole lot to hash out... I was going
>> to take a crack at writing some code.
>
> I still haven't seen anything but formless handwaving as far as the "SQL
> table" output format goes.  For that matter, there's not much more than
> handwaving behind the "XML" meme either.  Show us a spec for the output
> format, then think about code.  (This was somewhere around slide ten
> here:  http://momjian.us/main/writings/pgsql/patch.pdf  ;-))

OK, how about this:

http://archives.postgresql.org/message-id/603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com

I note in passing that there have been 51 messages posted to this
thread since I wrote that email, and none of the were responses to it.

At any rate, that email might not be as detailed as what you're
looking for, but it's certainly a start.  I don't really know how the
table-format output is going to work out; I have to look at the code
more to get a feeling for that.  But I think with respect to XML or
JSON, there really aren't too many options for how it can look, modulo
minor syntax tweaks like arguing about whether the join type should be
labelled "JoinType" or "jointype" or "join_type".  Still, if you have
comments or think I'm overlooking something important, I definitely
would like to know about that now before I put more time into it.

I recognize that we haven't come to a consensus on the best possible
syntax for EXPLAIN options, but it seems to me that threshold issue
for improving EXPLAIN is everyone agreeing that we're going to allow
for some kind of extendable syntax that doesn't rely on all options
being keywords (presented in a fixed order, no less!).  You caved in
on that point upthread and I don't think we have any other holdouts.
Now, of course, my syntax is the best possible one in the entire
universe, but if by chance there is a technically feasible alternative
syntax on which more than one person can agree (note: this has not
happened yet), adjusting my patch to use that syntax rather than the
one I stole from Peter shouldn't be too hard.

A second issue on which we don't have consensus is a method to capture
explain output.  I am 100% of the opinion that there are only two
sensible things to do here: (1) make EXPLAIN a fully reserved keyword
so that we can use it just like a SELECT, or (2) provide a built-in
function like pg_explain() that calls EXPLAIN with a user-specified
set of arguments, and which third-party tools can count on to be
installed.  Since you labelled (1) as a non-starter and AFAICS you're
the only holdout on making (2) a built-in rather than something
everyone has to define for themselves, I'm hopeful that we'll bring
you around.  :-)

The final issue on which we don't have a clear consensus is what OTHER
new options we want for EXPLAIN aside from choice of output format.  I
posted a few ideas that I have and solicited some others upthread, but
I think that the volume of email on other aspects of this patch has
deprived people of the necessary time and space to think about how
they might like to use an extensible options syntax once we have it -
not to mention that the original patch was only posted 3 days ago and
on a day when many of us were on airplanes, about to get on airplanes,
or still jet-lagged.  Personally, I think that that's the most
interesting aspect of this whole project so I hope it gets some
attention going forward, but I'm not too concerned about the exact
timing of that attention.  The point is that people not-infrequently
come up with more stuff they'd like to see in EXPLAIN output, and
those ideas get shot down because we don't have the syntax.  If we fix
the syntax, those ideas will come back around again in due course, and
we'll be able to consider them on their merits rather than
peremptorily shooting them down.

...Robert


Re: generic options for explain

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 26, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I still haven't seen anything but formless handwaving as far as the "SQL
>> table" output format goes. �For that matter, there's not much more than
>> handwaving behind the "XML" meme either.

> OK, how about this:
> http://archives.postgresql.org/message-id/603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com

> I note in passing that there have been 51 messages posted to this
> thread since I wrote that email, and none of the were responses to it.

Well, we were having too much fun arguing about trivia ;-).  And I
suspect a number of people were too jet-lagged to keep track of what
they'd read and what not.  Anyway, good, we have a starting point.

Some issues that I see here:

1.  You seem to be assuming that each table row will represent exactly
one plan node, no more, no less.  That's okay as a first approximation
but it breaks down on closer examination.  In particular, where will you
hang the information that's already available about trigger execution
costs?  Those are not associated with any particular plan node, as they
occur atop the whole plan.  The same goes for the total execution time
of course, and I can foresee other types of stats that we might gather
someday that would be hard to tie to any specific plan node.

In XML this is soluble by having a toplevel node <ExplainResults> that
contains not only the plan tree but other children.  I'm not seeing how
to translate that into a SQL table, though.  Or at least not just one
SQL table.

2.  You didn't say anything about how any but simple scalar fields will
be represented.  Filter conditions and sort keys are particularly
interesting here.  I'm not really happy with just plopping down the same
textual output we have now --- that is just as human-friendly-and-not-
machine-friendly as before, only with a slightly smaller scope.  I can
foresee for example that someone might wish to extract the second or
third sort key expression from a Sort node's sort key list.  Or what
about problems such as "find which nodes this field is used in"?

3.  You left us with a handwave about how the tree structure will be
represented in a table.  Needs to be explicit.  And it's not just
simple child relationships that should be represented ... tell us
about initplans and subplans, too.

4.  The point about having lots of NULL columns is an annoyance that
could escalate to the point of near unusability.  To get a feeling for
how workable that is, we need a pretty exact list of the set of output
columns, not just a rough list of the kinds of things that will be
there.
        regards, tom lane


Re: generic options for explain

From
Robert Haas
Date:
On Tue, May 26, 2009 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, May 26, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I still haven't seen anything but formless handwaving as far as the "SQL
>>> table" output format goes.  For that matter, there's not much more than
>>> handwaving behind the "XML" meme either.
>
>> OK, how about this:
>> http://archives.postgresql.org/message-id/603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com
>
>> I note in passing that there have been 51 messages posted to this
>> thread since I wrote that email, and none of the were responses to it.
>
> Well, we were having too much fun arguing about trivia ;-).  And I
> suspect a number of people were too jet-lagged to keep track of what
> they'd read and what not.  Anyway, good, we have a starting point.
>
> Some issues that I see here:
>
> 1.  You seem to be assuming that each table row will represent exactly
> one plan node, no more, no less.  That's okay as a first approximation
> but it breaks down on closer examination.  In particular, where will you
> hang the information that's already available about trigger execution
> costs?  Those are not associated with any particular plan node, as they
> occur atop the whole plan.  The same goes for the total execution time
> of course, and I can foresee other types of stats that we might gather
> someday that would be hard to tie to any specific plan node.
>
> In XML this is soluble by having a toplevel node <ExplainResults> that
> contains not only the plan tree but other children.  I'm not seeing how
> to translate that into a SQL table, though.  Or at least not just one
> SQL table.
>
> 2.  You didn't say anything about how any but simple scalar fields will
> be represented.  Filter conditions and sort keys are particularly
> interesting here.  I'm not really happy with just plopping down the same
> textual output we have now --- that is just as human-friendly-and-not-
> machine-friendly as before, only with a slightly smaller scope.  I can
> foresee for example that someone might wish to extract the second or
> third sort key expression from a Sort node's sort key list.  Or what
> about problems such as "find which nodes this field is used in"?
>
> 3.  You left us with a handwave about how the tree structure will be
> represented in a table.  Needs to be explicit.  And it's not just
> simple child relationships that should be represented ... tell us
> about initplans and subplans, too.
>
> 4.  The point about having lots of NULL columns is an annoyance that
> could escalate to the point of near unusability.  To get a feeling for
> how workable that is, we need a pretty exact list of the set of output
> columns, not just a rough list of the kinds of things that will be
> there.

Responding to these in bulk, I think that 1, 3, and 4 are pretty
convincing arguments that the SQL-based output format is
underspecified.  I hereby promise not to do anything about that
without further discussion, which is an easy promise to make
considering that in light of those comments I have no idea what it
should look like.  I think (1) is the most damning point.  However, as
far as I can see, none of these will affect XML or JSON.

With respect to (2), I think we should output the same text format
that we have now, for starters.  I agree that's not the only thing
that someone might want, but I think there's a pretty good argument
that it's ONE thing that someone might reasonably want, depending on
the application.  If someone cares to build a better mousetrap in this
area, it can be added on once we figure out the design, and without
breaking anything! - that's sort of the whole point of this exercise.

...Robert


Re: generic options for explain

From
"Kevin Grittner"
Date:
Sorry to come in on this discussion so late.  Just catching up....
Robert Haas <robertmhaas@gmail.com> wrote: 
> Responding to these in bulk, I think that 1, 3, and 4 are pretty
> convincing arguments that the SQL-based output format is
> underspecified.  I hereby promise not to do anything about that
> without further discussion, which is an easy promise to make
> considering that in light of those comments I have no idea what it
> should look like.  I think (1) is the most damning point.  However,
> as far as I can see, none of these will affect XML or JSON.
Personally, I find XML to be very hard to read; however, I can see the
value of writing to that and having someone who can tolerate XSLT turn
XML into anything else we want.  (That could include morphing it into
SELECT statements with the literals to present it as a tuple set, I
should think.)  As long as nobody considers this issue "done" until
there are useful and convenient ways to display and use the data
within psql without having to look at the XML, that seems a reasonable
approach.
The big plus of the current technique is that it is so convenient to
Ctrl+C something which is running too long, arrow up, hit Home, and
put the EXPLAIN word in front.  Turning the query into a character
string literal and feeding it to a function would be a big step
backward.
A big down side of the current technique is that you can't get both
the results of a SELECT and its plan.  I haven't seen any discussion
here about emitting the EXPLAIN output through some INFO messages or
some such, and letting the query return its normal results, but I feel
that would be a significant improvement, if it that be done.
Also, something I miss from previous database products is a way to
control the verbosity of the output when planning.  I do think that
needs to be some sort of option up front, not a filter phase, because
of the high cost it can have.  If there was a way to show all the
candidate plans and their cost estimates in a run time environment,
without any special build or configuration needed, I'd use it every
now and then.
-Kevin


Re: generic options for explain

From
ioguix
Date:
Magnus Hagander a écrit :
> Dave Page wrote:
>   
>> I was thinking something similar, but from the pgAdmin perspective. We
>> already use libxml2, but JSON would introduce another dependency for
>> us.
>>     
>
> Yeah, but probably not a huge one. There is one for wx, but I don't
> think it's included by default.
>
>   
+1 for the machine readable explain.

FWIW, I have an early patch for phpPgAdmin about a graphical explain. 
IIRC when I wrote it, I told myself the parser might actually be broken 
with multi-level sub-queries or something. But I ended with the same 
parsing code than pgAdmin anyway.

About the format, JSON would be the best here, as it is a one function 
call in PHP to retrieve an associative array from the JSON code.

-- 
Guillaume (ioguix) de Rorthais


Re: generic options for explain

From
Greg Stark
Date:
On Wed, May 27, 2009 at 1:30 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> The big plus of the current technique is that it is so convenient to
> Ctrl+C something which is running too long, arrow up, hit Home, and
> put the EXPLAIN word in front.  Turning the query into a character
> string literal and feeding it to a function would be a big step
> backward.
>
> A big down side of the current technique is that you can't get both
> the results of a SELECT and its plan.  I haven't seen any discussion
> here about emitting the EXPLAIN output through some INFO messages or
> some such, and letting the query return its normal results, but I feel
> that would be a significant improvement, if it that be done.

Would something like this address both of your issues?

http://article.gmane.org/gmane.comp.db.postgresql.devel.patches/21614/match=siginfo

It let you hit a control character while the query was running to view
the explain analyze for the results so far. The query kept running and
you could request further updates whenever you wanted.

--
greg


Re: generic options for explain

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote:
>
http://article.gmane.org/gmane.comp.db.postgresql.devel.patches/21614/match=siginfo
> 
> It let you hit a control character while the query was running to
view
> the explain analyze for the results so far. The query kept running
and
> you could request further updates whenever you wanted.
I'll have to check that out.  It almost seems too easy....   :-)
-Kevin