Thread: FDW API: don't like the EXPLAIN mechanism

FDW API: don't like the EXPLAIN mechanism

From
Tom Lane
Date:
I've been poking at the FDW stuff and file_fdw, and I find myself
dissatisfied with the way that the EXPLAIN support is designed,
namely that we have to compute a string at plan time to be displayed
by EXPLAIN.  There are a couple of problems with that:

1. The explainInfo string is useless overhead during a normal query.

2. There is no way to vary the display depending on EXPLAIN options such
as VERBOSE and COSTS OFF.

It seems fairly obvious to me that there might be scope for showing
more info in VERBOSE mode.  But even more to the point, the current
regression test's example output:

EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv;Foreign Scan on public.agg_csv  Output: a, b  Foreign Plan:
file="@abs_srcdir@/data/agg.csv",size=46
 

is extremely bogus.  COSTS OFF ought to suppress highly-volatile details
like the file size.  To stick with this design, we'd have to have a
convention that explainInfo never shows any more data than would be
appropriate in COSTS OFF mode.

And then there's

3. There is no way to report actual inside-the-FDW execution stats in
EXPLAIN ANALYZE mode.

So this seems very far short of satisfactory.  I think we should drop
FdwPlan.explainInfo and instead define an additional callback function
that is called by EXPLAIN to produce the extra data for EXPLAIN to
display.  This function could have access to the EXPLAIN options as well
as (in ANALYZE mode) the final state of the execution node, so it could
tailor its output appropriately.


BTW, another thing that strikes me as poorly done in the file_fdw code
is that it gathers up all the options of the foreign table, server, and
wrapper at plan time, and stores those in the plan, and uses that
information at runtime.  What happens if the options change underneath
a prepared plan?
        regards, tom lane


Re: FDW API: don't like the EXPLAIN mechanism

From
Tom Lane
Date:
I wrote:
> ... I think we should drop
> FdwPlan.explainInfo and instead define an additional callback function
> that is called by EXPLAIN to produce the extra data for EXPLAIN to
> display.  This function could have access to the EXPLAIN options as well
> as (in ANALYZE mode) the final state of the execution node, so it could
> tailor its output appropriately.

My initial idea was to define the new callback function with signature

char *
ExplainForeignScan (ForeignScanState *node,                   ExplainState *es);

and stick to pretty much the functionality implemented in the submitted
patch, ie, you could expect to get output looking like this:
Foreign Scan on public.agg_csv  FDW Info: file="@abs_srcdir@/data/agg.csv", size=46

However, it occurs to me that as long as we're passing the function the
ExplainState, it has what it needs to add arbitrary EXPLAIN result
fields.  Although it could do this the hard way, we could make it a lot
easier by exporting the ExplainProperty<Type> functions.  Then it'd be
possible to produce output like
Foreign Scan on public.agg_csv  Foreign File: @abs_srcdir@/data/agg.csv  Foreign File Size: 46

which seems a whole lot nicer than the current design.  In this scheme
the callback function would just be declared to return void.

Anybody have an objection to doing it like that?
        regards, tom lane


Re: FDW API: don't like the EXPLAIN mechanism

From
Robert Haas
Date:
On Sat, Feb 19, 2011 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anybody have an objection to doing it like that?

I don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: FDW API: don't like the EXPLAIN mechanism

From
Andrew Dunstan
Date:

On 02/19/2011 11:07 PM, Tom Lane wrote:
>
> However, it occurs to me that as long as we're passing the function the
> ExplainState, it has what it needs to add arbitrary EXPLAIN result
> fields.  Although it could do this the hard way, we could make it a lot
> easier by exporting the ExplainProperty<Type>  functions.  Then it'd be
> possible to produce output like
>
>     Foreign Scan on public.agg_csv
>       Foreign File: @abs_srcdir@/data/agg.csv
>       Foreign File Size: 46
>
> which seems a whole lot nicer than the current design.  In this scheme
> the callback function would just be declared to return void.
>
> Anybody have an objection to doing it like that?
>
>             


If we allow the invention of new explain states we'll never be able to 
publish an authoritative schema definition of the data. That's not 
necessarily an argument against doing it, just something to be aware of. 
Maybe we don't care about having EXPLAIN XML output validated.

cheers

andrew


Re: FDW API: don't like the EXPLAIN mechanism

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 02/19/2011 11:07 PM, Tom Lane wrote:
>> However, it occurs to me that as long as we're passing the function the
>> ExplainState, it has what it needs to add arbitrary EXPLAIN result
>> fields.

> If we allow the invention of new explain states we'll never be able to 
> publish an authoritative schema definition of the data. That's not 
> necessarily an argument against doing it, just something to be aware of. 
> Maybe we don't care about having EXPLAIN XML output validated.

I thought one of the principal arguments for outputting XML/etc formats
was exactly that we'd be able to add fields without breaking readers.
If that's not the case, why did we bother?
        regards, tom lane


Re: FDW API: don't like the EXPLAIN mechanism

From
Andrew Dunstan
Date:

On 02/21/2011 11:23 AM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> On 02/19/2011 11:07 PM, Tom Lane wrote:
>>> However, it occurs to me that as long as we're passing the function the
>>> ExplainState, it has what it needs to add arbitrary EXPLAIN result
>>> fields.
>> If we allow the invention of new explain states we'll never be able to
>> publish an authoritative schema definition of the data. That's not
>> necessarily an argument against doing it, just something to be aware of.
>> Maybe we don't care about having EXPLAIN XML output validated.
> I thought one of the principal arguments for outputting XML/etc formats
> was exactly that we'd be able to add fields without breaking readers.
> If that's not the case, why did we bother?
>
>             


Well, I thought the motivation was to allow easy construction of parsers 
for the data, since creating a parser for those formats is pretty trivial.

Anyway, if we don't care about validation that's fine. I just didn't 
want us to make that decision unconsciously.

cheers

andrew


Re: FDW API: don't like the EXPLAIN mechanism

From
Alvaro Herrera
Date:
Excerpts from Andrew Dunstan's message of lun feb 21 13:11:25 -0300 2011:

> If we allow the invention of new explain states we'll never be able to 
> publish an authoritative schema definition of the data. That's not 
> necessarily an argument against doing it, just something to be aware of. 
> Maybe we don't care about having EXPLAIN XML output validated.

The alternative seems to be to let this information proliferate in free
form (say as text inside some other node), which may be even less convenient.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: FDW API: don't like the EXPLAIN mechanism

From
Mark Mielke
Date:
On 02/21/2011 11:38 AM, Andrew Dunstan wrote:
>
> On 02/21/2011 11:23 AM, Tom Lane wrote:
>> Andrew Dunstan<andrew@dunslane.net>  writes:
>>> If we allow the invention of new explain states we'll never be able to
>>> publish an authoritative schema definition of the data. That's not
>>> necessarily an argument against doing it, just something to be aware 
>>> of.
>>> Maybe we don't care about having EXPLAIN XML output validated.
>> I thought one of the principal arguments for outputting XML/etc formats
>> was exactly that we'd be able to add fields without breaking readers.
>> If that's not the case, why did we bother?
>>
>
> Well, I thought the motivation was to allow easy construction of 
> parsers for the data, since creating a parser for those formats is 
> pretty trivial.
> Anyway, if we don't care about validation that's fine. I just didn't 
> want us to make that decision unconsciously.

Parsing XML isn't trivial, not if done correctly... :-)

I don't see the benefit of validation beyond test suites, and then the 
specification can be published with the version of PostgreSQL (as XSD?) 
if so necessary.

Primary benefits include:

1) Open and widely recognized format.
2) Well tested and readily available parsers already exist.
3) Able to easily add content without breaking existing parsers or 
analyzers, provided the parsers and analyzers are written properly.

Any XML parser that does:  m[<tag>(.*?)</tag>]  ... is not written properly.

-- 
Mark Mielke<mark@mielke.cc>



Re: FDW API: don't like the EXPLAIN mechanism

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Andrew Dunstan's message of lun feb 21 13:11:25 -0300 2011:
>> If we allow the invention of new explain states we'll never be able to 
>> publish an authoritative schema definition of the data. That's not 
>> necessarily an argument against doing it, just something to be aware of. 
>> Maybe we don't care about having EXPLAIN XML output validated.

> The alternative seems to be to let this information proliferate in free
> form (say as text inside some other node), which may be even less convenient.

Yeah; that's exactly what the patch-as-submitted did, and that way was
surely not better from the standpoint of machine parsing the output.
If you really want to lock down the output format to only what is
specified by the core code, then the answer is to forget about giving
FDWs an explain hook at all.
        regards, tom lane


Re: FDW API: don't like the EXPLAIN mechanism

From
Andrew Dunstan
Date:

On 02/21/2011 11:45 AM, Mark Mielke wrote:
> On 02/21/2011 11:38 AM, Andrew Dunstan wrote:
>>
>> On 02/21/2011 11:23 AM, Tom Lane wrote:
>>> Andrew Dunstan<andrew@dunslane.net>  writes:
>>>> If we allow the invention of new explain states we'll never be able to
>>>> publish an authoritative schema definition of the data. That's not
>>>> necessarily an argument against doing it, just something to be 
>>>> aware of.
>>>> Maybe we don't care about having EXPLAIN XML output validated.
>>> I thought one of the principal arguments for outputting XML/etc formats
>>> was exactly that we'd be able to add fields without breaking readers.
>>> If that's not the case, why did we bother?
>>>
>>
>> Well, I thought the motivation was to allow easy construction of 
>> parsers for the data, since creating a parser for those formats is 
>> pretty trivial.
>> Anyway, if we don't care about validation that's fine. I just didn't 
>> want us to make that decision unconsciously.
>
> Parsing XML isn't trivial, not if done correctly... :-)
>
> I don't see the benefit of validation beyond test suites, and then the 
> specification can be published with the version of PostgreSQL (as 
> XSD?) if so necessary.
>
> Primary benefits include:
>
> 1) Open and widely recognized format.
> 2) Well tested and readily available parsers already exist.
> 3) Able to easily add content without breaking existing parsers or 
> analyzers, provided the parsers and analyzers are written properly.
>
> Any XML parser that does:  m[<tag>(.*?)</tag>]  ... is not written 
> properly.


Parsing all these formats is trivial enough if you use a library to do 
the heavy lifting for you. I do *lots* of XML/XSL work and I don't use 
stuff like "m[<tag>(.*?)</tag>]".  Here's a fragment from a working 
program written a week or two back in perl:
   my $parser= XML::DOM::Parser->new();   my $xp = $parser->parsefile($xmlfile);   my ($provider) =
$xp->findvalue("//SERVICE_PROVIDER_CODE");  my ($invoice_num) = $xp->findvalue("//invoice_num");
 

Not that hard, is it? No regex matching there. :-)

Regarding your other suggestion, the whole point I have been making is 
that if external modules can invent arbitrary nodes then we can't 
publish an XSD (or RelaxNG or DTD) spec that is worth anything at all.

Anyway, it seems like that's what people want.


cheers

andrew


Re: FDW API: don't like the EXPLAIN mechanism

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Regarding your other suggestion, the whole point I have been making is 
> that if external modules can invent arbitrary nodes then we can't 
> publish an XSD (or RelaxNG or DTD) spec that is worth anything at all.

Well, sure we can.  But if you're using any external FDW, you'll need to
consult its documentation to see what additions it makes.

It may be sufficient to say something like "ForeignScan can have
unspecified additional children".  Dunno if we can formalize that
in any useful way ...
        regards, tom lane


Re: FDW API: don't like the EXPLAIN mechanism

From
Andrew Dunstan
Date:

On 02/21/2011 12:28 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> Regarding your other suggestion, the whole point I have been making is
>> that if external modules can invent arbitrary nodes then we can't
>> publish an XSD (or RelaxNG or DTD) spec that is worth anything at all.
> Well, sure we can.  But if you're using any external FDW, you'll need to
> consult its documentation to see what additions it makes.
>
> It may be sufficient to say something like "ForeignScan can have
> unspecified additional children".  Dunno if we can formalize that
> in any useful way ...
>
>             


Well, you can override definitions, so the FDW could provide a spec that 
imported the base spec and then overrode the relevant parts to plug in 
its extra nodes. But that would get pretty hairy with more than one FDW.

Still, we've got by so far with no spec at all so maybe it really 
doesn't matter.

cheers

andrew




Re: FDW API: don't like the EXPLAIN mechanism

From
"David E. Wheeler"
Date:
On Feb 21, 2011, at 9:12 AM, Andrew Dunstan wrote:

>   my $parser= XML::DOM::Parser->new();
>   my $xp = $parser->parsefile($xmlfile);
>   my ($provider) = $xp->findvalue("//SERVICE_PROVIDER_CODE");
>   my ($invoice_num) = $xp->findvalue("//invoice_num");
> 
> Not that hard, is it? No regex matching there. :-)

See also
 https://github.com/theory/explain-table

Best,

David



Re: FDW API: don't like the EXPLAIN mechanism

From
David E. Wheeler
Date:
On Feb 21, 2011, at 10:07 AM, David E. Wheeler wrote:

> See also
> 
>  https://github.com/theory/explain-table

Oops, sorry, make that
 https://github.com/pgexperts/explain-table

Best,

David




Re: FDW API: don't like the EXPLAIN mechanism

From
David E. Wheeler
Date:
On Feb 21, 2011, at 10:11 AM, David E. Wheeler wrote:

> Oops, sorry, make that
> 
>  https://github.com/pgexperts/explain-table

And now I've renamed it (sorry) and released it on PGXN. New links:
 https://github.com/pgexperts/explanation http://master.pgxn.org/dist/explanation/

Best,

David