Thread: FDW API: don't like the EXPLAIN mechanism
I've been poking at the FDW stuff and file_fdw, and I find myself dissatisfied with the way that the EXPLAIN support is designed, namely that we have to compute a string at plan time to be displayed by EXPLAIN. There are a couple of problems with that: 1. The explainInfo string is useless overhead during a normal query. 2. There is no way to vary the display depending on EXPLAIN options such as VERBOSE and COSTS OFF. It seems fairly obvious to me that there might be scope for showing more info in VERBOSE mode. But even more to the point, the current regression test's example output: EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv;Foreign Scan on public.agg_csv Output: a, b Foreign Plan: file="@abs_srcdir@/data/agg.csv",size=46 is extremely bogus. COSTS OFF ought to suppress highly-volatile details like the file size. To stick with this design, we'd have to have a convention that explainInfo never shows any more data than would be appropriate in COSTS OFF mode. And then there's 3. There is no way to report actual inside-the-FDW execution stats in EXPLAIN ANALYZE mode. So this seems very far short of satisfactory. I think we should drop FdwPlan.explainInfo and instead define an additional callback function that is called by EXPLAIN to produce the extra data for EXPLAIN to display. This function could have access to the EXPLAIN options as well as (in ANALYZE mode) the final state of the execution node, so it could tailor its output appropriately. BTW, another thing that strikes me as poorly done in the file_fdw code is that it gathers up all the options of the foreign table, server, and wrapper at plan time, and stores those in the plan, and uses that information at runtime. What happens if the options change underneath a prepared plan? regards, tom lane
I wrote: > ... I think we should drop > FdwPlan.explainInfo and instead define an additional callback function > that is called by EXPLAIN to produce the extra data for EXPLAIN to > display. This function could have access to the EXPLAIN options as well > as (in ANALYZE mode) the final state of the execution node, so it could > tailor its output appropriately. My initial idea was to define the new callback function with signature char * ExplainForeignScan (ForeignScanState *node, ExplainState *es); and stick to pretty much the functionality implemented in the submitted patch, ie, you could expect to get output looking like this: Foreign Scan on public.agg_csv FDW Info: file="@abs_srcdir@/data/agg.csv", size=46 However, it occurs to me that as long as we're passing the function the ExplainState, it has what it needs to add arbitrary EXPLAIN result fields. Although it could do this the hard way, we could make it a lot easier by exporting the ExplainProperty<Type> functions. Then it'd be possible to produce output like Foreign Scan on public.agg_csv Foreign File: @abs_srcdir@/data/agg.csv Foreign File Size: 46 which seems a whole lot nicer than the current design. In this scheme the callback function would just be declared to return void. Anybody have an objection to doing it like that? regards, tom lane
On Sat, Feb 19, 2011 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anybody have an objection to doing it like that? I don't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/19/2011 11:07 PM, Tom Lane wrote: > > However, it occurs to me that as long as we're passing the function the > ExplainState, it has what it needs to add arbitrary EXPLAIN result > fields. Although it could do this the hard way, we could make it a lot > easier by exporting the ExplainProperty<Type> functions. Then it'd be > possible to produce output like > > Foreign Scan on public.agg_csv > Foreign File: @abs_srcdir@/data/agg.csv > Foreign File Size: 46 > > which seems a whole lot nicer than the current design. In this scheme > the callback function would just be declared to return void. > > Anybody have an objection to doing it like that? > > If we allow the invention of new explain states we'll never be able to publish an authoritative schema definition of the data. That's not necessarily an argument against doing it, just something to be aware of. Maybe we don't care about having EXPLAIN XML output validated. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 02/19/2011 11:07 PM, Tom Lane wrote: >> However, it occurs to me that as long as we're passing the function the >> ExplainState, it has what it needs to add arbitrary EXPLAIN result >> fields. > If we allow the invention of new explain states we'll never be able to > publish an authoritative schema definition of the data. That's not > necessarily an argument against doing it, just something to be aware of. > Maybe we don't care about having EXPLAIN XML output validated. I thought one of the principal arguments for outputting XML/etc formats was exactly that we'd be able to add fields without breaking readers. If that's not the case, why did we bother? regards, tom lane
On 02/21/2011 11:23 AM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 02/19/2011 11:07 PM, Tom Lane wrote: >>> However, it occurs to me that as long as we're passing the function the >>> ExplainState, it has what it needs to add arbitrary EXPLAIN result >>> fields. >> If we allow the invention of new explain states we'll never be able to >> publish an authoritative schema definition of the data. That's not >> necessarily an argument against doing it, just something to be aware of. >> Maybe we don't care about having EXPLAIN XML output validated. > I thought one of the principal arguments for outputting XML/etc formats > was exactly that we'd be able to add fields without breaking readers. > If that's not the case, why did we bother? > > Well, I thought the motivation was to allow easy construction of parsers for the data, since creating a parser for those formats is pretty trivial. Anyway, if we don't care about validation that's fine. I just didn't want us to make that decision unconsciously. cheers andrew
Excerpts from Andrew Dunstan's message of lun feb 21 13:11:25 -0300 2011: > If we allow the invention of new explain states we'll never be able to > publish an authoritative schema definition of the data. That's not > necessarily an argument against doing it, just something to be aware of. > Maybe we don't care about having EXPLAIN XML output validated. The alternative seems to be to let this information proliferate in free form (say as text inside some other node), which may be even less convenient. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 02/21/2011 11:38 AM, Andrew Dunstan wrote: > > On 02/21/2011 11:23 AM, Tom Lane wrote: >> Andrew Dunstan<andrew@dunslane.net> writes: >>> If we allow the invention of new explain states we'll never be able to >>> publish an authoritative schema definition of the data. That's not >>> necessarily an argument against doing it, just something to be aware >>> of. >>> Maybe we don't care about having EXPLAIN XML output validated. >> I thought one of the principal arguments for outputting XML/etc formats >> was exactly that we'd be able to add fields without breaking readers. >> If that's not the case, why did we bother? >> > > Well, I thought the motivation was to allow easy construction of > parsers for the data, since creating a parser for those formats is > pretty trivial. > Anyway, if we don't care about validation that's fine. I just didn't > want us to make that decision unconsciously. Parsing XML isn't trivial, not if done correctly... :-) I don't see the benefit of validation beyond test suites, and then the specification can be published with the version of PostgreSQL (as XSD?) if so necessary. Primary benefits include: 1) Open and widely recognized format. 2) Well tested and readily available parsers already exist. 3) Able to easily add content without breaking existing parsers or analyzers, provided the parsers and analyzers are written properly. Any XML parser that does: m[<tag>(.*?)</tag>] ... is not written properly. -- Mark Mielke<mark@mielke.cc>
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Andrew Dunstan's message of lun feb 21 13:11:25 -0300 2011: >> If we allow the invention of new explain states we'll never be able to >> publish an authoritative schema definition of the data. That's not >> necessarily an argument against doing it, just something to be aware of. >> Maybe we don't care about having EXPLAIN XML output validated. > The alternative seems to be to let this information proliferate in free > form (say as text inside some other node), which may be even less convenient. Yeah; that's exactly what the patch-as-submitted did, and that way was surely not better from the standpoint of machine parsing the output. If you really want to lock down the output format to only what is specified by the core code, then the answer is to forget about giving FDWs an explain hook at all. regards, tom lane
On 02/21/2011 11:45 AM, Mark Mielke wrote: > On 02/21/2011 11:38 AM, Andrew Dunstan wrote: >> >> On 02/21/2011 11:23 AM, Tom Lane wrote: >>> Andrew Dunstan<andrew@dunslane.net> writes: >>>> If we allow the invention of new explain states we'll never be able to >>>> publish an authoritative schema definition of the data. That's not >>>> necessarily an argument against doing it, just something to be >>>> aware of. >>>> Maybe we don't care about having EXPLAIN XML output validated. >>> I thought one of the principal arguments for outputting XML/etc formats >>> was exactly that we'd be able to add fields without breaking readers. >>> If that's not the case, why did we bother? >>> >> >> Well, I thought the motivation was to allow easy construction of >> parsers for the data, since creating a parser for those formats is >> pretty trivial. >> Anyway, if we don't care about validation that's fine. I just didn't >> want us to make that decision unconsciously. > > Parsing XML isn't trivial, not if done correctly... :-) > > I don't see the benefit of validation beyond test suites, and then the > specification can be published with the version of PostgreSQL (as > XSD?) if so necessary. > > Primary benefits include: > > 1) Open and widely recognized format. > 2) Well tested and readily available parsers already exist. > 3) Able to easily add content without breaking existing parsers or > analyzers, provided the parsers and analyzers are written properly. > > Any XML parser that does: m[<tag>(.*?)</tag>] ... is not written > properly. Parsing all these formats is trivial enough if you use a library to do the heavy lifting for you. I do *lots* of XML/XSL work and I don't use stuff like "m[<tag>(.*?)</tag>]". Here's a fragment from a working program written a week or two back in perl: my $parser= XML::DOM::Parser->new(); my $xp = $parser->parsefile($xmlfile); my ($provider) = $xp->findvalue("//SERVICE_PROVIDER_CODE"); my ($invoice_num) = $xp->findvalue("//invoice_num"); Not that hard, is it? No regex matching there. :-) Regarding your other suggestion, the whole point I have been making is that if external modules can invent arbitrary nodes then we can't publish an XSD (or RelaxNG or DTD) spec that is worth anything at all. Anyway, it seems like that's what people want. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Regarding your other suggestion, the whole point I have been making is > that if external modules can invent arbitrary nodes then we can't > publish an XSD (or RelaxNG or DTD) spec that is worth anything at all. Well, sure we can. But if you're using any external FDW, you'll need to consult its documentation to see what additions it makes. It may be sufficient to say something like "ForeignScan can have unspecified additional children". Dunno if we can formalize that in any useful way ... regards, tom lane
On 02/21/2011 12:28 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> Regarding your other suggestion, the whole point I have been making is >> that if external modules can invent arbitrary nodes then we can't >> publish an XSD (or RelaxNG or DTD) spec that is worth anything at all. > Well, sure we can. But if you're using any external FDW, you'll need to > consult its documentation to see what additions it makes. > > It may be sufficient to say something like "ForeignScan can have > unspecified additional children". Dunno if we can formalize that > in any useful way ... > > Well, you can override definitions, so the FDW could provide a spec that imported the base spec and then overrode the relevant parts to plug in its extra nodes. But that would get pretty hairy with more than one FDW. Still, we've got by so far with no spec at all so maybe it really doesn't matter. cheers andrew
On Feb 21, 2011, at 9:12 AM, Andrew Dunstan wrote: > my $parser= XML::DOM::Parser->new(); > my $xp = $parser->parsefile($xmlfile); > my ($provider) = $xp->findvalue("//SERVICE_PROVIDER_CODE"); > my ($invoice_num) = $xp->findvalue("//invoice_num"); > > Not that hard, is it? No regex matching there. :-) See also https://github.com/theory/explain-table Best, David
On Feb 21, 2011, at 10:07 AM, David E. Wheeler wrote: > See also > > https://github.com/theory/explain-table Oops, sorry, make that https://github.com/pgexperts/explain-table Best, David
On Feb 21, 2011, at 10:11 AM, David E. Wheeler wrote: > Oops, sorry, make that > > https://github.com/pgexperts/explain-table And now I've renamed it (sorry) and released it on PGXN. New links: https://github.com/pgexperts/explanation http://master.pgxn.org/dist/explanation/ Best, David