Thread: Arguments to foreign tables?
Is there any fundamental or philosophical reason why a foreign table can't accept arguments? Should that be a TODO? Right now, to accept arguments to a from-clause item, you have to use an SRF, which is much more limited than a foreign table. If foreign tables could accept arguments, then SRFs could just be a simplified special case of foreign tables. I looked for some previous discussions on this topic and nothing turned up. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Is there any fundamental or philosophical reason why a foreign table > can't accept arguments? That isn't a table; it's some sort of function. Now that we have LATERAL, there is no good reason to contort SQL's syntax and semantics in the direction you suggest. regards, tom lane
I don't understand the question. A table accepting arguments? What do you mean with that? Can you give an code example in an other RDBMS (like oracle)?
Perhaps nothing came up because there are more people not understanding what you want to accomplish?
regards,
Willem
> Subject: [HACKERS] Arguments to foreign tables?
> From: pgsql@j-davis.com
> To: pgsql-hackers@postgresql.org
> Date: Sun, 4 Nov 2012 11:59:20 -0800
>
> Is there any fundamental or philosophical reason why a foreign table
> can't accept arguments? Should that be a TODO?
>
> Right now, to accept arguments to a from-clause item, you have to use an
> SRF, which is much more limited than a foreign table. If foreign tables
> could accept arguments, then SRFs could just be a simplified special
> case of foreign tables.
>
> I looked for some previous discussions on this topic and nothing turned
> up.
>
> Regards,
> Jeff Davis
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> From: pgsql@j-davis.com
> To: pgsql-hackers@postgresql.org
> Date: Sun, 4 Nov 2012 11:59:20 -0800
>
> Is there any fundamental or philosophical reason why a foreign table
> can't accept arguments? Should that be a TODO?
>
> Right now, to accept arguments to a from-clause item, you have to use an
> SRF, which is much more limited than a foreign table. If foreign tables
> could accept arguments, then SRFs could just be a simplified special
> case of foreign tables.
>
> I looked for some previous discussions on this topic and nothing turned
> up.
>
> Regards,
> Jeff Davis
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
On 11/04/12 11:59 AM, Jeff Davis wrote: > Is there any fundamental or philosophical reason why a foreign table > can't accept arguments? Should that be a TODO? what does that even mean? how would 'data' accept 'arguments' ??! -- john r pierce N 37, W 122 santa cruz ca mid-left coast
>
> what does that even mean? how would 'data' accept 'arguments' ??!
>
>
That's Sherlock Holmes! "Don't argue with your data"
> what does that even mean? how would 'data' accept 'arguments' ??!
>
>
That's Sherlock Holmes! "Don't argue with your data"
On Sun, 2012-11-04 at 15:13 -0500, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Is there any fundamental or philosophical reason why a foreign table > > can't accept arguments? > > That isn't a table; it's some sort of function. Now that we have > LATERAL, there is no good reason to contort SQL's syntax and semantics > in the direction you suggest. Maybe I should rephrase this as a problem with SRFs: you don't get to define the init/exec/end executor functions, and you don't get access to the optimizer information. It seems like foreign tables are a better mechanism (except for the simple cases where you don't care about the details), and the only thing an SRF can do that a foreign table can't is accept arguments. So, I thought maybe it would make more sense to combine the mechanisms somehow. Take something as simple as generate_series: right now, it materializes the entire thing if it's in the FROM clause, but it wouldn't need to if it could use the foreign table mechanism. Regards,Jeff Davis
Maybe you could set some options on the foreign table before selecting from it ?
Another way you could achieve the same result would be to give some column a special meaning (like it is done in the twitter_fdw for example).
If you don't mind, do you have a specific use-case for this ?
--
Ronan Dunklau
2012/11/6 Jeff Davis <pgsql@j-davis.com>
On Sun, 2012-11-04 at 15:13 -0500, Tom Lane wrote:Maybe I should rephrase this as a problem with SRFs: you don't get to
> Jeff Davis <pgsql@j-davis.com> writes:
> > Is there any fundamental or philosophical reason why a foreign table
> > can't accept arguments?
>
> That isn't a table; it's some sort of function. Now that we have
> LATERAL, there is no good reason to contort SQL's syntax and semantics
> in the direction you suggest.
define the init/exec/end executor functions, and you don't get access to
the optimizer information.
It seems like foreign tables are a better mechanism (except for the
simple cases where you don't care about the details), and the only thing
an SRF can do that a foreign table can't is accept arguments. So, I
thought maybe it would make more sense to combine the mechanisms
somehow.
Take something as simple as generate_series: right now, it materializes
the entire thing if it's in the FROM clause, but it wouldn't need to if
it could use the foreign table mechanism.
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Jeff Davis <pgsql@j-davis.com> writes: > Take something as simple as generate_series: right now, it materializes > the entire thing if it's in the FROM clause, but it wouldn't need to if > it could use the foreign table mechanism. So, my understanding of your proposal is that a good way to implement streaming SRF would be on top of the internals of Foreign Data Wrappers? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 6 November 2012 06:09, Jeff Davis <pgsql@j-davis.com> wrote: > On Sun, 2012-11-04 at 15:13 -0500, Tom Lane wrote: >> Jeff Davis <pgsql@j-davis.com> writes: >> > Is there any fundamental or philosophical reason why a foreign table >> > can't accept arguments? >> >> That isn't a table; it's some sort of function. Now that we have >> LATERAL, there is no good reason to contort SQL's syntax and semantics >> in the direction you suggest. > > Maybe I should rephrase this as a problem with SRFs: you don't get to > define the init/exec/end executor functions, and you don't get access to > the optimizer information. > > It seems like foreign tables are a better mechanism (except for the > simple cases where you don't care about the details), and the only thing > an SRF can do that a foreign table can't is accept arguments. So, I > thought maybe it would make more sense to combine the mechanisms > somehow. > > Take something as simple as generate_series: right now, it materializes > the entire thing if it's in the FROM clause, but it wouldn't need to if > it could use the foreign table mechanism. This seems a useful thought to me. Tom is complaining about conflating the two features, which does seem unnecessary. But we can still merge code. Providing run-time parameters or table-level options to Foreign Scans seems sensible. I can see why you'd want to pass through parameters like "work_mem" to the foreign scan and yet not have those as permanent table options. (On another thought: why do Foreign Scans get to avoid the materialization we hate with SRFs?) -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
I have been working on external replication on Postgresql 9.2 for a little while (with too many interruptions blocking my progress!) Who knows a good utility to aggressively analyze and recover Postgresql Databases? It seems the standard reply that I see is "Make regular backups", but that guarantees maximum full data loss defined by the backup time interval. Our MariaDB Mysql/ExtraDB/Innodb friends and Aria_check and some other tools to "recover" as much as possible up to the moment of failure. While full replication is the ultimate safeguard, in "split brain" mode, I could see a hardware failure causing loss of data up to the last replication exchange or last backup interval. During a data crash, I want the recovery tool to HELP me get as much data recovered and get back to operations. What I do not want to do is a bunch of manual command line file copy and deletes to "guess" my way back to operational mode (some data loss is inevitable) I could make a daily snapshot of the system catalog to assist the recovery tool in restoring the database. Who has ideas on this?
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Jeff Davis <pgsql@j-davis.com> writes: >> Take something as simple as generate_series: right now, it materializes >> the entire thing if it's in the FROM clause, but it wouldn't need to if >> it could use the foreign table mechanism. > So, my understanding of your proposal is that a good way to implement > streaming SRF would be on top of the internals of Foreign Data Wrappers? I'm inclined to think that this proposal is at best premature. The FDW mechanisms are still very much a work in progress, and I will *not* promise anybody that we aren't going to change those APIs repeatedly before all the dust has settled. As such, I think we'd better confine the audience to those few souls brave enough to try to write actual foreign data wrappers. If we put up signs telling the average Joe to use those APIs for SRFs, we're going to have a lot of pressure to preserve the existing APIs, no matter how broken they turn out to be. I'd also opine that the FDW APIs are pretty darn heavyweight for an SRF. There might be a small number of SRFs for which it's actually worth dealing with the planner in full generality, but surely not very many. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: > (On another thought: why do Foreign Scans get to avoid the > materialization we hate with SRFs?) [ shrug... ] That works both ways. If it would be convenient for an FDW to return a tuplestore, which is hardly unlikely, it's out of luck; it has to do the expansion itself. The fact that nodeFunctionscan forces materialization is not by any means a fundamental property. It's just that nobody bothered to implement two code paths there. regards, tom lane
On Tue, 2012-11-06 at 08:35 +0000, Simon Riggs wrote: > Tom is complaining about conflating the two features, which does seem > unnecessary. But we can still merge code. That's a good way to put it. > Providing run-time parameters or table-level options to Foreign Scans > seems sensible. I can see why you'd want to pass through parameters > like "work_mem" to the foreign scan and yet not have those as > permanent table options. Right, if you want to set the foreign work_mem differently than the local work_mem. > (On another thought: why do Foreign Scans get to avoid the > materialization we hate with SRFs?) I don't have an authoritative answer for this, but I *think* it's because you can specify the begin/iterate/end for the FDW, which allows you to do proper initialization and teardown regardless of whether the function runs to completion. Also, you can specify the rescan, which means that it's up to the FDW author to make sure they produce the same results again (an SRF isn't required to because it can just output the materialized results again). There's probably more work to be done here in FDWs for a few remaining edge cases, like if the results can't be reproduced on the foreign side (e.g. twitter feed or something) and parameters are changing. Regards,Jeff Davis
On Tue, 2012-11-06 at 09:39 +0100, Dimitri Fontaine wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Take something as simple as generate_series: right now, it materializes > > the entire thing if it's in the FROM clause, but it wouldn't need to if > > it could use the foreign table mechanism. > > So, my understanding of your proposal is that a good way to implement > streaming SRF would be on top of the internals of Foreign Data Wrappers? Yes, I was talking about combining the mechanisms and combining the capabilities of each mechanism. That would hopefully be a net decrease in code, and offer the ability of FDWs to be strictly more powerful than SRFs (though perhaps more challenging to develop). Regards,Jeff Davis
On Tue, 2012-11-06 at 09:19 +0100, Ronan Dunklau wrote: > Maybe you could set some options on the foreign table before selecting > from it ? > Another way you could achieve the same result would be to give some > column a special meaning (like it is done in the twitter_fdw for > example). > > > If you don't mind, do you have a specific use-case for this ? > Let's say that your remote data source is a stream of data that is not actually being stored anywhere, e.g. network events. The data you want to retrieve are all events with a timestamp less than X, and you assume that the timestamp is monotonically increasing (so as soon as you get to X, the read is finished). An SRF isn't good enough because it always materializes (and that's the only way it allows you to control initialization and teardown of the stream connection). But you don't want to have to define a new foreign table each time. It would be better if there were a way to pass the argument X to the FDW mechanism. Regards,Jeff Davis
2012/11/6 Jeff Davis <pgsql@j-davis.com>: > On Tue, 2012-11-06 at 09:19 +0100, Ronan Dunklau wrote: >> Maybe you could set some options on the foreign table before selecting >> from it ? >> Another way you could achieve the same result would be to give some >> column a special meaning (like it is done in the twitter_fdw for >> example). >> >> >> If you don't mind, do you have a specific use-case for this ? >> > Let's say that your remote data source is a stream of data that is not > actually being stored anywhere, e.g. network events. The data you want > to retrieve are all events with a timestamp less than X, and you assume > that the timestamp is monotonically increasing (so as soon as you get to > X, the read is finished). > > An SRF isn't good enough because it always materializes (and that's the > only way it allows you to control initialization and teardown of the > stream connection). But you don't want to have to define a new foreign > table each time. It would be better if there were a way to pass the > argument X to the FDW mechanism. An SRF must not materialize always - depends on implementation Regards Pavel > > Regards, > Jeff Davis > > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, 2012-11-06 at 10:55 -0500, Tom Lane wrote: > I'd also opine that the FDW APIs are pretty darn heavyweight for an SRF. > There might be a small number of SRFs for which it's actually worth > dealing with the planner in full generality, but surely not very many. I was not suggesting that we remove the user-facing simplicity of SRFs. Granted, this is not very well thought-through from an API standpoint. But I was imagining adding something to the FDW API that allows for arguments to be passed from the FROM clause down the the FDW code. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2012-11-06 at 10:55 -0500, Tom Lane wrote: >> I'd also opine that the FDW APIs are pretty darn heavyweight for an SRF. >> There might be a small number of SRFs for which it's actually worth >> dealing with the planner in full generality, but surely not very many. > I was not suggesting that we remove the user-facing simplicity of SRFs. > Granted, this is not very well thought-through from an API standpoint. > But I was imagining adding something to the FDW API that allows for > arguments to be passed from the FROM clause down the the FDW code. Well, that's not terribly exciting. An FDW can already look at the RTE for the relation, which would contain whatever information is available about parameters-in-FROM. But the real problem is how do you get into the FDW code path to start with, when the FROM item looks like a function call? regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2012-11-06 at 09:19 +0100, Ronan Dunklau wrote: >> If you don't mind, do you have a specific use-case for this ? > Let's say that your remote data source is a stream of data that is not > actually being stored anywhere, e.g. network events. The data you want > to retrieve are all events with a timestamp less than X, and you assume > that the timestamp is monotonically increasing (so as soon as you get to > X, the read is finished). > An SRF isn't good enough because it always materializes (and that's the > only way it allows you to control initialization and teardown of the > stream connection). But you don't want to have to define a new foreign > table each time. It would be better if there were a way to pass the > argument X to the FDW mechanism. That particular example can be handled perfectly well today, with select * from stream_table where tscol < 'whatever'; The FDW could be coded to throw an error if the query doesn't provide a WHERE clause that constrains the timestamp column suitably. It does mean that you have to expose the "argument" as a result column, but at least for this use-case that hardly seems like a problem. regards, tom lane
On Tue, 2012-11-06 at 12:57 -0500, Tom Lane wrote: > That particular example can be handled perfectly well today, with > > select * from stream_table where tscol < 'whatever'; > > The FDW could be coded to throw an error if the query doesn't provide a > WHERE clause that constrains the timestamp column suitably. It does > mean that you have to expose the "argument" as a result column, but at > least for this use-case that hardly seems like a problem. Oh, interesting idea to require a WHERE clause that way. I guess, in theory, you could have an operator that always returns TRUE, and pass everything down in WHERE clauses that way. I think there is a case for actual arguments though: remote settings for GUCs (as Simon pointed out); maybe the host to get the data from if it's available from several locations; and if we support defining the schema on the fly (like SRFs that return RECORD) then you could pass column lists. Regards,Jeff Davis