Re: SQL/MED - core functionality - Mailing list pgsql-hackers

From David Fetter
Subject Re: SQL/MED - core functionality
Date
Msg-id 20101222155255.GB21669@fetter.org
Whole thread Raw
In response to Re: SQL/MED - core functionality  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Tue, Dec 21, 2010 at 07:33:04PM +0000, Simon Riggs wrote:
> On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote:
> 
> > Attached are revised version of SQL/MED core functionality patches.
> 
> Looks very interesting new feature, well done.

While, "read SQL:2008" is generally not super useful advice, in the
particular case of 6WD2_09_MED_2007-12, it's actually pretty clear.
Doubtless, as more competition arises, SQL/MED's specification will
get cloudier, but for now, it's a decent place to start.

> 4. In Hot Standby, we are creating many copies of the data tables on
> different servers.  That seems to break the concept that data is in
> only one place, when we assume that a foreign table is on only one
> foreign server.  How will we represent the concept that data is
> potentially available identically from more than one place?  Any
> other comments about how this will work with Hot Standby?

Your premise, that we're creating many copies, refers to a potential
optimization of SQL/MED which IMHO we should not even vaguely consider
until we've given SQL/MED a chance to shake out.  It's a basic
question about cache coherency, and we just don't need to go there on
the first round.  Possibly not even on the second.

> 5. In PL/Proxy, we have the concept that a table is sharded across
> multiple nodes.  Is that possible here?  Again, we seem to have the
> concept that a table is only ever in a single place.

It's not super relevant.

> 6. Can we do CREATE FOREIGN TABLE .... AS SELECT ...
> I guess the answer depends on (1)

The answer to (1) is "reference to data not managed by the instance of
PostgreSQL doing the queries," so it should be possible, depending on
what capabilities the particular foreign data wrapper exposes.

> 7. Why does ANALYZE skip foreign tables? Surely its really important
> we know things about a foreign table, otherwise we are going to
> optimize things very badly.

The only thing known about a table may be the structure of its rows.
Streams would be one example of this.

> 8. Is the WHERE clause passed down into a ForeignScan?

Dunno.  I'd presume that predicate pushing would be a very important
part of how to get this working at a reasonable speed, but as I've
demonstrated with DBI-Link, it's not strictly necessary for base
functionality.

> 9. The docs for CHECK constraints imply that the CHECK is executed
> against any rows returned from FDW. Are we really going to execute that
> as an additional filter on each row retrieved?

Depends.  If there's some reliable way to push the CHECK to the
foreign data source, that is of course preferable.

> 10. Can a foreign table be referenced by a FK?

It's conceivable, at least in some cases.  The penalties could
potentially be quite high.

> 11. Can you create a DO INSTEAD trigger on a FOREIGN TABLE?

Don't see why not.  Again, bearing in mind that not all foreign data
sources are remotely similar in capability.

> 12. I think it would be useful for both review and afterwards to
> write the documentation section now, so we can begin to understand
> this.  Will there be a documentation section on writing a FDW also?
> There are enough open questions here that I think we need docs and a
> review guide, otherwise we'll end up with some weird missing
> feature, which would be a great shame.

Excellent idea.  Next on the agenda: carving out the needed resources
for this project.

> 13. How does this relate to dblink? Is that going to be replaced by
> this feature?

As I understand it, dblink is already using some of the
infrastructure.  For legacy reasons, I suspect dblink will be
maintained, even when we have a full-blown SQL/MED implementation.

> 14. How do we do scrollable cursors with foreign tables? Do we
> materialize them always? Or...

That will depend on the nature of the foreign data source.

> 15. In terms of planning queries, do we have a concept of additional
> cost per row on a foreign server?  How does the planner decide how costly
> retrieving data is from the FDW?

Dunno.  I return to my refrain of, "depends on the foreign data
source."  You could imagine that a future version of PostgreSQL would
have handy interfaces available to MED and others, where a CSV file
would be much less likely to.

> 16. If we cancel a query, is there an API call to send query cancel to
> the FDW and so on to the foreign server? Does that still work if we hot
> other kinds of ERROR, or FATAL?

Depends...

> 17. Can we request different types of transaction isolation on the
> foreign server, or do certain states get passed through from our
> session? e.g. if we are running a serializable transaction, does
> that state get passed through to the FDW, so it knows to request
> that on the foreign server?  That seems essential if we are going to
> make pg_dump work correctly.

Assuming that the foreign server even has a concept of transaction
isolation, which it may well not, we should be able to pass same.

> 18. Does pg_dump dump the data in the FDW or just of the definition of
> the data? Can we have an option for either?

I'm thinking it should not dump foreign data in general, and if and
when we build in that capability, it should come with loud, clear
warnings about the caching issues involved.

> 19. If we PREPARE a statement, are there API calls to pass thru the
> PREPARE to the FDW? Or are calls always dynamic?

Depends...

> 20. If default privileges include INSERT, UPDATE or DELETE, does this
> cause error, or does it silently get ignored for foreign tables? I think
> I would want the latter.

Access control should probably be controlled by PostgreSQL, and as I
understand the API, they will be.

> 21. Can we LOCK a foreign table? I guess so. Presumably no LOCK is
> passed through to the FDW?

Depends...

> 22. Can we build an local index on a foreign table?

Dunno.  In principle, it might be possible, but there are pretty
thorny caching issues that make this an edge case.  I can imagine some
gigantic read-only store where this would make sense, but I'm having
trouble thinking of another case.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: How much do the hint bits help?
Next
From: Alvaro Herrera
Date:
Subject: Re: SQL/MED - core functionality