Thread: Remote connections?

Remote connections?

From
mlw
Date:
I just found out something about Oracle which that looks like something
that could be doable in PostgreSQL. 

What do you all think:

Oracle's version is something like this:

create [public] database link using [...]

select * from sometable@remotelink


I was thinking how this could be done with postgreSQL. How hard would it
be to make something that is similar to a view, but executes a query
remotely? I envision something like this:

create [public] link name query using [...]

The table link will be similar to a view. It could be used like this:

CREATE LINK test as select * from test WITH 'user=postgres host=remote
db=data';

SELECT * from test;

or 

SELECT * from fubar join test on (fubar.id = test.id) ;

So, what do you think? Impossible, possible, too hard? too easy?


Re: Remote connections?

From
"Ross J. Reedstrom"
Date:
On Thu, Dec 06, 2001 at 01:28:04PM -0500, mlw wrote:
> I just found out something about Oracle which that looks like something
> that could be doable in PostgreSQL. 
> 
> What do you all think:
> 
> Oracle's version is something like this:
> 
> create [public] database link using [...]
> 
> select * from sometable@remotelink
> 
> 
> I was thinking how this could be done with postgreSQL. How hard would it
> be to make something that is similar to a view, but executes a query
> remotely? I envision something like this:
> 
> create [public] link name query using [...]
> 
> The table link will be similar to a view. It could be used like this:
> 
> CREATE LINK test as select * from test WITH 'user=postgres host=remote
> db=data';
> 
> SELECT * from test;
> 
> or 
> 
> SELECT * from fubar join test on (fubar.id = test.id) ;
> 
> So, what do you think? Impossible, possible, too hard? too easy?

Here we come, full circle. This is just about where I came on board.
Many moons ago, I started looking at Mariposa, in the hopes of forward
patching it into PostgreSQL, and generalizing the 'remote' part to allow
exactly the sort of access you described above.

The biggest problem with this is transactional semantics: you need
two-stage commits to get this right, and we don't hav'em. (Has there
been an indepth discussion concerning what how hard it would be to do
that with postgresql?) 

The _actual_ biggest problem was my lack of knowledge of the PostgreSQL
codebase ;-)

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Executive Director                                  phone: 713-348-6166
Gulf Coast Consortium for Bioinformatics              fax: 713-348-6182
Rice University MS-39
Houston, TX 77005


Re: Remote connections?

From
mlw
Date:
"Ross J. Reedstrom" wrote:
> 
> On Thu, Dec 06, 2001 at 01:28:04PM -0500, mlw wrote:
> > I just found out something about Oracle which that looks like something
> > that could be doable in PostgreSQL.
> >
> > What do you all think:
> >
> > Oracle's version is something like this:
> >
> > create [public] database link using [...]
> >
> > select * from sometable@remotelink
> >
> >
> > I was thinking how this could be done with postgreSQL. How hard would it
> > be to make something that is similar to a view, but executes a query
> > remotely? I envision something like this:
> >
> > create [public] link name query using [...]
> >
> > The table link will be similar to a view. It could be used like this:
> >
> > CREATE LINK test as select * from test WITH 'user=postgres host=remote
> > db=data';
> >
> > SELECT * from test;
> >
> > or
> >
> > SELECT * from fubar join test on (fubar.id = test.id) ;
> >
> > So, what do you think? Impossible, possible, too hard? too easy?
> 
> Here we come, full circle. This is just about where I came on board.
> Many moons ago, I started looking at Mariposa, in the hopes of forward
> patching it into PostgreSQL, and generalizing the 'remote' part to allow
> exactly the sort of access you described above.
> 
> The biggest problem with this is transactional semantics: you need
> two-stage commits to get this right, and we don't hav'em. (Has there
> been an indepth discussion concerning what how hard it would be to do
> that with postgresql?)
> 
> The _actual_ biggest problem was my lack of knowledge of the PostgreSQL
> codebase ;-)

I think we can we can dispense worrying about two stage commits, if we
assume that remote connections are treated as views with no rules. As
long as remote tables are "read only" then the implementation is much
easier.

I too find the internals of PostgreSQL virtually incomprehensible at the
internal level. If there were a document somewhere which published how a
function could return multiple tuples, remote views would be a trivial
undertaking. It could look like:

select * from remote('select *from table', 'user=postgres host=outland
db=remote');


Re: Remote connections?

From
Joe Conway
Date:
mlw wrote:

> I too find the internals of PostgreSQL virtually incomprehensible at the
> internal level. If there were a document somewhere which published how a
> function could return multiple tuples, remote views would be a trivial
> undertaking. It could look like:
> 
> select * from remote('select *from table', 'user=postgres host=outland
> db=remote');
> 

See contrib/dblink in the 7.2 beta. It was my attempt inspired by 
Oracle's dblink and some code that you (mlw) posted a while back. Based 
on the limitations wrt returning muliple tuples, I wound up with 
something like:

lt_lcat=# select dblink_tok(t1.dblink_p,0) as f1 from (select 
dblink('hostaddr=127.0.0.1 dbname=template1 user=postgres 
password=postgres','select proname from pg_proc') as dblink_p) as t1;

Which, as has been pointed out more than once, is pretty ugly, but at 
least it's a start ;-)


Joe



Re: Remote connections?

From
Alex Pilosov
Date:
On Thu, 6 Dec 2001, mlw wrote:

> I too find the internals of PostgreSQL virtually incomprehensible at the
> internal level. If there were a document somewhere which published how a
> function could return multiple tuples, remote views would be a trivial
> undertaking. It could look like:
> 
> select * from remote('select *from table', 'user=postgres host=outland
> db=remote');
This isn't possible yet. I was working on implementation of this, about
80% done, but never finished. Now I'm out of time to work more on this for
a while. :(

Let me know if you want my code.

-alex



Re: Remote connections?

From
mlw
Date:
Hey this looks really cool. It looks like something I was thinking about doing.
I have a couple suggestions that could make it a little better, I hope you will
not be offended. (If you want my help, I'll chip in!)

Why not use a binary cursor? That way native types can slip through without the
overhead of conversion.

Right now you get all rows up front, you may be able to increase overall
performance by fetching only a few rows at a time, rather than get everything
all at once. (Think on the order of 4 million rows from your remote query!)
Execute the commit at the end of processing. There are even some asynchronous
functions you may be able to utilize to reduce the I/O bottleneck. Use the
synchronous function first, then before you return initiate an asynchronous
read. Every successive pass through the function, read the newly arrived tuple,
and initiate the next asynchronous read. (The two machine could be processing
the query simultaneously, and this could even IMPROVE performance over a single
system for heavy duty queries.)

Setup a hash table for field names, rather than requiring field numbers. (Keep
field number for efficiency, of course.)

You could eliminate having to pass the result pointer around by keeping a
static array in your extension. Use something like Oracle's "contains" notation
of result number. Where each instantiation of "contains()" and "score()"
require an id. i.e. 1,2,3,40 etc. Then hash those numbers into an array. I have
some code that does this for a PostgreSQL extension (it implements contains) on
my website (pgcontains, under download). It is ugly but works for the most
part.

Seriously, your stuff looks great. I think it could be the beginning of a
fairly usable system for my company. Any help you need/want, just let me know.


Joe Conway wrote:
> 
> mlw wrote:
> 
> > I too find the internals of PostgreSQL virtually incomprehensible at the
> > internal level. If there were a document somewhere which published how a
> > function could return multiple tuples, remote views would be a trivial
> > undertaking. It could look like:
> >
> > select * from remote('select *from table', 'user=postgres host=outland
> > db=remote');
> >
> 
> See contrib/dblink in the 7.2 beta. It was my attempt inspired by
> Oracle's dblink and some code that you (mlw) posted a while back. Based
> on the limitations wrt returning muliple tuples, I wound up with
> something like:
> 
> lt_lcat=# select dblink_tok(t1.dblink_p,0) as f1 from (select
> dblink('hostaddr=127.0.0.1 dbname=template1 user=postgres
> password=postgres','select proname from pg_proc') as dblink_p) as t1;
> 
> Which, as has been pointed out more than once, is pretty ugly, but at
> least it's a start ;-)
> 
> Joe
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


Re: Remote connections?

From
Oleg Bartunov
Date:
On Fri, 7 Dec 2001, mlw wrote:

>
> You could eliminate having to pass the result pointer around by keeping a
> static array in your extension. Use something like Oracle's "contains" notation
> of result number. Where each instantiation of "contains()" and "score()"
> require an id. i.e. 1,2,3,40 etc. Then hash those numbers into an array. I have
> some code that does this for a PostgreSQL extension (it implements contains) on
> my website (pgcontains, under download). It is ugly but works for the most
> part.

contrib/intarray does this job very well

>
> Seriously, your stuff looks great. I think it could be the beginning of a
> fairly usable system for my company. Any help you need/want, just let me know.
>
>
> Joe Conway wrote:
> >
> > mlw wrote:
> >
> > > I too find the internals of PostgreSQL virtually incomprehensible at the
> > > internal level. If there were a document somewhere which published how a
> > > function could return multiple tuples, remote views would be a trivial
> > > undertaking. It could look like:
> > >
> > > select * from remote('select *from table', 'user=postgres host=outland
> > > db=remote');
> > >
> >
> > See contrib/dblink in the 7.2 beta. It was my attempt inspired by
> > Oracle's dblink and some code that you (mlw) posted a while back. Based
> > on the limitations wrt returning muliple tuples, I wound up with
> > something like:
> >
> > lt_lcat=# select dblink_tok(t1.dblink_p,0) as f1 from (select
> > dblink('hostaddr=127.0.0.1 dbname=template1 user=postgres
> > password=postgres','select proname from pg_proc') as dblink_p) as t1;
> >
> > Which, as has been pointed out more than once, is pretty ugly, but at
> > least it's a start ;-)
> >
> > Joe
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Remote connections?

From
Joe Conway
Date:
mlw wrote:

> Hey this looks really cool. It looks like something I was thinking about doing.
> I have a couple suggestions that could make it a little better, I hope you will
> not be offended. (If you want my help, I'll chip in!)
> 


Thanks! Suggestions welcomed.

> Why not use a binary cursor? That way native types can slip through without the
> overhead of conversion.
>


I wasn't sure that would work. Would you create dblink_tok as returning 
opaque then?

> Right now you get all rows up front, you may be able to increase overall
> performance by fetching only a few rows at a time, rather than get everything
> all at once. (Think on the order of 4 million rows from your remote query!)
> Execute the commit at the end of processing. There are even some asynchronous
> functions you may be able to utilize to reduce the I/O bottleneck. Use the
> synchronous function first, then before you return initiate an asynchronous
> read. Every successive pass through the function, read the newly arrived tuple,
> and initiate the next asynchronous read. (The two machine could be processing
> the query simultaneously, and this could even IMPROVE performance over a single
> system for heavy duty queries.)


Interesting . . . but aren't there some issues with the asynch functions?

> 
> Setup a hash table for field names, rather than requiring field numbers. (Keep
> field number for efficiency, of course.)
> 
> You could eliminate having to pass the result pointer around by keeping a
> static array in your extension. Use something like Oracle's "contains" notation
> of result number. Where each instantiation of "contains()" and "score()"
> require an id. i.e. 1,2,3,40 etc. Then hash those numbers into an array. I have
> some code that does this for a PostgreSQL extension (it implements contains) on
> my website (pgcontains, under download). It is ugly but works for the most
> part.
> 


I thought about the static array, but I'm not familiar with Oracle 
contains() and score() -- I'm only fluent enough with Oracle to be 
dangerous. Guess I'll have to dig out the books . . .


> Seriously, your stuff looks great. I think it could be the beginning of a
> fairly usable system for my company. Any help you need/want, just let me know.
> 

I am planning to improve dblink during the next release cycle, so I'll 
keep all this in mind (and might take you up on the help offer too!). I 
was hoping we'd have functions returning tuples by now, which would 
improve this extension dramatically. Unfortunately, it sounds like Alex 
won't have time to finish that even for 7.3 :(

Alex, can we get a look at your latest code? Is it any different the 
your last submission to PATCHES?

Joe



Re: Remote connections?

From
mlw
Date:
The dblink code is a very cool idea.

It got me thinking, what if, just thinking out load here, it was redesigned as
something a little more grandeous.

Imagine this:


select dblink('select * from table', 'table_name', 'db=oracle.test user=chris
passwd=knight', 1) as t1, dblink('table2_name', 1) as t2


Just something to think about. 

The first instance of dblink would take 4 parameters: query, table which it
returns, connect string, and link token.

The second instance of dblink would just take the name of the table which it
returns and a link token.

The cool bit is the notion that the query string could specify different
databases or even .DBF libraries. 

Just something to think about.

It would REALLY be great if functions could return multiple tuples!