Thread: Remote connections?
I just found out something about Oracle that looks like it could be doable in PostgreSQL. What do you all think?

Oracle's version is something like this:

    create [public] database link ... using [...]

    select * from sometable@remotelink

I was thinking about how this could be done with PostgreSQL. How hard would it be to make something that is similar to a view, but executes its query remotely? I envision something like this:

    create [public] link name query using [...]

The link would behave much like a view. It could be used like this:

    CREATE LINK test as select * from test WITH 'user=postgres host=remote db=data';

    SELECT * from test;

or

    SELECT * from fubar join test on (fubar.id = test.id);

So, what do you think? Impossible, possible, too hard? too easy?
On Thu, Dec 06, 2001 at 01:28:04PM -0500, mlw wrote:
> I just found out something about Oracle that looks like it could be
> doable in PostgreSQL.
> [...]
> So, what do you think? Impossible, possible, too hard? too easy?

Here we come, full circle. This is just about where I came on board. Many moons ago, I started looking at Mariposa, in the hopes of forward patching it into PostgreSQL, and generalizing the 'remote' part to allow exactly the sort of access you described above.

The biggest problem with this is transactional semantics: you need two-stage commits to get this right, and we don't hav'em. (Has there been an in-depth discussion of how hard it would be to do that with PostgreSQL?)

The _actual_ biggest problem was my lack of knowledge of the PostgreSQL codebase ;-)

Ross
--
Ross Reedstrom, Ph.D.                        reedstrm@rice.edu
Executive Director                           phone: 713-348-6166
Gulf Coast Consortium for Bioinformatics     fax: 713-348-6182
Rice University MS-39
Houston, TX 77005
"Ross J. Reedstrom" wrote:
> Here we come, full circle. This is just about where I came on board.
> Many moons ago, I started looking at Mariposa, in the hopes of forward
> patching it into PostgreSQL, and generalizing the 'remote' part to
> allow exactly the sort of access you described above.
>
> The biggest problem with this is transactional semantics: you need
> two-stage commits to get this right, and we don't hav'em. (Has there
> been an in-depth discussion of how hard it would be to do that with
> PostgreSQL?)
>
> The _actual_ biggest problem was my lack of knowledge of the PostgreSQL
> codebase ;-)

I think we can dispense with worrying about two-stage commits if we assume that remote connections are treated as views with no rules. As long as remote tables are "read only", the implementation is much easier.

I too find the internals of PostgreSQL virtually incomprehensible. If there were a document somewhere that explained how a function could return multiple tuples, remote views would be a trivial undertaking. It could look like:

    select * from remote('select * from table', 'user=postgres host=outland db=remote');
mlw wrote:
> I too find the internals of PostgreSQL virtually incomprehensible. If
> there were a document somewhere that explained how a function could
> return multiple tuples, remote views would be a trivial undertaking.
> It could look like:
>
>     select * from remote('select * from table', 'user=postgres host=outland db=remote');

See contrib/dblink in the 7.2 beta. It was my attempt, inspired by Oracle's dblink and some code that you (mlw) posted a while back. Given the limitations wrt returning multiple tuples, I wound up with something like:

    lt_lcat=# select dblink_tok(t1.dblink_p,0) as f1
              from (select dblink('hostaddr=127.0.0.1 dbname=template1 user=postgres password=postgres',
                                  'select proname from pg_proc') as dblink_p) as t1;

Which, as has been pointed out more than once, is pretty ugly, but at least it's a start ;-)

Joe
On Thu, 6 Dec 2001, mlw wrote:
> If there were a document somewhere that explained how a function could
> return multiple tuples, remote views would be a trivial undertaking.

This isn't possible yet. I was working on an implementation of this, and got about 80% done, but never finished. Now I'm out of time to work more on this for a while. :( Let me know if you want my code.

-alex
Hey, this looks really cool. It looks like something I was thinking about doing. I have a couple of suggestions that could make it a little better; I hope you will not be offended. (If you want my help, I'll chip in!)

Why not use a binary cursor? That way native types can slip through without the overhead of conversion.

Right now you get all rows up front. You may be able to increase overall performance by fetching only a few rows at a time, rather than getting everything all at once. (Think on the order of 4 million rows from your remote query!) Execute the commit at the end of processing. There are even some asynchronous functions you may be able to utilize to reduce the I/O bottleneck. Use the synchronous function first; then, before you return, initiate an asynchronous read. On each successive pass through the function, read the newly arrived tuple and initiate the next asynchronous read. (The two machines could be processing the query simultaneously, and this could even IMPROVE performance over a single system for heavy-duty queries.)

Set up a hash table for field names, rather than requiring field numbers. (Keep field numbers for efficiency, of course.)

You could eliminate having to pass the result pointer around by keeping a static array in your extension. Use something like Oracle's "contains" notation, where each instantiation of "contains()" and "score()" requires an id, i.e. 1, 2, 3, 40, etc. Then hash those numbers into an array. I have some code that does this for a PostgreSQL extension (it implements contains) on my website (pgcontains, under download). It is ugly but works for the most part.

Seriously, your stuff looks great. I think it could be the beginning of a fairly usable system for my company. Any help you need/want, just let me know.

Joe Conway wrote:
> See contrib/dblink in the 7.2 beta. It was my attempt, inspired by
> Oracle's dblink and some code that you (mlw) posted a while back. Given
> the limitations wrt returning multiple tuples, I wound up with
> something like:
>
>     lt_lcat=# select dblink_tok(t1.dblink_p,0) as f1
>               from (select dblink('hostaddr=127.0.0.1 dbname=template1 user=postgres password=postgres',
>                                   'select proname from pg_proc') as dblink_p) as t1;
>
> Which, as has been pointed out more than once, is pretty ugly, but at
> least it's a start ;-)
>
> Joe
On Fri, 7 Dec 2001, mlw wrote:
> You could eliminate having to pass the result pointer around by keeping
> a static array in your extension. Use something like Oracle's
> "contains" notation, where each instantiation of "contains()" and
> "score()" requires an id, i.e. 1, 2, 3, 40, etc. Then hash those
> numbers into an array. I have some code that does this for a PostgreSQL
> extension (it implements contains) on my website (pgcontains, under
> download). It is ugly but works for the most part.

contrib/intarray does this job very well.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
mlw wrote:
> Hey, this looks really cool. It looks like something I was thinking
> about doing. I have a couple of suggestions that could make it a little
> better; I hope you will not be offended. (If you want my help, I'll
> chip in!)

Thanks! Suggestions welcomed.

> Why not use a binary cursor? That way native types can slip through
> without the overhead of conversion.

I wasn't sure that would work. Would you create dblink_tok as returning opaque then?

> Right now you get all rows up front. You may be able to increase
> overall performance by fetching only a few rows at a time, rather than
> getting everything all at once. [...] There are even some asynchronous
> functions you may be able to utilize to reduce the I/O bottleneck. [...]
> (The two machines could be processing the query simultaneously, and
> this could even IMPROVE performance over a single system for heavy-duty
> queries.)

Interesting... but aren't there some issues with the asynch functions?

> Set up a hash table for field names, rather than requiring field
> numbers. (Keep field numbers for efficiency, of course.)
>
> You could eliminate having to pass the result pointer around by keeping
> a static array in your extension. Use something like Oracle's
> "contains" notation, where each instantiation of "contains()" and
> "score()" requires an id, i.e. 1, 2, 3, 40, etc. [...]

I thought about the static array, but I'm not familiar with Oracle's contains() and score() -- I'm only fluent enough with Oracle to be dangerous. Guess I'll have to dig out the books...

> Seriously, your stuff looks great. I think it could be the beginning of
> a fairly usable system for my company. Any help you need/want, just let
> me know.

I am planning to improve dblink during the next release cycle, so I'll keep all this in mind (and might take you up on the help offer too!). I was hoping we'd have functions returning tuples by now, which would improve this extension dramatically. Unfortunately, it sounds like Alex won't have time to finish that even for 7.3 :(

Alex, can we get a look at your latest code? Is it any different from your last submission to PATCHES?

Joe
The dblink code is a very cool idea. It got me thinking: what if, just thinking out loud here, it were redesigned as something a little more grandiose? Imagine this:

    select dblink('select * from table', 'table_name',
                  'db=oracle.test user=chris passwd=knight', 1) as t1,
           dblink('table2_name', 1) as t2

Just something to think about. The first instance of dblink would take four parameters: the query, the table which it returns, the connect string, and a link token. The second instance of dblink would take just the name of the table which it returns and a link token. The cool bit is the notion that the query string could specify different databases or even .DBF libraries.

Just something to think about. It would REALLY be great if functions could return multiple tuples!