Thread: Re: [INTERFACES] remote database queries
[ Redirected away from the entirely inappropriate pgsql-interfaces list ] "Joe Conway" <joe.conway@mail.com> writes: > Inspired by others who have recently gotten PostgreSQL functions to return > sets, I set out to create my own. I have on more than one occasion wished I > could run a query across databases or servers (similar to a dblink in Oracle > or a linked db in MSSQL). Attached is a C file which is my attempt. > select > dblink_tok(t1.f1,'~',0)::int as vl_id > ,dblink_tok(t1.f1,'~',1)::text as vl_guid > ,dblink_tok(t1.f1,'~',2)::text as vl_pri_email > ,dblink_tok(t1.f1,'~',3)::text as vl_acct_pass_phrase > ,dblink_tok(t1.f1,'~',4)::text as vl_email_relname > ,dblink_tok(t1.f1,'~',5)::text as vl_hwsn_relname > ,dblink_tok(t1.f1,'~',6)::timestamp as vl_mod_dt > ,dblink_tok(t1.f1,'~',7)::int as vl_status > from > (select dblink('host=192.168.5.150 port=5432 dbname=vsreg_001 user=postgres > password=postgres','select * from vs_lkup','~') as f1) as t1 > By doing "create view vs_lkup_rm as . . ." with the above query, from a > database on another server, I can then write: > "select * from vs_lkup" and get results just as if I were on 192.168.5.150 > (sort of -- see problem below). > I have one question, and one problem regarding this. > First the question: is there any way to get the dblink function to return > setof composite -- i.e. return tuples instead of scalar values? It could return a tuple, but there are notational problems that would make it difficult to do anything useful with the tuple; in particular, AFAICS you couldn't retrieve more than one column out of it, so there's no point. Until we fix that (maybe for 7.2, maybe not) your existing hack is probably pretty reasonable. You could save some cycles by avoiding conversion to text, though --- instead return an opaque datum that is pointer-to-tuple-slot and let the dblink_tok function extract fields from the tuple. Look at SQL function support and the FieldSelect expression node type for inspiration. > Now the problem: as I stated above, "select * from vs_lkup" returns results > just as if I were on 192.168.5.150 -- but if I try "select * from vs_lkup > WHERE vl_id = 1" or "select * from vs_lkup WHERE vl_pri_email in > ('email1@foo.com')" I get the following error message: "ERROR: Set-valued > function called in context that cannot accept a set". Any ideas how to work > around this? I think this would work if the planner weren't so enthusiastic about trying to collapse the sub-SELECT query together with the main query. Unfortunately it doesn't check to see if any set-valued functions are involved before it collapses 'em --- leaving you with a set-valued function call in the WHERE clause. Not sure if this is worth fixing, considering that using set-valued functions in this way is just a band-aid that doesn't have a long life expectancy. If you just need a hack with a short life expectancy, here's a hack that I recommend not reading right before dinner ... might make you lose your appetite ;-). Build the view as a dummy UNION:create view vs_lkup_rm as select ... from (select dblink(...)) unionall select null, null, ... where false; Done this way, the UNION won't change the view's results --- but it will prevent the (current version of the) planner from collapsing the view together with the surrounding query. For example: regression=# create function dblink() returns setof int as regression-# 'select f1 from int4_tbl' language 'sql'; CREATE regression=# select dblink(); ?column? ------------- 0 123456 -123456 2147483647-2147483647 (5 rows) regression=# create view vv1 as regression-# select f1 from (select dblink() as f1) t1; CREATE regression=# select * from vv1 where f1 > 0; ERROR: Set-valued function called in context that cannot accept a set regression=# create view vv2 as regression-# select f1 from (select dblink() as f1) t1 regression-# union all regression-# select null where false; CREATE regression=# select * from vv2; f1 ------------- 0 123456 -123456 2147483647-2147483647 (5 rows) regression=# select * from vv2 where f1 > 0; f1 ------------ 1234562147483647 (2 rows) regards, tom lane
> [ Redirected away from the entirely inappropriate pgsql-interfaces list ] oops . . . sorry! > Until we fix that (maybe for 7.2, maybe not) your existing hack is > probably pretty reasonable. You could save some cycles by avoiding > conversion to text, though --- instead return an opaque datum that is > pointer-to-tuple-slot and let the dblink_tok function extract fields > from the tuple. Look at SQL function support and the FieldSelect > expression node type for inspiration. Thanks -- I'll take a look. > I think this would work if the planner weren't so enthusiastic about > trying to collapse the sub-SELECT query together with the main query. > Unfortunately it doesn't check to see if any set-valued functions are > involved before it collapses 'em --- leaving you with a set-valued > function call in the WHERE clause. Not sure if this is worth fixing, > considering that using set-valued functions in this way is just a > band-aid that doesn't have a long life expectancy. I'd certainly bow to your wisdom in this area, but I was thinking it would be useful (at times) to force a FROM clause sub-select to be treated as if it were a "real" table (probably not the best way to express this, but hopefully you get the idea). In MSSQL I've found many situations where putting intermediate results into a temp table, and then joining to it, is significantly faster than letting the optimizer do it's best. But the fact that MSSQL will return record sets from a stored procedure help makes this tolerable/manageable -- i.e. the whole ugly mess can be rolled into one nice neat strored procedure call. If the FROM clause sub-select could be treated, in a sense, like an on-the-fly temp table in PostgreSQL, a similar result is possible. And if the whole ugly mess is rolled up behind a view, no one has to know except the especially curious ;-) > > If you just need a hack with a short life expectancy, here's a hack that > I recommend not reading right before dinner ... might make you lose your > appetite ;-). Build the view as a dummy UNION: *Very* few things make me lose my appetite -- and this worked perfectly! Thank you. -- Joe
> Until we fix that (maybe for 7.2, maybe not) your existing hack is > probably pretty reasonable. You could save some cycles by avoiding > conversion to text, though --- instead return an opaque datum that is > pointer-to-tuple-slot and let the dblink_tok function extract fields > from the tuple. Look at SQL function support and the FieldSelect > expression node type for inspiration. > I changed the dblink() function to return a pointer instead of concatenated text, and dblink_tok() to use the pointer. FWIW, a query on a small (85 tuples) remote (a second PC on a 100baseT subnet) table takes about 34 milliseconds (based on show_query_stats) versus about 4 milliseconds when run locally. It actually takes a bit longer (~65 milliseconds) when run against a second database on the same PC. The original text parsing version was about 25% slower. Although shifting from text parsing to pointer passing is more efficient, I have one more question regarding this -- for now ;) -- is there any way to check the pointer passed to dblink_tok() to be sure it came from dblink()? Thanks, -- Joe
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. > > Until we fix that (maybe for 7.2, maybe not) your existing hack is > > probably pretty reasonable. You could save some cycles by avoiding > > conversion to text, though --- instead return an opaque datum that is > > pointer-to-tuple-slot and let the dblink_tok function extract fields > > from the tuple. Look at SQL function support and the FieldSelect > > expression node type for inspiration. > > > > I changed the dblink() function to return a pointer instead of concatenated > text, and dblink_tok() to use the pointer. FWIW, a query on a small (85 > tuples) remote (a second PC on a 100baseT subnet) table takes about 34 > milliseconds (based on show_query_stats) versus about 4 milliseconds when > run locally. It actually takes a bit longer (~65 milliseconds) when run > against a second database on the same PC. The original text parsing version > was about 25% slower. > > Although shifting from text parsing to pointer passing is more efficient, I > have one more question regarding this -- for now ;) -- is there any way to > check the pointer passed to dblink_tok() to be sure it came from dblink()? > > Thanks, > > -- Joe > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Added to /contrib. Thanks. > > Until we fix that (maybe for 7.2, maybe not) your existing hack is > > probably pretty reasonable. You could save some cycles by avoiding > > conversion to text, though --- instead return an opaque datum that is > > pointer-to-tuple-slot and let the dblink_tok function extract fields > > from the tuple. Look at SQL function support and the FieldSelect > > expression node type for inspiration. > > > > I changed the dblink() function to return a pointer instead of concatenated > text, and dblink_tok() to use the pointer. FWIW, a query on a small (85 > tuples) remote (a second PC on a 100baseT subnet) table takes about 34 > milliseconds (based on show_query_stats) versus about 4 milliseconds when > run locally. It actually takes a bit longer (~65 milliseconds) when run > against a second database on the same PC. The original text parsing version > was about 25% slower. > > Although shifting from text parsing to pointer passing is more efficient, I > have one more question regarding this -- for now ;) -- is there any way to > check the pointer passed to dblink_tok() to be sure it came from dblink()? > > Thanks, > > -- Joe > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026