Thread: Re: [INTERFACES] remote database queries

Re: [INTERFACES] remote database queries

From
Tom Lane
Date:
[ Redirected away from the entirely inappropriate pgsql-interfaces list ]

"Joe Conway" <joe.conway@mail.com> writes:
> Inspired by others who have recently gotten PostgreSQL functions to return
> sets, I set out to create my own. I have on more than one occasion wished I
> could run a query across databases or servers (similar to a dblink in Oracle
> or a linked db in MSSQL). Attached is a C file which is my attempt.

> select
>   dblink_tok(t1.f1,'~',0)::int as vl_id
>  ,dblink_tok(t1.f1,'~',1)::text as vl_guid
>  ,dblink_tok(t1.f1,'~',2)::text as vl_pri_email
>  ,dblink_tok(t1.f1,'~',3)::text as vl_acct_pass_phrase
>  ,dblink_tok(t1.f1,'~',4)::text as vl_email_relname
>  ,dblink_tok(t1.f1,'~',5)::text as vl_hwsn_relname
>  ,dblink_tok(t1.f1,'~',6)::timestamp as vl_mod_dt
>  ,dblink_tok(t1.f1,'~',7)::int as vl_status
> from
>  (select dblink('host=192.168.5.150 port=5432 dbname=vsreg_001 user=postgres
> password=postgres','select * from vs_lkup','~') as f1) as t1

> By doing "create view vs_lkup_rm as . . ." with the above query, from a
> database on another server, I can then write:
> "select * from vs_lkup" and get results just as if I were on 192.168.5.150
> (sort of -- see problem below).

> I have one question, and one problem regarding this.

> First the question: is there any way to get the dblink function to return
> setof composite -- i.e. return tuples instead of scalar values?

It could return a tuple, but there are notational problems that would
make it difficult to do anything useful with the tuple; in particular,
AFAICS you couldn't retrieve more than one column out of it, so there's
no point.

Until we fix that (maybe for 7.2, maybe not) your existing hack is
probably pretty reasonable.  You could save some cycles by avoiding
conversion to text, though --- instead return an opaque datum that is
pointer-to-tuple-slot and let the dblink_tok function extract fields
from the tuple.  Look at SQL function support and the FieldSelect
expression node type for inspiration.

> Now the problem: as I stated above, "select * from vs_lkup" returns results
> just as if I were on 192.168.5.150 -- but if I try "select * from vs_lkup
> WHERE vl_id = 1" or "select * from vs_lkup WHERE vl_pri_email in
> ('email1@foo.com')" I get the following error message: "ERROR:  Set-valued
> function called in context that cannot accept a set". Any ideas how to work
> around this?

I think this would work if the planner weren't so enthusiastic about
trying to collapse the sub-SELECT query together with the main query.
Unfortunately it doesn't check to see if any set-valued functions are
involved before it collapses 'em --- leaving you with a set-valued
function call in the WHERE clause.  Not sure if this is worth fixing,
considering that using set-valued functions in this way is just a
band-aid that doesn't have a long life expectancy.

If you just need a hack with a short life expectancy, here's a hack that
I recommend not reading right before dinner ... might make you lose your
appetite ;-).  Build the view as a dummy UNION:create view vs_lkup_rm as    select ... from (select dblink(...))
unionall    select null, null, ... where false;
 
Done this way, the UNION won't change the view's results --- but it will
prevent the (current version of the) planner from collapsing the view
together with the surrounding query.  For example:

regression=# create function dblink() returns setof int as
regression-# 'select f1 from int4_tbl' language 'sql';
CREATE
regression=# select dblink(); ?column?
-------------          0     123456    -123456 2147483647-2147483647
(5 rows)

regression=# create view vv1 as
regression-# select f1 from (select dblink() as f1) t1;
CREATE
regression=# select * from vv1 where f1 > 0;
ERROR:  Set-valued function called in context that cannot accept a set
regression=# create view vv2 as
regression-# select f1 from (select dblink() as f1) t1
regression-# union all
regression-# select null where false;
CREATE
regression=# select * from vv2;    f1
-------------          0     123456    -123456 2147483647-2147483647
(5 rows)

regression=# select * from vv2 where f1 > 0;    f1
------------    1234562147483647
(2 rows)

        regards, tom lane


Re: Re: [INTERFACES] remote database queries

From
"Joe Conway"
Date:
> [ Redirected away from the entirely inappropriate pgsql-interfaces list ]

oops . . . sorry!

> Until we fix that (maybe for 7.2, maybe not) your existing hack is
> probably pretty reasonable.  You could save some cycles by avoiding
> conversion to text, though --- instead return an opaque datum that is
> pointer-to-tuple-slot and let the dblink_tok function extract fields
> from the tuple.  Look at SQL function support and the FieldSelect
> expression node type for inspiration.

Thanks -- I'll take a look.


> I think this would work if the planner weren't so enthusiastic about
> trying to collapse the sub-SELECT query together with the main query.
> Unfortunately it doesn't check to see if any set-valued functions are
> involved before it collapses 'em --- leaving you with a set-valued
> function call in the WHERE clause.  Not sure if this is worth fixing,
> considering that using set-valued functions in this way is just a
> band-aid that doesn't have a long life expectancy.

I'd certainly bow to your wisdom in this area, but I was thinking it would
be useful (at times) to force a FROM clause sub-select to be treated as if
it were a "real" table (probably not the best way to express this, but
hopefully you get the idea). In MSSQL I've found many situations where
putting intermediate results into a temp table, and then joining to it, is
significantly faster than letting the optimizer do it's best. But the fact
that MSSQL will return record sets from a stored procedure help makes this
tolerable/manageable -- i.e. the whole ugly mess can be rolled into one nice
neat strored procedure call. If the FROM clause sub-select could be treated,
in a sense, like an on-the-fly temp table in PostgreSQL, a similar result is
possible. And if the whole ugly mess is rolled up behind a view, no one has
to know except the especially curious ;-)

>
> If you just need a hack with a short life expectancy, here's a hack that
> I recommend not reading right before dinner ... might make you lose your
> appetite ;-).  Build the view as a dummy UNION:

*Very* few things make me lose my appetite -- and this worked perfectly!
Thank you.

-- Joe



remote database queries

From
"Joe Conway"
Date:
> Until we fix that (maybe for 7.2, maybe not) your existing hack is
> probably pretty reasonable.  You could save some cycles by avoiding
> conversion to text, though --- instead return an opaque datum that is
> pointer-to-tuple-slot and let the dblink_tok function extract fields
> from the tuple.  Look at SQL function support and the FieldSelect
> expression node type for inspiration.
>

I changed the dblink() function to return a pointer instead of concatenated
text, and dblink_tok() to use the pointer. FWIW, a query on a small (85
tuples) remote (a second PC on a 100baseT subnet) table takes about 34
milliseconds (based on show_query_stats) versus about 4 milliseconds when
run locally. It actually takes a bit longer (~65 milliseconds) when run
against a second database on the same PC. The original text parsing version
was about 25% slower.

Although shifting from text parsing to pointer passing is more efficient, I
have one more question regarding this -- for now ;) -- is there any way to
check the pointer passed to dblink_tok() to be sure it came from dblink()?

Thanks,

-- Joe


Re: remote database queries

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

> > Until we fix that (maybe for 7.2, maybe not) your existing hack is
> > probably pretty reasonable.  You could save some cycles by avoiding
> > conversion to text, though --- instead return an opaque datum that is
> > pointer-to-tuple-slot and let the dblink_tok function extract fields
> > from the tuple.  Look at SQL function support and the FieldSelect
> > expression node type for inspiration.
> >
> 
> I changed the dblink() function to return a pointer instead of concatenated
> text, and dblink_tok() to use the pointer. FWIW, a query on a small (85
> tuples) remote (a second PC on a 100baseT subnet) table takes about 34
> milliseconds (based on show_query_stats) versus about 4 milliseconds when
> run locally. It actually takes a bit longer (~65 milliseconds) when run
> against a second database on the same PC. The original text parsing version
> was about 25% slower.
> 
> Although shifting from text parsing to pointer passing is more efficient, I
> have one more question regarding this -- for now ;) -- is there any way to
> check the pointer passed to dblink_tok() to be sure it came from dblink()?
> 
> Thanks,
> 
> -- Joe
> 

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: remote database queries

From
Bruce Momjian
Date:
Added to /contrib.  Thanks.

> > Until we fix that (maybe for 7.2, maybe not) your existing hack is
> > probably pretty reasonable.  You could save some cycles by avoiding
> > conversion to text, though --- instead return an opaque datum that is
> > pointer-to-tuple-slot and let the dblink_tok function extract fields
> > from the tuple.  Look at SQL function support and the FieldSelect
> > expression node type for inspiration.
> >
> 
> I changed the dblink() function to return a pointer instead of concatenated
> text, and dblink_tok() to use the pointer. FWIW, a query on a small (85
> tuples) remote (a second PC on a 100baseT subnet) table takes about 34
> milliseconds (based on show_query_stats) versus about 4 milliseconds when
> run locally. It actually takes a bit longer (~65 milliseconds) when run
> against a second database on the same PC. The original text parsing version
> was about 25% slower.
> 
> Although shifting from text parsing to pointer passing is more efficient, I
> have one more question regarding this -- for now ;) -- is there any way to
> check the pointer passed to dblink_tok() to be sure it came from dblink()?
> 
> Thanks,
> 
> -- Joe
> 

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026