Re: dblink: add polymorphic functions. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: dblink: add polymorphic functions.
Date
Msg-id CADkLM=c6Wip4-g5jnRxhhXtq0XCC4tNwO0oRJBMoYKzbzKMOWw@mail.gmail.com
Whole thread Raw
In response to Re: dblink: add polymorphic functions.  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers


On Mon, Jul 6, 2015 at 3:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Jul 6, 2015 at 11:13 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
> On Mon, Jul 6, 2015 at 11:33 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>> On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway <mail@joeconway.com> wrote:
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA1
>> >
>> > On 07/06/2015 07:37 AM, Merlin Moncure wrote:
>> >> yup, and at least one case now fails where previously it ran
>> >> through: postgres=# select * from dblink('a', 'b', 'c'); ERROR:
>> >> function dblink(unknown, unknown, unknown) is not unique
>> >
>> > Hmm, that is an issue, possibly a fatal one.
>> >
>> > When Cory first mentioned this to me over a year ago we discussed some
>> > other, arguably better and more generic solutions. One was to build
>> > support for:
>> >
>> >   SELECT * FROM srf() AS TYPE OF(foo);
>> >
>> > The second idea I think is actually SQL standard if I recall correctly:
>> >
>> >   SELECT * FROM CAST(srf() AS foo) x;
>> >
>> > Currently this works:
>> >
>> > 8<--------------------
>> > select *
>> > from cast (row(11,'l','{a11,b11,c11}') as foo);
>> >  f1 | f2 |      f3
>> > - ----+----+---------------
>> >  11 | l  | {a11,b11,c11}
>> > (1 row)
>> > 8<--------------------
>> >
>> > But this fails:
>> >
>> > 8<--------------------
>> > select *
>> > from cast
>> > (dblink('dbname=contrib_regression','select * from foo') as foo);
>> > ERROR:  cannot cast type record to foo
>> > 8<--------------------
>> >
>> > Comments in the source have this to say:
>> >
>> > 8<--------------------
>> > /*
>> >  * coerce_record_to_complex
>> >  *              Coerce a RECORD to a specific composite type.
>> >  *
>> >  * Currently we only support this for inputs that are RowExprs or
>> >  * whole-row Vars.
>> >  */
>> > 8<--------------------
>> >
>> > That explains why the first example works while the second does not.
>> > I'm not sure how hard it would be to fix that, but it appears that
>> > that is where we should focus.
>>
>> Yeah.  FWIW, here's my 0.02$:  I use dblink all the time, for all
>> kinds of reasons, vastly preferring to have control over the query
>> string (vs. FDW type solutions).  I have two basic gripes with it.  #1
>> is that remote queries are not cancelled over all call sites when
>> cancelled locally (off-topic for this thread) and #2 is that the SRF
>> record describing mechanics are not abstractable because of using
>> syntax to describe the record.  Corey's proposal, overloading issues
>> aside, appears to neatly deal with this problem because anyelement can
>> be passed down through a wrapping API.
>>
>> IOW, I'd like to do:
>> CREATE FUNCTION remote_call(...) RETURNS ... AS
>> $$
>>   SELECT dblink(...) AS r(...)
>> $$ language sql;
>>
>> ...which can't be done (discounting dynamic sql acrobatics) because of
>> the syntax based expression of the 'r' record.  So I like Corey's
>> idea...I just think the functions need to be named differently (maybe
>> to 'dblink_any', and dblink_get_result_any'?).   TBH, to do better
>> than that you'd need SQL level support for handling the return type in
>> the vein of NEW/OLD.  For fun, let's call it 'OUT'...then you could:
>>
>> SELECT * FROM remote_call(...) RETURNS SETOF foo;
>>
>> Inside remote_call, you'd see something like:
>>
>> SELECT dblink(...) AS OUT;
>>
>> As to the proposed syntax, I would vote to support the SQL standard
>> variant if it could be handled during parse.  I don't see what AS TYPE
>> OF really buys you because FWICT it does not support wrapping.
>>
>> merlin
>
>
> Your experiences with dblink are very similar to mine.
>
> The whole issue arose for me as an outcropping of my Poor Man's Parallel
> Processing extension (still not released but currently working for us in
> production internally).
>
> At some point I had to do dblink_get_result(...) as t(...) and not only did
> I have to render the structure as a string, I was going to have to execute
> that SQL dynamically (because plpgsql lacks a PREPARE statement) or I was
> going to have to re-code in C or plv8. Overall those calls aren't terribly
> expensive (it's working in production - for us - without this dblink
> modification), but a cleaner solution would be better.

Another option is to handle the intermediate result in json if you're
willing to sacrifice a little bit of performance.  For example,
suppose you wanted to log every dblink call through an wrapper without
giving up the ability to handle arbitrary result sets:

CREATE OR REPLACE FUNCTION dblink_log(_query TEXT) RETURNS SETOF JSON AS
$$
BEGIN
  RAISE WARNING 'got dblink %', _query;

  RETURN QUERY SELECT * FROM dblink(
    'host=localhost',
    format('SELECT row_to_json(q) from (%s) q', _query))
  AS R(j json);
END;
$$ LANGUAGE PLPGSQL;

postgres=# select * from dblink_log('select 0 as value');

WARNING:  got dblink select 0 as value
 dblink_log
─────────────
 {"value":0}
(1 row)

With json, you have a number of good options to convert back to a
record.  For example,

postgres=# CREATE TYPE foo AS (value int);
CREATE TYPE

SELECT p.*
FROM dblink_log('SELECT s AS value FROM generate_series(1,3) s') j
CROSS JOIN LATERAL json_populate_record(null::foo, j) p;

WARNING:  got dblink select s as value from generate_series(1,3) s
 value
───────
     1
     2
     3
(3 rows)

Note, I still support the case behind your patch, but, if it uh,
doesn't make it through, it's good to have options :-).

merlin

(again, more backstory to enhance other's understanding of the issue)

In the earlier iterations of PMPP, I had it simply wrap all queries sent to the remote server inside a 'SELECT * from row_to_json(...)'.

The serialization/deserialization was a performance hit, offset slightly by having the RETURN QUERY SELECT json_field from dblink_get_result('c1') as t(json_field json) be a static (reusable) query.

The ugliness of decomposing the fields from json was no fun, and Merlin's suggestion of  json_populate_record ( I don't remember if that function existed at the time...) would get around that, albeit with another performance hit.

pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: dblink: add polymorphic functions.
Next
From: Daniele Varrazzo
Date:
Subject: Redundant error messages in policy.c