Re: proposal: row_to_array function - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: row_to_array function
Date
Msg-id CAFj8pRA8Z76_gSN39UBHuH3sQssE+3x-N8Rhgjvd_9wEzYgfng@mail.gmail.com
Whole thread Raw
In response to Re: proposal: row_to_array function  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: proposal: row_to_array function  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
Hi

2015-06-22 5:18 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 2 April 2015 at 01:59, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Sun, Mar 29, 2015 at 1:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> here is rebased patch.
>>> It contains both patches - row_to_array function and foreach array support.
>>
>> While I don't have a problem with hstore_to_array, I don't think that
>> row_to_array is a very good idea; it's basically encouraging people to
>> throw away SQL datatypes altogether and imagine that everything is text.
>> They've already bought into that concept if they are using hstore or
>> json, so smashing elements of those containers to text is not a problem.
>> But that doesn't make this version a good thing.
>>
>> (In any case, those who insist can get there through row_to_json, no?)
>
> You have a point.  What does attached do that to_json does not do
> besides completely discard type information?  Our json api is pretty
> rich and getting richer.  For better or ill, we dumped all json
> support into the already stupendously bloated public namespace and so
> it's always available.


I can see plenty of utility for a function like Pavel speaks of, but
I'd personally rather see it as a function that returns table (colname
name, coltype regtype, coltypmod integer, coltextvalue text,
colordinal integer) so it can carry more complete information and
there's no need to worry about foreach(array). The main use of a
function that includes text representations of the values would IMO be
using it from plain SQL, rather than PL/PgSQL, when faced with
anonymous records.

I'd find it more useful to have lvalue-expressions for dynamic access
to record fields and a function to get record metadata - field names,
types and typmods. Some kind of "pg_get_record_info(record) returns
table(fieldname text, fieldtype regtype, fieldtypmod integer)" and a
PL/PgSQL lvalue-expression for record field access like
"RECORD_FIELD(therecord, fieldname)". I would _certainly_ want to be
able to get the type metadata without the values.

That way you could interact natively with the fields in their true
types, without forcing conversion into and out of 'text', which is a
known performance pain-point with PL/PgSQL. (PL/PgSQL doesn't have a
VARIANT type or support for using 'anyelement', which would be the
other way to solve the type flattening problem IMO).

Think:

DECLARE
    myrow record;
    fi record;
BEGIN
    EXECUTE user_supplied_dynamic_query INTO myrow;
    FOR fi IN
        SELECT fieldname, fieldtype, fieldtypmod
        FROM pg_get_record_info(myrow)
    LOOP
        IF fi.fieldtype == 'int4'::regtype THEN
            RECORD_FIELD(myrow, fi.fieldname) := RECORD_FIELD(myrow,
fi.fieldname) + 1;
        END IF;
    END LOOP;
END;

I am thinking so this is separate task, that should not be solved simply too. I wrote a set functions for working with record (https://github.com/okbob/pltoolbox/blob/master/record.c). But it doesn't solve the basic issues:

1. speed - FOR IN SELECT FROM is more expensive then just unpacking row or record
2. unclean game with creating more code path for any special type.

I have little bit different idea. FOR IN RECORD can change type of any automatic variable in any iteration. Internally we can do more code paths - so your code can be rewritten to

FOREACH key, val IN RECORD myrow
LOOP
  IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
    val := val + 1; -- these variables can be mutable
    -- or maybe in futore
   myrow[key] := val + 1;
  END IF;
END LOOP;

What is important - "val" is automatic variable, and it can has different type in any step.

It is little bit strange, but impossible to solve, so we cannot to support row[var] as right value (without immutable casting). But we can do it with left value.





OK, so it's a stupid example - increment all int4 fields by one. It
conveys the rough idea though - native use of the field types.

Note that RECORD_FIELD is distinct from the existing support for

    EXECUTE format('SELECT $1.%I', fieldname) USING therecord;

in that that approach doesn't work for all ways that a record can be
produced, it's slow, it doesn't have a good way to enumerate field
names, and there's no equivalent to write to the field. Current
approaches for that are ghastly:
http://stackoverflow.com/q/7711432/398670 .






--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: checkpointer continuous flushing
Next
From: Fabien COELHO
Date:
Subject: Re: checkpointer continuous flushing