Re: copy vs. C function - Mailing list pgsql-performance

From Jon Nelson
Subject Re: copy vs. C function
Msg-id CAKuK5J3VsY-1_4wzRZiYR_ExWVGhnMHYmkBZBxnvBxkMfqsL5w@mail.gmail.com
In response to Re: copy vs. C function  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: copy vs. C function  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Wed, Dec 14, 2011 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> The only thing I have left are these statements:
>
>> get_call_result_type
>> TupleDescGetAttInMetadata
>> BuildTupleFromCStrings
>> HeapTupleGetDatum
>> and finally PG_RETURN_DATUM
>
>> It turns out that:
>> get_call_result_type adds 43 seconds [total: 54],
>> TupleDescGetAttInMetadata adds 19 seconds [total: 73],
>> BuildTupleFromCStrings accounts for 43 seconds [total: 116].
>
>> So those three functions account for 90% of the total time spent.
>> What alternatives exist? Do I have to call get_call_result_type /every
>> time/ through the function?
>
> Well, if you're concerned about performance then I think you're going
> about this in entirely the wrong way, because as far as I can tell from
> this you're converting all the field values to text and back again.
> You should be trying to keep the values in Datum format and then
> invoking heap_form_tuple.  And yeah, you probably could cache the
> type information across calls.

The parsing/conversion (except BuildTupleFromCStrings) is only a small
fraction of the overall time spent in the function and could probably
be made slightly faster. It's the overhead that's killing me.
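As a quick check on the percentages in this thread, the cumulative timings quoted above (54, 73 and 116 seconds) can be turned into shares of the 116-second run. The C below is only arithmetic on the reported numbers. The macro and function names are made up for illustration.

```c
#include <assert.h>

/* Extra seconds attributed to each call in the quoted measurements,
 * and the cumulative total once all three are included. */
#define GET_CALL_RESULT_TYPE_S        43.0
#define TUPLEDESC_GET_ATTINMETADATA_S 19.0
#define BUILD_TUPLE_FROM_CSTRINGS_S   43.0
#define TOTAL_S                       116.0

/* Percentage of `total` taken up by `part`. */
static double percent_of(double part, double total)
{
    assert(total > 0.0);
    return 100.0 * part / total;
}

/* All three calls together: 105 of 116 seconds. */
static double share_all_three(void)
{
    return percent_of(GET_CALL_RESULT_TYPE_S + TUPLEDESC_GET_ATTINMETADATA_S
                          + BUILD_TUPLE_FROM_CSTRINGS_S,
                      TOTAL_S);
}

/* Only the two metadata lookups that could be cached: 62 of 116 seconds. */
static double share_cacheable(void)
{
    return percent_of(GET_CALL_RESULT_TYPE_S + TUPLEDESC_GET_ATTINMETADATA_S,
                      TOTAL_S);
}
```

That works out to about 90.5% for all three calls and about 53.4% for the two cacheable lookups, consistent with the 90% and 53% figures given in the thread.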

Remember: I'm not converting multiple field values to text and back
again; I'm turning a *single* TEXT into 8 columns of varying types
(INET, INTEGER, and one INTEGER array, among others). I'll rewrite
the code to use tuples, but given that 53% of the time is spent in just
two functions (the two I'd like to cache), I'm not sure how much of a
gain it's likely to be.
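Tom's suggestion amounts to converting each field once, straight into its binary form, and handing those values to heap_form_tuple, rather than producing C strings that BuildTupleFromCStrings then re-parses through each column type's input function. The standalone sketch below shows only the single-pass parsing half of that idea. The input layout "a.b.c.d port i1,i2,..." and the Record struct are hypothetical stand-ins for the real 8-column format, and the address handling covers plain IPv4 only (INET also allows IPv6 and netmasks). Inside PostgreSQL the parsed values would instead be wrapped as Datums (for example with Int32GetDatum) in a values/nulls array passed to heap_form_tuple.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ITEMS 16

/* Hypothetical row: native binary values, as Datums would hold them,
 * rather than C strings waiting for a second parse. */
typedef struct {
    uint32_t addr;              /* IPv4 address, host byte order */
    int32_t  port;              /* plain INTEGER column */
    int32_t  items[MAX_ITEMS];  /* INTEGER[] column */
    int      nitems;
} Record;

/* Parse "a.b.c.d port i1,i2,..." in one pass; 0 on success, -1 on error. */
static int parse_record(const char *s, Record *out)
{
    char *end;
    uint32_t addr = 0;

    assert(s != NULL && out != NULL);

    /* IPv4 address: four octets separated by dots. */
    for (int i = 0; i < 4; i++) {
        unsigned long octet = strtoul(s, &end, 10);
        if (end == s || octet > 255)
            return -1;
        addr = (addr << 8) | (uint32_t) octet;
        s = end;
        if (i < 3) {
            if (*s != '.')
                return -1;
            s++;
        }
    }
    out->addr = addr;

    /* Integer column (strtol skips the separating space). */
    long port = strtol(s, &end, 10);
    if (end == s || port < INT32_MIN || port > INT32_MAX)
        return -1;
    out->port = (int32_t) port;
    s = end;

    /* Comma-separated integer array, converted element by element. */
    out->nitems = 0;
    for (;;) {
        long v = strtol(s, &end, 10);
        if (end == s)
            break;                  /* no more numbers */
        if (out->nitems == MAX_ITEMS || v < INT32_MIN || v > INT32_MAX)
            return -1;
        out->items[out->nitems++] = (int32_t) v;
        s = end;
        if (*s == ',')
            s++;
    }
    return *s == '\0' ? 0 : -1;
}
```

Each field is converted exactly once, so there is no text round trip for a type input function to repeat.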

Regarding caching, I tried caching the type information across calls by
making the TupleDesc static and initializing it only once.
When I tried that, I got:

ERROR:  number of columns (6769856) exceeds limit (1664)

I tried to find some documentation or examples that cache the
information, but couldn't find any.
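A likely explanation for that error is that the static pointer outlived the memory it pointed to: the TupleDesc was built in a memory context that is reset between calls, so later calls read garbage (hence a column count of 6769856 against the 1664 limit). The usual PostgreSQL idiom, as far as I know, is to hang cached data off fcinfo->flinfo->fn_extra, which is NULL on the first call, and to allocate it in fcinfo->flinfo->fn_mcxt (for example by switching to that context with MemoryContextSwitchTo before calling get_call_result_type and TupleDescGetAttInMetadata). The toy model below is not PostgreSQL code: FuncCallInfo, ResultMeta and the helper functions are hypothetical names, `extra` plays the role of fn_extra, and malloc stands in for the longer-lived context.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the cached TupleDesc / AttInMetadata. */
typedef struct {
    int natts;                  /* column count of the result type */
} ResultMeta;

/* Stand-in for FmgrInfo: `extra` mimics fn_extra (NULL on first call). */
typedef struct {
    void *extra;
    int   builds;               /* how many times metadata was built */
} FuncCallInfo;

/* The expensive lookup, done once per call site.  The allocation must
 * live as long as the slot that points to it, never in per-call memory. */
static ResultMeta *build_meta(FuncCallInfo *info)
{
    ResultMeta *meta = malloc(sizeof *meta);
    assert(meta != NULL);
    meta->natts = 8;
    info->builds++;
    return meta;
}

/* One invocation of the function: build on first use, then reuse. */
static int call_once(FuncCallInfo *info)
{
    ResultMeta *meta = info->extra;
    if (meta == NULL) {
        meta = build_meta(info);
        info->extra = meta;
    }
    return meta->natts;
}

/* In PostgreSQL, fn_mcxt is cleaned up by the system; here we free it. */
static void release(FuncCallInfo *info)
{
    free(info->extra);
    info->extra = NULL;
}
```

However many times call_once runs for the same FuncCallInfo, the metadata is built only on the first call.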

--
Jon
