Question about PGSQL functions - Mailing list pgsql-performance

From Steve
Subject Question about PGSQL functions
Date
Msg-id Pine.GSO.4.64.0703081715190.11576@kittyhawk.tanabi.org
Whole thread Raw
In response to Re: compact flash disks?  ("Merlin Moncure" <mmoncure@gmail.com>)
Responses Re: Question about PGSQL functions
List pgsql-performance
Hey there :)

I'm re-writing a summarization process that used to be very 'back and
forth' intensive (i.e. very chatty between my summarization software and
the DB).  I'm trying to reduce that by practically reducing all this back
and forth to a single SQL query, but in order to support certain
complexities it looks to me like I'm going to have to write some postgres
C language functions.

This is something I'm actually familiar with and have done before, but let
me describe what I'm trying to do here so I can be sure that this is the
right thing to do, and to be sure I do it correctly and don't cause memory
leaks :)


---

I have two columns, one called "procedure_code" and the other called
"wrong_procedure_code" in my summary table.  These are both of type
varchar(32) I believe or possibly text -- if it matters I can double check
but because all the involved columns are the same type and size it
shouldn't matter. :)

These are actually populated by the "procedure_code" and
"corrected_procedure_code" in the source table.  The logic is, basically:

IF  strlen(source.corrected_procedure_code)
THEN:
   summary.procedure_code=source.corrected_procedure_code
   summary.wrong_procedure_code=source.procedure_code
ELSE:
   summary.procedure_code=source.procedure_code
   summary.wrong_procedure_code=NULL


Simple, right?  Making a C function to handle this should be no sweat --
I would basically split this logic into two separate functions, one to
populate summary.procedure_code and one to populate
summary.wrong_procedure_code, and it removes the need of having any sort
of back and forth between the program and DB... I can just do like:

update summary_table
  set procedure_code=pickCorrect(source.procedure_code,
                                  source.corrected_procedure_code),
      wrong_procedure_code=pickWrong(source.procedure_code,
                                  source.corrected_procedure_code),....
  from source where summary_table.source_id=source.source_id;


Make sense?  So question 1, is this the good way to do all this?


Question 2: Assuming it is the good way to do all this, would this
function be correct assuming I did all the other stuff right (like
PG_FUNCTION_INFO_V1, etc.):

Datum pickCorrect(PG_FUNCTION_ARGS){
     text*    procedure_code=PG_GETARG_TEXT_P(0);
     text*    corrected_code=PG_GETARG_TEXT_P(1);

     if(VARSIZE(corrected_code)-VARHDRSZ){
         PG_RETURN_TEXT_P(corrected_code);
     }else{
         PG_RETURN_TEXT_P(procedure_code);
     }
}

Would that simply work because I'm not actually modifying the data, or
would I have to pmalloc a separate chunk of memory, copy the data, and
return the newly allocated memory because the memory allocated for the
args "goes away" or gets corrupted or something otherwise?



Thanks a lot for the info!


Steve

pgsql-performance by date:

Previous
From: "Craig A. James"
Date:
Subject: Re: strange performance regression between 7.4 and 8.1
Next
From: Heikki Linnakangas
Date:
Subject: Re: Question about PGSQL functions