Re: Performance costs of various PL languages - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Re: Performance costs of various PL languages
Date
Msg-id 1C5807A6935044BEB4E24BCD0201B350@CAPRICA
Whole thread Raw
In response to Re: Performance costs of various PL languages  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
Thanks guys.

Ah, Pl/java - of course. I would miss writing the code right in the SQL
script, but that would have been true of C as well.

None of these procedures really qualify as stored procs that move data;
rather they are scalar functions used for fuzzy string comparisons based on
our own domain logic - imagine something like,

   SELECT *
   FROM fathers AS f, sons AS s
   WHERE same_name(f.last_name, s.last_name)

... and same_name had business logic that corrected for O'reilly vs oreilly,
Van De Lay vs Vandelay, etc.

The point is that as we learn about the domain, we would add the rules into
the function same_name() so that all apps would benefit from the new rules.

Some of the functions are data-driven, for example a table of common
abbreviations with regex or LIKE expressions that would be run against both
strings so that each string is reduced to common abbreviations (i.e. lowest
common denominator) then compared, e.g.

   SELECT *
   FROM companies AS c
   WHERE same_business_name(s, 'ACME Business Supplies, Incorporated')

Would reduce both parameters down to the most common abbreviation and then
compare again with fuzzy logic.

Of course, even if this was written in C, the function would be data-bound
as it read from the abbreviation table - unless you guys tell that there is
a not inconsiderable cost involved in type conversion from PG to internal
vars.

Carlo


-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: December 27, 2011 5:54 PM
To: Pavel Stehule
Cc: Carlo Stonebanks; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance costs of various PL languages

On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
> Hello
>
> 2011/12/27 Carlo Stonebanks <stonec.register@sympatico.ca>:
>> We are currently using pltclu as our PL of choice AFTER plpgSql.
>>
>> I'd like to know if anyone can comment on the performance costs of the
>> various PL languages BESIDES C. For example, does pltclu instantiate
faster
>> than pltcl (presumably because it uses a shared interpreter?) Is Perl
more
>> lightweight?
>>
>> I know that everything depends on context - what you are doing with it,
e.g.
>> choose Tcl for string handling vs. Perl for number crunching - but for
those
>> who know about this, is there a clear performance advantage for any of
the
>> various PL languages - and if so, is it a difference so big to be worth
>> switching?
>>
>> I ask this because I had expected to see pl/pgsql as a clear winner in
terms
>> of performance over pltclu, but my initial test showed the opposite. I
know
>> this may be an apples vs oranges problem and I will test further, but if
>> anyone has any advice or insight, I would appreciate it so I can tailor
my
>> tests accordingly.
>>
>
> A performance strongly depends on use case.
>
> PL/pgSQL has fast start but any expression is evaluated as simple SQL
> expression - and some repeated operation should be very expensive -
> array update, string update. PL/pgSQL is best as SQL glue. Positive to
> performance is type compatibility between plpgsql and Postgres.
> Interpret plpgsql is very simply - there are +/- zero optimizations -
> plpgsql code should be minimalistic, but when you don't do some really
> wrong, then a speed is comparable with PHP.
>
>
http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.
2FpgSQL_language
>
> PL/Perl has slower start - but string or array operations are very
> fast. Perl has own expression evaluator - faster than expression
> evaluation in plpgsql. On second hand - any input must be transformed
> from postgres format to perl format and any result must be transformed
> too. Perl and other languages doesn't use data type compatible with
> Postgres.

One big advantage pl/pgsql has over scripting languages is that it
understands postgresql types natively.  It knows what a postgres array
is, and can manipulate one directly.  pl/perl would typically have to
have the database convert it to a string, parse it into a perl
structure, do the manipulation, then send it to the database to be
parsed again.  If your procedure code is mainly moving data between
tables and doing minimal intermediate heavy processing, this adds up
to a big advantage.  Which pl to go with really depends on what you
need to do.  pl/pgsql is always my first choice though.

perl and tcl are not particularly fast languages in the general case
-- you are largely at the mercy of how well the language's syntax or
library features map to the particular problem you're solving.  if you
need a fast general purpose language in the backend and are (very
understandably) skeptical about C, I'd look at pl/java.

merlin


pgsql-performance by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Performance costs of various PL languages
Next
From: Tom Lane
Date:
Subject: Re: Subquery flattening causing sequential scan