Re: Performace Optimization for Dummies - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Re: Performace Optimization for Dummies
Date
Msg-id efhag4$1n9d$1@news.hub.org
In response to Performace Optimization for Dummies  ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
Responses Re: Performace Optimization for Dummies  ("Merlin Moncure" <mmoncure@gmail.com>)
List pgsql-performance
The deduplication process requires so many programmed procedures that it
runs on the client. Most of the de-dupe lookups are not "straight" lookups,
but calculated ones employing fuzzy logic. This is because we cannot dictate
the format of our input data and must deduplicate with what we get.
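
As an illustration of what a "calculated" lookup can look like, here is a
minimal sketch in Python. It is not the import program's actual logic (that
is written in Tcl and is not shown in this thread); the function names, the
normalisation rules and the 0.85 threshold are all assumptions chosen for the
example.

```python
import difflib
import re


def normalize_name(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    return " ".join(cleaned.split())


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two names are similar enough to be treated as one.

    The threshold is a hypothetical value; real data would need tuning.
    """
    matcher = difflib.SequenceMatcher(None, normalize_name(a), normalize_name(b))
    return matcher.ratio() >= threshold
```

A comparison like this cannot be answered by a plain equality lookup on an
index, which is why each candidate row tends to need its own calculated
check.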

This was one of the reasons why I went with PostgreSQL in the first place,
because of the server-side programming options. However, I saw incredible
performance hits when running processes on the server and I partially
abandoned the idea (some custom-built name-comparison functions still run
on the server).

I am using Tcl on both the server and the client. I'm not a fan of Tcl, but
it appears to be quite well implemented and feature-rich in PostgreSQL. I
find PL/pgsql awkward - even compared to Tcl. (After all, I'm just a
programmer...  we do tend to be a little limited.)

The import program actually runs on the server box as a db client and
involves about 3000 lines of code (and it will certainly grow steadily as we
add compatibility with more import formats). Could a process involving that
much logic run on the db server, and would there really be a benefit?

Carlo


"Jim C. Nasby" <jim@nasby.net> wrote in message
news:20060928184538.GV34238@nasby.net...
> On Thu, Sep 28, 2006 at 01:53:22PM -0400, Carlo Stonebanks wrote:
>> > are you using the 'copy' interface?
>>
>> Straightforward inserts - the import data has to be transformed, normalised
>> and de-duped by the import program. I imagine the copy interface is for more
>> straightforward data importing. These are - by necessity - single row
>> inserts.
>
> BTW, stuff like de-duping is something you really want the database -
> not an external program - to be doing. Think about loading the data into
> a temporary table and then working on it from there.
> --
> Jim Nasby                                            jim@nasby.net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
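
The staging-table approach suggested in the quoted message can be sketched
roughly as follows. This is a hedged illustration only: it uses Python's
built-in sqlite3 module as a stand-in so that it runs without a server, and
the table names (people, staging), the columns and the exact-match rule after
trimming and lowercasing are assumptions, not the real schema. In PostgreSQL
the staging load would normally go through COPY rather than row inserts.

```python
import sqlite3


def load_and_dedupe(conn: sqlite3.Connection, rows) -> int:
    """Bulk-load rows into a temp table, then insert only the new ones.

    Assumes a hypothetical target table people(name, city) already exists.
    Returns the number of rows added to people.
    """
    cur = conn.cursor()
    before = cur.execute("SELECT COUNT(*) FROM people").fetchone()[0]

    # 1. Load the raw import data into a temporary staging table.
    cur.execute("CREATE TEMP TABLE staging (name TEXT, city TEXT)")
    cur.executemany("INSERT INTO staging (name, city) VALUES (?, ?)", rows)

    # 2. One set-based statement normalises, de-dupes within the batch
    #    (DISTINCT), and skips rows already present (NOT EXISTS).
    cur.execute(
        """
        INSERT INTO people (name, city)
        SELECT DISTINCT lower(trim(s.name)), lower(trim(s.city))
        FROM staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM people AS p
            WHERE p.name = lower(trim(s.name))
              AND p.city = lower(trim(s.city))
        )
        """
    )

    cur.execute("DROP TABLE staging")
    conn.commit()
    after = cur.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    return after - before
```

The point of the sketch is the shape of the work: one bulk load followed by
one statement, instead of a lookup and an insert per row. Fuzzy comparisons
would have to replace the equality test in the NOT EXISTS clause, for example
with a server-side comparison function.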


