Re: Key/Value reference table generation: INSERT/UPDATE performance - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Key/Value reference table generation: INSERT/UPDATE performance
Date
Msg-id 4652B627.1070109@archonet.com
Whole thread Raw
In response to Key/Value reference table generation: INSERT/UPDATE performance  (valgog <valgog@gmail.com>)
Responses Re: Key/Value reference table generation: INSERT/UPDATE performance
List pgsql-performance
valgog wrote:
> I found several post about INSERT/UPDATE performance in this group,
> but actually it was not really what I am searching an answer for...
>
> I have a simple reference table WORD_COUNTS that contains the count of
> words that appear in a word array storage in another table.

I think this is the root of your problem, I'm afraid. You're trying to
count individual words when you're storing an array of words. I don't
think any of the Gist/GIN indexes will help you with this either.

However, since "you don't want to start from here" isn't very useful
here and now:

1. See what the performance (explain analyse) of the "select
distinct...generate_series()" statement is. I think you're right and
it's going to be slow.
2. You're looping through each row of word_storage and counting
separately. Write it as one query if possible.
3. As Peter says, don't insert then update, start with an empty table
and just insert totals for the lot (see #2).

I'd probably write the query in plperl/python or something else that
supports hash/dictionary structures. Then just process the whole
word_storage into the hash - assuming you only have a few thousand
distinct words that shouldn't take up too much memory.

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: "Peter Childs"
Date:
Subject: Re: Key/Value reference table generation: INSERT/UPDATE performance
Next
From: "Joost Kraaijeveld"
Date:
Subject: is file size relevant in choosing index or table scan?