Thread: Fw: Performance UPDATE/INSERT
A possible solution would be to first insert the file into a temporary table via COPY.
Then do a full outer join between your current table and the temporary one on the primary key column(s), say id.
Update the rows where both sides of the join match, and insert all the rows whose id is missing from the live table.
So you'll have select a.id, a.col, b.id, b.col from live a full outer join temp b on a.id = b.id, with the corresponding conditions in your where clause. Eventually you'll end up writing 2 SQL statements, one for the insert and one for the update.
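A rough sketch of the two statements (live, temp, id and col1 are placeholder names, and the insert uses a left join instead of the full outer join, which comes down to the same null check):

    update live a
       set col1 = b.col1   -- repeat for the remaining columns
      from temp b
     where a.id = b.id;

    insert into live (id, col1)
    select b.id, b.col1
      from temp b
      left join live a on a.id = b.id
     where a.id is null;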
cool,
Luckys.
On 4/11/06, MG <pgsql-general@carladata.de> wrote:
Hello,
I have the following scenario:
The application reads one record from a file which contains 100 000 records.
The application checks, based on different conditions, whether this record is
already saved in the table 'test'.
If the record exists, the application manipulates it and updates it with an
UPDATE in PostgreSQL. This UPDATE always affects 50 columns, and PostgreSQL
uses an index.
If the record doesn't exist, the application manipulates it and performs an
INSERT.
Then it reads the next record from the file, and so on ...
In one extreme case the application performs 100 000 INSERTs, in the other
extreme case 100 000 UPDATEs. Normally there are more UPDATEs than INSERTs.
Here I notice that the first case (100 000 INSERTs) takes about 30 minutes,
but the 100 000 UPDATEs take about 1 hour.
I can't prepare the file to use a COPY, because the application manipulates
the records from the file in a complex way.
I've also tried vacuum to get more performance, but had no success.
Michaela
> ----- Original Message -----
> From: "A. Kretschmer" < andreas.kretschmer@schollglas.com>
> To: <pgsql-general@postgresql.org>
> Sent: Tuesday, April 11, 2006 11:48 AM
> Subject: Re: [GENERAL] Performance UPDATE/INSERT
>
>
>> On 11.04.2006 at 11:32:55 +0200, MG wrote the following:
>>> I can't use COPY for the INSERTs, because I have to manipulate each record
>>> individually.
>>>
>>> But the problem is the UPDATEs
>>
>> test=# \timing
>> Timing is on.
>> test=# update mira set y = 123;
>> UPDATE 150000
>> Time: 1874.894 ms
>>
>> 150 000 records, a simple table with two int columns.
>>
>> Have you run vacuum? Which version? Can you show us the EXPLAIN output for
>> the update?
>>
>>
>>>
>>> Greetings
>>> Michaela
>>>
>>> ----- Original Message ----- From: "A. Kretschmer"
>>
>> Please, no silly TOFU.
>>
>>
>> HTH, Andreas
>> --
>> Andreas Kretschmer (contact: see header)
>> Heynitz: 035242/47215, D1: 0160/7141639
>> GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
>> === Schollglas Unternehmensgruppe ===
>>
On 4/11/06, MG <pgsql-general@carladata.de> wrote:
> In one extreme case the application performs 100 000 INSERTs, in the other
> extreme case 100 000 UPDATEs. Normally there are more UPDATEs than INSERTs.
> Here I notice that the first case (100 000 INSERTs) takes about 30 minutes,
> but the 100 000 UPDATEs take about 1 hour.
>
> I can't prepare the file to use a COPY, because the application
> manipulates the records from the file in a complex way.
> I've also tried vacuum to get more performance, but had no success.

You should probably look at rewriting your procedure as a pl/pgsql function.
Have your app bulk load into a work table and then loop over it, doing your
SQL processing on the fly. It will be fast and is a good exercise.

Merlin
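A minimal sketch of such a function, assuming the bulk-loaded work table is called work_table, the target table is test with primary key id, and col1/col2 stand in for the real columns (all of these names are placeholders):

    create or replace function merge_from_work() returns void as $$
    declare
        r record;
    begin
        for r in select * from work_table loop
            -- try the update first; FOUND tells us whether a row matched
            update test set col1 = r.col1, col2 = r.col2 where id = r.id;
            if not found then
                insert into test (id, col1, col2) values (r.id, r.col1, r.col2);
            end if;
        end loop;
    end;
    $$ language plpgsql;

The application would then bulk load the already-manipulated records into work_table (for example via COPY) and call select merge_from_work(); once, instead of issuing 100 000 individual statements from the client.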