Thread: how do I update or insert efficently in postgres

how do I update or insert efficently in postgres

From
marc@oscar.eng.cv.net (Marc Spitzer)
Date:
I need to do the follwoing logic for a db I am building:

if row exists update some fields 
else insert all fields

I have come across this befor and have used select to drive the
choice, if I could get the row update else insert.  The db I worked on
had a few thousand rows so it was fast enough.  This table will have
around 1 million rows to start out with and I was wondering if there
was any way to do this better.  I am touching each row twice and would
like to get that down to once if possable.  If that is not possable
would it be better to move the whole thing inside of 1 explicit
transaction?  Any other ideas I have missed?  

Thank you 

marc


Re: how do I update or insert efficently in postgres

From
"Josh Berkus"
Date:
Maqrc,

> if row exists update some fields 
> else insert all fields
> 
> I have come across this befor and have used select to drive the
> choice, if I could get the row update else insert.  The db I worked
> on
> had a few thousand rows so it was fast enough.  This table will have
> around 1 million rows to start out with and I was wondering if there
> was any way to do this better.  I am touching each row twice and
> would
> like to get that down to once if possable.  If that is not possable
> would it be better to move the whole thing inside of 1 explicit
> transaction?  Any other ideas I have missed?  

Not really.  If you have to check for existance, that's going to be a
seperate query from the UPDATE/INSERT no matter how you cut it.  MySQL's
REPLACE probably just hides this double-check from you.  Using an
integer surrogate key which is indexed and VACUUMed regularly is about
all the performance increase you can get. Except maybe moving the table
and the WAL_FILES both to seperate disks.

If you have a significant budget for your project, there could be ways
around this ... you'd just have to hire yourself a PostgreSQL hacker.
They could theoretically write modifications to PostgreSQL that would do
the following:

Scan the index for a pointer to the existing record.
If a pointer is found, use it to do the update without a second index
scan.
If the pointer is not found, do an insert without a unique index check.

However, the above could be costly to implement and put you out of line
for further PostgreSQL upgrades.  So throwing hardware at the problem is
probably a better idea.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: how do I update or insert efficently in postgres

From
Tom Lane
Date:
marc@oscar.eng.cv.net (Marc Spitzer) writes:
> I need to do the follwoing logic for a db I am building:
> if row exists update some fields 
> else insert all fields

> I have come across this befor and have used select to drive the
> choice, if I could get the row update else insert.

Which case do you think will be more common?  If UPDATE is the more
common scenario then it's a win to do
UPDATE set modifiable-fields = whatever WHERE key = whateverif (zero rows updated)    INSERT ...

Alternatively you can do
INSERT ...if (fail due to duplicate key)    UPDATE ...

if you think INSERT is the more common case.  (This all assumes you
have a unique key for the table, but if you don't, then what do you
mean by "the row already exists"?)

Neither of these are perfect, however.  The former has a race condition
if two clients might try to insert the same key at about the same time.
You can improve it to
BEGIN;UPDATE set modifiable-fields = whatever WHERE key = whateverif (zero rows updated){    INSERT ...    if (fail due
toduplicate key)    {        ABORT;        loop back to BEGIN;    }}COMMIT;
 

but this is kinda ugly.  (Of course, if you could have two clients
independently inserting/updating the same row at the same time, you
have problems anyway: which one should win, and why?  I think the
coding difficulty may tell you you have a design problem.)

As for the INSERT-then-UPDATE approach, you have the same problem
that you have to ABORT and start a new transaction if the INSERT
fails.  This is uncool if you really want the whole thing to be part
of a larger transaction.

But as long as you've guessed right about which case is more common,
you have only one query most of the time.
        regards, tom lane


Re: how do I update or insert efficently in postgres

From
marc@oscar.eng.cv.net (Marc Spitzer)
Date:
In article <3557.1005685484@sss.pgh.pa.us>, Tom Lane wrote:
> marc@oscar.eng.cv.net (Marc Spitzer) writes:
>> I need to do the follwoing logic for a db I am building:
>> if row exists update some fields 
>> else insert all fields
> 
>> I have come across this befor and have used select to drive the
>> choice, if I could get the row update else insert.
> 
> Which case do you think will be more common?  If UPDATE is the more
> common scenario then it's a win to do
> 
>     UPDATE set modifiable-fields = whatever WHERE key = whatever
>     if (zero rows updated)
>         INSERT ...
> 
> Alternatively you can do
> 
>     INSERT ...
>     if (fail due to duplicate key)
>         UPDATE ...
> 
> if you think INSERT is the more common case.  (This all assumes you
> have a unique key for the table, but if you don't, then what do you
> mean by "the row already exists"?)
> 
> Neither of these are perfect, however.  The former has a race condition
> if two clients might try to insert the same key at about the same time.
> You can improve it to
> 
>     BEGIN;
>     UPDATE set modifiable-fields = whatever WHERE key = whatever
>     if (zero rows updated)
>     {
>         INSERT ...
>         if (fail due to duplicate key)
>         {
>             ABORT;
>             loop back to BEGIN;
>         }
>     }
>     COMMIT;
> 
> but this is kinda ugly.  (Of course, if you could have two clients
> independently inserting/updating the same row at the same time, you
> have problems anyway: which one should win, and why?  I think the
> coding difficulty may tell you you have a design problem.)
> 
> As for the INSERT-then-UPDATE approach, you have the same problem
> that you have to ABORT and start a new transaction if the INSERT
> fails.  This is uncool if you really want the whole thing to be part
> of a larger transaction.
> 
> But as long as you've guessed right about which case is more common,
> you have only one query most of the time.
> 
>             regards, tom lane
> 

after the first pass it will be mostly updates and I know I have a
flawed design, but I have only 1 batch job that loads information so
there should be no major problems there, yes I know famious last
words. And I need to get something finished quickly to start
generating reports off of it.  

Thanks 

marc



> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


Re: how do I update or insert efficently in postgres

From
"Llew Goodstadt"
Date:
What if you inserted all the records into a temporary table. Delete all
matching records (which would have been updated)in the current table
(via a join). Then copy all the entries from the temporary table into
your existing table. Delete the temp table etc.

Llew

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Marc Spitzer
> Sent: 13 November 2001 18:19
> To: pgsql-sql@postgresql.org
> Subject: [SQL] how do I update or insert efficently in postgres
> 
> 
> I need to do the follwoing logic for a db I am building:
> 
> if row exists update some fields
> else insert all fields
> 
> I have come across this befor and have used select to drive
> the choice, if I could get the row update else insert.  The 
> db I worked on had a few thousand rows so it was fast enough. 
>  This table will have around 1 million rows to start out with 
> and I was wondering if there was any way to do this better.  
> I am touching each row twice and would like to get that down 
> to once if possable.  If that is not possable would it be 
> better to move the whole thing inside of 1 explicit 
> transaction?  Any other ideas I have missed?  
> 
> Thank you
> 
> marc
> 
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
http://www.postgresql.org/users-lounge/docs/faq.html