Thread: Clean way to insert records if they don't exist, update if they do

Clean way to insert records if they don't exist, update if they do

From
Mike Christensen
Date:
Here's my situation.  I have a table with a bunch of URLs and crawl
dates associated with them.  When my program processes a URL, I want
to INSERT a new row with a crawl date.  If the URL already exists, I
want to update the crawl date to the current datetime.  With MS SQL or
Oracle I'd probably use a MERGE command for this.  With mySQL I'd
probably use the ON DUPLICATE KEY UPDATE syntax.

I could do multiple queries in my program, which may or may not be
thread safe.  I could write a SQL function which has various IF...ELSE
logic.  However, for the sake of trying out Postgres features I've
never used before, I'm thinking about creating an INSERT rule -
something like this:

CREATE RULE Pages_Upsert AS ON INSERT TO Pages
   WHERE EXISTS (SELECT 1 from Pages P where NEW.Url = P.Url)
   DO INSTEAD
      UPDATE Pages SET LastCrawled = NOW(), Html = NEW.Html WHERE Url = NEW.Url;

This seems to actually work great.  It probably loses some points on
the "code readability" standpoint, as someone looking at my code for
the first time would have to magically know about this rule, but I
guess that could be solved with good code commenting and
documentation.

Are there any other drawbacks to this idea, or maybe a "your idea
sucks, you should do it /this/ way instead" comment?  I'm on PG 9.0 if
that matters.  BTW, add my name to the long list of people who would
love to see UPSERT and/or MERGE commands in the next version of PG.

Mike

Re: Clean way to insert records if they don't exist, update if they do

From
Diego Augusto Molina
Date:
Hi, first of all, I still haven't tried PG further that 8.4

2011/9/18, Mike Christensen <mike@kitchenpc.com>:
> CREATE RULE Pages_Upsert AS ON INSERT TO Pages
>    WHERE EXISTS (SELECT 1 from Pages P where NEW.Url = P.Url)
>    DO INSTEAD
>       UPDATE Pages SET LastCrawled = NOW(), Html = NEW.Html WHERE Url =
> NEW.Url;
>
> This seems to actually work great.  It probably loses some points on
> the "code readability" standpoint, as someone looking at my code for
> the first time would have to magically know about this rule, but I
> guess that could be solved with good code commenting and
> documentation.

I was in a situation like yours and did exactly what you are saying.
The insert rule and the documentation (after a week I forgot about the
matter and was surprise by the resolution I had taken when I reviewed
the comment :).

> Are there any other drawbacks to this idea, or maybe a "your idea
> sucks, you should do it /this/ way instead" comment?  I'm on PG 9.0 if
> that matters.  BTW, add my name to the long list of people who would
> love to see UPSERT and/or MERGE commands in the next version of PG.

I used that code for quiet a long time with no drawbacks; it worked
perfectly fine for me (tested it nice with a heavy load, in production
things were more quiet).
What's more, I find this approach very appropiate from the
"programming" point of view. Correct me if I'm wrong.

--
Diego Augusto Molina
diegoaugustomolina@gmail.com

ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán
desestimados.
EN: Please, avoid attaching Microsoft Office documents. They shall be discarded.
LINK: http://www.gnu.org/philosophy/no-word-attachments.html