Thread: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
APseudoUtopia
Date:
Hey list,

I'm migrating my site away from MySQL to PostgreSQL. So far, it's been
going great. However, there's one problem I've been having trouble
solving.

I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination, of course). Here's the table
structure:

CREATE TABLE "forums_readposts" (
 "userid"    INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE,
 "threadid"    INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON
DELETE CASCADE,
 "lastpostread"   INTEGER NOT NULL CHECK ("lastpostread" >= 0),
 PRIMARY KEY ("userid", "threadid")
);

Here's the original MySQL query that I have (db_string is a php
function that escapes the string):

INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread")
SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id",
"lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread"
= "lastpost";

Obviously this will not work with PostgreSQL. I've googled around a
bit and I decided to create a plpgsql function to handle the task. I
don't have much done, but here's what I have:

-----------------------------
CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS
$FuncTag$
    BEGIN
        LOOP
            -- Try to update the record
            -- This query is broken. I'm not sure how to do the subquery or
whatever I need to do. Maybe FROM? Another loop?
            -- UPDATE "forums_readposts" SET "lastpostread" = (SELECT
"lastpost" FROM "forums_topics" WHERE blah blah
            IF found THEN
                RETURN;
            END IF;
            -- Not there, try to insert the key
            -- If someone else inserts the same key concurrently,
            -- We could get a unique-key failure
            BEGIN
                INSERT INTO "forums_readposts" ("userid", "threadid",
"lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics")
WHERE "userid" = $1;
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- Do nothing, and loop to try the update again
            END;
        END LOOP;
    END;
$FuncTag$
LANGUAGE plpgsql;
-----------------------------

I got the structure from the example in the postgresql documentation.
Hopefully it's a step in the right direction.

If anyone can point me in the direction to take another step in, I'd
really appreciate it.

Thanks.

APseudoUtopia wrote: <blockquote cite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com"
type="cite"><prewrap="">Hey list,
 

I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination, of course). Here's the table
structure: </pre></blockquote> Wouldn't a view be better than having a table that is deleted and updated all the time. 
Iwould add a field in the user table called  last_login type timestamp  then do a select from the forums table to
generatethis table where last_login <= FormTimeStamp .<br /><br /> I don't see the point having this table when a
viewwould work better.  <br /><br /><br /><br /><blockquote
cite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com"type="cite"><pre wrap="">
 
Obviously this will not work with PostgreSQL. I've googled around a
bit and I decided to create a plpgsql function to handle the task. I
don't have much done, but here's what I have:

-----------------------------
CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS
$FuncTag$BEGIN    LOOP        -- Try to update the record        -- This query is broken. I'm not sure how to do the
subqueryor
 
whatever I need to do. Maybe FROM? Another loop?        -- UPDATE "forums_readposts" SET "lastpostread" = (SELECT
"lastpost" FROM "forums_topics" WHERE blah blah        IF found THEN            RETURN;        END IF;        -- Not
there,try to insert the key        -- If someone else inserts the same key concurrently </pre></blockquote><br
/><blockquotecite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com" type="cite"><pre wrap="">
-- We could get a unique-key failure        BEGIN            INSERT INTO "forums_readposts" ("userid", "threadid",
 
"lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics")
WHERE "userid" = $1;            RETURN;        EXCEPTION WHEN unique_violation THEN            -- Do nothing, and loop
totry the update again        END;    END LOOP;END;
 
$FuncTag$
LANGUAGE plpgsql;
----------------------------- </pre></blockquote><br /> if you want to do something like this either do a test first to
seeif the key is present in the table, update or do an insert like this<br /> There is no reason to do a loop in the
functionwaiting for a lock to clear.   Postgresql Locks do not work like MySQL.  <br /><pre wrap="">CREATE FUNCTION
FORUM_CATCH_UP_ALL(pUserID INTEGER) RETURNS VOID AS
 
$FuncTag$BEGIN</pre>                       select lastpostread  from forums_readposts  where userid = pUserId;<br />
                     if (  found() ) then<br />                          UPDATE forums_readposts SET "lastpostread" =
(SELECTlastpost FROM forums_topics WHERE blah blah) ; --its helpful to post the entire function ;<br />                
     else<br />                          INSERT INTO forums_readposts ( userid, threadid,<br />                        
           lastpostread) (SELECT $1, id, lastpost FROM forums_topics)<br />                                     WHERE
userid= pUserID;<br />                       end;<br /><pre wrap="">    END;
 
$FuncTag$
LANGUAGE plpgsql;</pre><br /><blockquote cite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com"
type="cite"><prewrap="">
 
I got the structure from the example in the postgresql documentation.
Hopefully it's a step in the right direction.

If anyone can point me in the direction to take another step in, I'd
really appreciate it.

Thanks. </pre></blockquote><br /> Creating a view would work better and than creating a table to track this.  I would
thinkthis website tracks the last time the user logged in correct???  This is going to create allot of over head
maintainingthis table when a simple select statement will work so much better if i understand what you are
doing.....<br/><br /><br /> 

Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
Martin Gainty
Date:
contempating installing a insert,update trigger on forums_topics table something like

SET search_path = public;

CREATE TRIGGER "MyTableName_Trig"
AFTER INSERT OR DELETE OR UPDATE ON "forum_topics"
FOR EACH ROW EXECUTE PROCEDURE "FORUM_CATCH_UP_ALL" ();

HTH
Martin
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
 
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.






Date: Sat, 27 Jun 2009 21:13:23 -0400
From: justin@emproshunts.com
To: apseudoutopia@gmail.com; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

APseudoUtopia wrote:
Hey list,

I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination, of course). Here's the table
structure:
Wouldn't a view be better than having a table that is deleted and updated all the time.  I would add a field in the user table called  last_login type timestamp  then do a select from the forums table to generate this table where last_login <= FormTimeStamp .

I don't see the point having this table when a view would work better. 



Obviously this will not work with PostgreSQL. I've googled around a
bit and I decided to create a plpgsql function to handle the task. I
don't have much done, but here's what I have:

-----------------------------
CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS
$FuncTag$
BEGIN
LOOP
-- Try to update the record
-- This query is broken. I'm not sure how to do the subquery or
whatever I need to do. Maybe FROM? Another loop?
-- UPDATE "forums_readposts" SET "lastpostread" = (SELECT
"lastpost" FROM "forums_topics" WHERE blah blah
IF found THEN
RETURN;
END IF;
-- Not there, try to insert the key
-- If someone else inserts the same key concurrently

			-- We could get a unique-key failure
BEGIN
INSERT INTO "forums_readposts" ("userid", "threadid",
"lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics")
WHERE "userid" = $1;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the update again
END;
END LOOP;
END;
$FuncTag$
LANGUAGE plpgsql;
-----------------------------

if you want to do something like this either do a test first to see if the key is present in the table, update or do an insert like this
There is no reason to do a loop in the function waiting for a lock to clear.   Postgresql Locks do not work like MySQL. 
CREATE FUNCTION FORUM_CATCH_UP_ALL (pUserID INTEGER) RETURNS VOID AS
$FuncTag$
BEGIN
                      select lastpostread  from forums_readposts  where userid = pUserId;
                      if (  found() ) then
                         UPDATE forums_readposts SET "lastpostread" = (SELECT lastpost FROM forums_topics WHERE blah blah) ; --its helpful to post the entire function ;
                      else
                         INSERT INTO forums_readposts ( userid, threadid,
                                    lastpostread) (SELECT $1, id, lastpost FROM forums_topics)
                                    WHERE userid = pUserID;
                      end;
	END;
$FuncTag$
LANGUAGE plpgsql;

I got the structure from the example in the postgresql documentation.
Hopefully it's a step in the right direction.

If anyone can point me in the direction to take another step in, I'd
really appreciate it.

Thanks.

Creating a view would work better and than creating a table to track this.  I would think this website tracks the last time the user logged in correct???  This is going to create allot of over head maintaining this table when a simple select statement will work so much better if i understand what you are doing.....




Lauren found her dream laptop. Find the PC that’s right for you.

Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
Greg Stark
Date:
On Sun, Jun 28, 2009 at 2:13 AM, justin<justin@emproshunts.com> wrote:
>
> if you want to do something like this either do a test first to see if the
> key is present in the table, update or do an insert like this
> There is no reason to do a loop in the function waiting for a lock to
> clear.   Postgresql Locks do not work like MySQL.

The latter is exactly what he posted, you *do* have to loop because
two sessions can try to do the update, find no records, and then both
try to insert and fail.


The problem is that the example posted is for a single update/insert.
The problem you're solving is for merging in a whole set of changes.
That's a bit more painful.

I think you're going to want something like;

UPDATE forums_readposts
       SET lastpostread=(
                   select lastpost
                      from forums_topics
                    where id=threadid
               )
 WHERE userid=_id

INSERT INTO forums_readposts
    (userid,threadid,lastpostread)
    (select _userid, id, lastpost
       from forums_topics
     where id not in (
              select threadid
                 from forum_readposts existing
               where existing.userid=_userid
             )
    )


(you might want to experiment with that as an NOT EXISTS as there are
still cases where one is optimized better than the other due to the
standard's required null behaviour)

You have a few options here. You could just decide concurrency for
this operation really isn't important and use something to serialize
this operation. For example you could lock the user record with an
explicit select for update on the user record and the commit
immediately afterward.

Or you could catch the exception around the insert and assume if that
happened you don't have to bother retrying because the other
transaction you collided with is presumably doing the same thing. That
would break if a user hit "catch up" and simultaneously clicked on a
new thread he hadn't read before in another window.

Or you could do the same kind of loop around this, just always doing
the insert since it should insert 0 records if there are no missing
threads.

You could skip the insert entirely if the number of records updated in
matches the number of threads and you have that number handy. That
would be an especially good idea if you catch the exception around the
insert since exceptions are moderately expensive. They create a
subtransaction. Probably not a factor for an operation like this which
isn't dominating the workload.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
APseudoUtopia
Date:
On Sat, Jun 27, 2009 at 9:13 PM, justin<justin@emproshunts.com> wrote:
> APseudoUtopia wrote:
>
> Hey list,
>
> I have a query which allows users to "Catch up" on read posts on the
> forum. It works by either updating or inserting the "last post read"
> number from every forum thread into the readposts table (for that
> userid and threadid combination, of course). Here's the table
> structure:
>
>
> Wouldn't a view be better than having a table that is deleted and updated
> all the time.  I would add a field in the user table called  last_login type
> timestamp  then do a select from the forums table to generate this table
> where last_login <= FormTimeStamp .
>
> I don't see the point having this table when a view would work better.
>
>
--snip--
>
> Creating a view would work better and than creating a table to track this.
> I would think this website tracks the last time the user logged in
> correct???  This is going to create allot of over head maintaining this
> table when a simple select statement will work so much better if i
> understand what you are doing.....
>
>
>

That wouldn't work. What if a user logs in, reads only one forum
thread, then logs out (intending to read all the other forum threads
at some point in the future when they log in again). If I used a VIEW,
it would automatically consider all those unread forum posts to be
read when the user logs out.

<br /><br /> APseudoUtopia wrote: <blockquote cite="mid:27ade5280906280922p2bd7ca35x549ae5993223a87a@mail.gmail.com"
type="cite"><prewrap="">thread, then logs out (intending to read all the other forum threads
 
at some point in the future when they log in again). If I used a VIEW,
it would automatically consider all those unread forum posts to be
read when the user logs out.
 </pre> That wouldn't work. What if a user logs in, reads only one forum<br /></blockquote><br /> You are keeping a
listof all the forums a user has read,  i would not worry about making sure the table tracking user activity has
duplicatekey values. The select can be limited to return just on row with the highest time stamp then compare this
resultto figure out what forms the user has not read yet.  This eliminates one of problems but creates a problem where
tabletracking user activity is going bloat but in low traffic times delete the duplicate values.<br /><br /> A similar
topicwas discussed  on the performance  mailing list, where updates are hung for several seconds for a similar tracking
table...<br/><a class="moz-txt-link-freetext"
href="http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php">http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php</a><br
/><br/>  <br /> 
To migrate the site, you can use an open source ETL tool.

Talend Open Studio is an open source ETL tool for data integration and
migration experts. It's easy to learn for a non-technical user. What
distinguishes Talend, when it comes to business users, is the tMap
component. It allows the user to get a graphical and functional view of
integration processes.
For more information: http://www.talend.com/



Justin-95 wrote:
>
>
>
>
>
>
>
>
>
> APseudoUtopia wrote:
>
>   thread, then logs out (intending to read all the other forum threads
> at some point in the future when they log in again). If I used a VIEW,
> it would automatically consider all those unread forum posts to be
> read when the user logs out.
>
>
> That wouldn't work. What if a user logs in, reads only one forum
>
>
> You are keeping a list of all the forums a user has read,  i would not
> worry about making sure the table tracking user activity has duplicate
> key values. The select can be limited to return just on row with the
> highest time stamp then compare this result to figure out what forms
> the user has not read yet.  This eliminates one of problems but creates
> a problem where table tracking user activity is going bloat but in low
> traffic times delete the duplicate values.
>
> A similar topic was discussed  on the performance  mailing list, where
> updates are hung for several seconds for a similar tracking table...
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php
>
>  
>
>
>
>

--
View this message in context:
http://www.nabble.com/Switching-from-MySQL%3A-ON-DUPLICATE-KEY-UPDATE%2C-plpgsql-function-tp24237803p24254206.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
Lennin Caro
Date:


--- On Mon, 6/29/09, Tguru <guru@talend.com> wrote:

> From: Tguru <guru@talend.com>
> Subject: Re: [GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function
> To: pgsql-general@postgresql.org
> Date: Monday, June 29, 2009, 1:33 PM
>
> To migrate the site, you can use an open source ETL tool.
>
> Talend Open Studio is an open source ETL tool for data
> integration and
> migration experts. It's easy to learn for a non-technical
> user. What
> distinguishes Talend, when it comes to business users, is
> the tMap
> component. It allows the user to get a graphical and
> functional view of
> integration processes.
> For more information: http://www.talend.com/
>

> Justin-95 wrote:

> >
> >
> > APseudoUtopia wrote:
> >
> >   thread, then logs out (intending to
> read all the other forum threads
> > at some point in the future when they log in again).
> If I used a VIEW,
> > it would automatically consider all those unread forum
> posts to be
> > read when the user logs out.
> >
> >   
> > That wouldn't work. What if a user logs in, reads only
> one forum
> >
> >
> > You are keeping a list of all the forums a user has
> read,  i would not
> > worry about making sure the table tracking user
> activity has duplicate
> > key values. The select can be limited to return just
> on row with the
> > highest time stamp then compare this result to figure
> out what forms
> > the user has not read yet.  This eliminates one of
> problems but creates
> > a problem where table tracking user activity is going
> bloat but in low
> > traffic times delete the duplicate values.
> >
> > A similar topic was discussed  on the performance 
> mailing list, where
> > updates are hung for several seconds for a similar
> tracking table...
> > http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php
>
> >
> >  
> >
> >
> >
> >


another option is Pentaho, is good and easy too http://kettle.pentaho.org/




Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
David Kerr
Date:
On Sat, Jun 27, 2009 at 08:23:26PM -0400, APseudoUtopia wrote:
- Hey list,
-
- I'm migrating my site away from MySQL to PostgreSQL. So far, it's been
- going great. However, there's one problem I've been having trouble
- solving.
-
- I have a query which allows users to "Catch up" on read posts on the
- forum. It works by either updating or inserting the "last post read"
- number from every forum thread into the readposts table (for that
- userid and threadid combination, of course). Here's the table
- structure:
-
- CREATE TABLE "forums_readposts" (
-  "userid"    INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE,
-  "threadid"    INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON
- DELETE CASCADE,
-  "lastpostread"   INTEGER NOT NULL CHECK ("lastpostread" >= 0),
-  PRIMARY KEY ("userid", "threadid")
- );
-
- Here's the original MySQL query that I have (db_string is a php
- function that escapes the string):
-
- INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread")
- SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id",
- "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread"
- = "lastpost";

So regardless of other design issues. (i.e., assuming what you have was working
in MySQL).

Wouldn't you just be looking for something like:

BEGIN;
  EXECUTE 'insert into forums_readposts values ('...')';
        EXCEPTION when unique_violation THEN
                EXECUTE 'update forums_readposts set lastpostread = '...' ';
END;

The logic as i read your post is. If the user's never done a "catchup" operation
before, this will create the record. If he has, then it will update this record
to reflect the new transid.

Dave

Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
APseudoUtopia
Date:
On Mon, Jun 29, 2009 at 2:26 PM, David Kerr<dmk@mr-paradox.net> wrote:
> On Sat, Jun 27, 2009 at 08:23:26PM -0400, APseudoUtopia wrote:
> - Hey list,
> -
> - I'm migrating my site away from MySQL to PostgreSQL. So far, it's been
> - going great. However, there's one problem I've been having trouble
> - solving.
> -
> - I have a query which allows users to "Catch up" on read posts on the
> - forum. It works by either updating or inserting the "last post read"
> - number from every forum thread into the readposts table (for that
> - userid and threadid combination, of course). Here's the table
> - structure:
> -
> - CREATE TABLE "forums_readposts" (
> -  "userid"    INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE,
> -  "threadid"    INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON
> - DELETE CASCADE,
> -  "lastpostread"   INTEGER NOT NULL CHECK ("lastpostread" >= 0),
> -  PRIMARY KEY ("userid", "threadid")
> - );
> -
> - Here's the original MySQL query that I have (db_string is a php
> - function that escapes the string):
> -
> - INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread")
> - SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id",
> - "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread"
> - = "lastpost";
>
> So regardless of other design issues. (i.e., assuming what you have was working
> in MySQL).
>
> Wouldn't you just be looking for something like:
>
> BEGIN;
>  EXECUTE 'insert into forums_readposts values ('...')';
>        EXCEPTION when unique_violation THEN
>                EXECUTE 'update forums_readposts set lastpostread = '...' ';
> END;
>
> The logic as i read your post is. If the user's never done a "catchup" operation
> before, this will create the record. If he has, then it will update this record
> to reflect the new transid.
>
> Dave
>

Hmm, yeah. I'm new to PostgreSQL, and have little experience with
MySQL other than basic queries and joins. I didn't get to the part in
the docs that covers EXECUTE yet, haha. That looks like it would do
exactly what I want it to.

I appreciate it.

Re: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

From
"Chris Spotts"
Date:
> >
> > Wouldn't you just be looking for something like:
> >
> > BEGIN;
> >  EXECUTE 'insert into forums_readposts values ('...')';
> >        EXCEPTION when unique_violation THEN
> >                EXECUTE 'update forums_readposts set lastpostread =
> '...' ';
> > END;

> > The logic as i read your post is. If the user's never done a
> "catchup" operation
> > before, this will create the record. If he has, then it will update
> this record
> > to reflect the new transid.
> >
> > Dave
> >
>
> Hmm, yeah. I'm new to PostgreSQL, and have little experience with
> MySQL other than basic queries and joins. I didn't get to the part in
> the docs that covers EXECUTE yet, haha. That looks like it would do
> exactly what I want it to.
[Spotts, Christopher]
There's a good example and some documentation of what Dave said in the manual at the bottom of this page.
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html
You should really only need to "EXECUTE" here if you have some dynamically determined columns or tables.  The "magic"
hereis a combination of "IF found" and "EXCEPTION WHEN unique violation"