Thread: Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function
Hey list, I'm migrating my site away from MySQL to PostgreSQL. So far, it's been going great. However, there's one problem I've been having trouble solving. I have a query which allows users to "Catch up" on read posts on the forum. It works by either updating or inserting the "last post read" number from every forum thread into the readposts table (for that userid and threadid combination, of course). Here's the table structure: CREATE TABLE "forums_readposts" ( "userid" INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE, "threadid" INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON DELETE CASCADE, "lastpostread" INTEGER NOT NULL CHECK ("lastpostread" >= 0), PRIMARY KEY ("userid", "threadid") ); Here's the original MySQL query that I have (db_string is a php function that escapes the string): INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread") SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id", "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread" = "lastpost"; Obviously this will not work with PostgreSQL. I've googled around a bit and I decided to create a plpgsql function to handle the task. I don't have much done, but here's what I have: ----------------------------- CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS $FuncTag$ BEGIN LOOP -- Try to update the record -- This query is broken. I'm not sure how to do the subquery or whatever I need to do. Maybe FROM? Another loop? -- UPDATE "forums_readposts" SET "lastpostread" = (SELECT "lastpost" FROM "forums_topics" WHERE blah blah IF found THEN RETURN; END IF; -- Not there, try to insert the key -- If someone else inserts the same key concurrently, -- We could get a unique-key failure BEGIN INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics") WHERE "userid" = $1; RETURN; EXCEPTION WHEN unique_violation THEN -- Do nothing, and loop to try the update again END; END LOOP; END; $FuncTag$ LANGUAGE plpgsql; ----------------------------- I got the structure from the example in the postgresql documentation. Hopefully it's a step in the right direction. If anyone can point me in the direction to take another step in, I'd really appreciate it. Thanks.
APseudoUtopia wrote: <blockquote cite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com" type="cite"><prewrap="">Hey list, I have a query which allows users to "Catch up" on read posts on the forum. It works by either updating or inserting the "last post read" number from every forum thread into the readposts table (for that userid and threadid combination, of course). Here's the table structure: </pre></blockquote> Wouldn't a view be better than having a table that is deleted and updated all the time. Iwould add a field in the user table called last_login type timestamp then do a select from the forums table to generatethis table where last_login <= FormTimeStamp .<br /><br /> I don't see the point having this table when a viewwould work better. <br /><br /><br /><br /><blockquote cite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com"type="cite"><pre wrap=""> Obviously this will not work with PostgreSQL. I've googled around a bit and I decided to create a plpgsql function to handle the task. I don't have much done, but here's what I have: ----------------------------- CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS $FuncTag$BEGIN LOOP -- Try to update the record -- This query is broken. I'm not sure how to do the subqueryor whatever I need to do. Maybe FROM? Another loop? -- UPDATE "forums_readposts" SET "lastpostread" = (SELECT "lastpost" FROM "forums_topics" WHERE blah blah IF found THEN RETURN; END IF; -- Not there,try to insert the key -- If someone else inserts the same key concurrently </pre></blockquote><br /><blockquotecite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com" type="cite"><pre wrap=""> -- We could get a unique-key failure BEGIN INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics") WHERE "userid" = $1; RETURN; EXCEPTION WHEN unique_violation THEN -- Do nothing, and loop totry the update again END; END LOOP;END; $FuncTag$ LANGUAGE plpgsql; ----------------------------- </pre></blockquote><br /> if you want to do something like this either do a test first to seeif the key is present in the table, update or do an insert like this<br /> There is no reason to do a loop in the functionwaiting for a lock to clear. Postgresql Locks do not work like MySQL. <br /><pre wrap="">CREATE FUNCTION FORUM_CATCH_UP_ALL(pUserID INTEGER) RETURNS VOID AS $FuncTag$BEGIN</pre> select lastpostread from forums_readposts where userid = pUserId;<br /> if ( found() ) then<br /> UPDATE forums_readposts SET "lastpostread" = (SELECTlastpost FROM forums_topics WHERE blah blah) ; --its helpful to post the entire function ;<br /> else<br /> INSERT INTO forums_readposts ( userid, threadid,<br /> lastpostread) (SELECT $1, id, lastpost FROM forums_topics)<br /> WHERE userid= pUserID;<br /> end;<br /><pre wrap=""> END; $FuncTag$ LANGUAGE plpgsql;</pre><br /><blockquote cite="mid:27ade5280906271723s2621198eo66c944065e710fc4@mail.gmail.com" type="cite"><prewrap=""> I got the structure from the example in the postgresql documentation. Hopefully it's a step in the right direction. If anyone can point me in the direction to take another step in, I'd really appreciate it. Thanks. </pre></blockquote><br /> Creating a view would work better and than creating a table to track this. I would thinkthis website tracks the last time the user logged in correct??? This is going to create allot of over head maintainingthis table when a simple select statement will work so much better if i understand what you are doing.....<br/><br /><br />
contempating installing a insert,update trigger on forums_topics table something like
SET search_path = public;
CREATE TRIGGER "MyTableName_Trig"
AFTER INSERT OR DELETE OR UPDATE ON "forum_topics"
FOR EACH ROW EXECUTE PROCEDURE "FORUM_CATCH_UP_ALL" ();
HTH
Martin
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Date: Sat, 27 Jun 2009 21:13:23 -0400
From: justin@emproshunts.com
To: apseudoutopia@gmail.com; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function
APseudoUtopia wrote:
I don't see the point having this table when a view would work better.
if you want to do something like this either do a test first to see if the key is present in the table, update or do an insert like this
There is no reason to do a loop in the function waiting for a lock to clear. Postgresql Locks do not work like MySQL.
if ( found() ) then
UPDATE forums_readposts SET "lastpostread" = (SELECT lastpost FROM forums_topics WHERE blah blah) ; --its helpful to post the entire function ;
else
INSERT INTO forums_readposts ( userid, threadid,
lastpostread) (SELECT $1, id, lastpost FROM forums_topics)
WHERE userid = pUserID;
end;
Creating a view would work better and than creating a table to track this. I would think this website tracks the last time the user logged in correct??? This is going to create allot of over head maintaining this table when a simple select statement will work so much better if i understand what you are doing.....
Lauren found her dream laptop. Find the PC that’s right for you.
SET search_path = public;
CREATE TRIGGER "MyTableName_Trig"
AFTER INSERT OR DELETE OR UPDATE ON "forum_topics"
FOR EACH ROW EXECUTE PROCEDURE "FORUM_CATCH_UP_ALL" ();
HTH
Martin
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.
Date: Sat, 27 Jun 2009 21:13:23 -0400
From: justin@emproshunts.com
To: apseudoutopia@gmail.com; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function
APseudoUtopia wrote:
Wouldn't a view be better than having a table that is deleted and updated all the time. I would add a field in the user table called last_login type timestamp then do a select from the forums table to generate this table where last_login <= FormTimeStamp .Hey list,
I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination, of course). Here's the table
structure:
I don't see the point having this table when a view would work better.
Obviously this will not work with PostgreSQL. I've googled around a
bit and I decided to create a plpgsql function to handle the task. I
don't have much done, but here's what I have:
-----------------------------
CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS
$FuncTag$
BEGIN
LOOP
-- Try to update the record
-- This query is broken. I'm not sure how to do the subquery or
whatever I need to do. Maybe FROM? Another loop?
-- UPDATE "forums_readposts" SET "lastpostread" = (SELECT
"lastpost" FROM "forums_topics" WHERE blah blah
IF found THEN
RETURN;
END IF;
-- Not there, try to insert the key
-- If someone else inserts the same key concurrently
-- We could get a unique-key failure
BEGIN
INSERT INTO "forums_readposts" ("userid", "threadid",
"lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics")
WHERE "userid" = $1;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the update again
END;
END LOOP;
END;
$FuncTag$
LANGUAGE plpgsql;
-----------------------------
if you want to do something like this either do a test first to see if the key is present in the table, update or do an insert like this
There is no reason to do a loop in the function waiting for a lock to clear. Postgresql Locks do not work like MySQL.
CREATE FUNCTION FORUM_CATCH_UP_ALL (pUserID INTEGER) RETURNS VOID ASselect lastpostread from forums_readposts where userid = pUserId;
$FuncTag$
BEGIN
if ( found() ) then
UPDATE forums_readposts SET "lastpostread" = (SELECT lastpost FROM forums_topics WHERE blah blah) ; --its helpful to post the entire function ;
else
INSERT INTO forums_readposts ( userid, threadid,
lastpostread) (SELECT $1, id, lastpost FROM forums_topics)
WHERE userid = pUserID;
end;
END;
$FuncTag$
LANGUAGE plpgsql;
I got the structure from the example in the postgresql documentation.
Hopefully it's a step in the right direction.
If anyone can point me in the direction to take another step in, I'd
really appreciate it.
Thanks.
Creating a view would work better and than creating a table to track this. I would think this website tracks the last time the user logged in correct??? This is going to create allot of over head maintaining this table when a simple select statement will work so much better if i understand what you are doing.....
Lauren found her dream laptop. Find the PC that’s right for you.
On Sun, Jun 28, 2009 at 2:13 AM, justin<justin@emproshunts.com> wrote: > > if you want to do something like this either do a test first to see if the > key is present in the table, update or do an insert like this > There is no reason to do a loop in the function waiting for a lock to > clear. Postgresql Locks do not work like MySQL. The latter is exactly what he posted, you *do* have to loop because two sessions can try to do the update, find no records, and then both try to insert and fail. The problem is that the example posted is for a single update/insert. The problem you're solving is for merging in a whole set of changes. That's a bit more painful. I think you're going to want something like; UPDATE forums_readposts SET lastpostread=( select lastpost from forums_topics where id=threadid ) WHERE userid=_id INSERT INTO forums_readposts (userid,threadid,lastpostread) (select _userid, id, lastpost from forums_topics where id not in ( select threadid from forum_readposts existing where existing.userid=_userid ) ) (you might want to experiment with that as an NOT EXISTS as there are still cases where one is optimized better than the other due to the standard's required null behaviour) You have a few options here. You could just decide concurrency for this operation really isn't important and use something to serialize this operation. For example you could lock the user record with an explicit select for update on the user record and the commit immediately afterward. Or you could catch the exception around the insert and assume if that happened you don't have to bother retrying because the other transaction you collided with is presumably doing the same thing. That would break if a user hit "catch up" and simultaneously clicked on a new thread he hadn't read before in another window. Or you could do the same kind of loop around this, just always doing the insert since it should insert 0 records if there are no missing threads. You could skip the insert entirely if the number of records updated in matches the number of threads and you have that number handy. That would be an especially good idea if you catch the exception around the insert since exceptions are moderately expensive. They create a subtransaction. Probably not a factor for an operation like this which isn't dominating the workload. -- greg http://mit.edu/~gsstark/resume.pdf
On Sat, Jun 27, 2009 at 9:13 PM, justin<justin@emproshunts.com> wrote: > APseudoUtopia wrote: > > Hey list, > > I have a query which allows users to "Catch up" on read posts on the > forum. It works by either updating or inserting the "last post read" > number from every forum thread into the readposts table (for that > userid and threadid combination, of course). Here's the table > structure: > > > Wouldn't a view be better than having a table that is deleted and updated > all the time. I would add a field in the user table called last_login type > timestamp then do a select from the forums table to generate this table > where last_login <= FormTimeStamp . > > I don't see the point having this table when a view would work better. > > --snip-- > > Creating a view would work better and than creating a table to track this. > I would think this website tracks the last time the user logged in > correct??? This is going to create allot of over head maintaining this > table when a simple select statement will work so much better if i > understand what you are doing..... > > > That wouldn't work. What if a user logs in, reads only one forum thread, then logs out (intending to read all the other forum threads at some point in the future when they log in again). If I used a VIEW, it would automatically consider all those unread forum posts to be read when the user logs out.
<br /><br /> APseudoUtopia wrote: <blockquote cite="mid:27ade5280906280922p2bd7ca35x549ae5993223a87a@mail.gmail.com" type="cite"><prewrap="">thread, then logs out (intending to read all the other forum threads at some point in the future when they log in again). If I used a VIEW, it would automatically consider all those unread forum posts to be read when the user logs out. </pre> That wouldn't work. What if a user logs in, reads only one forum<br /></blockquote><br /> You are keeping a listof all the forums a user has read, i would not worry about making sure the table tracking user activity has duplicatekey values. The select can be limited to return just on row with the highest time stamp then compare this resultto figure out what forms the user has not read yet. This eliminates one of problems but creates a problem where tabletracking user activity is going bloat but in low traffic times delete the duplicate values.<br /><br /> A similar topicwas discussed on the performance mailing list, where updates are hung for several seconds for a similar tracking table...<br/><a class="moz-txt-link-freetext" href="http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php">http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php</a><br /><br/> <br />
To migrate the site, you can use an open source ETL tool. Talend Open Studio is an open source ETL tool for data integration and migration experts. It's easy to learn for a non-technical user. What distinguishes Talend, when it comes to business users, is the tMap component. It allows the user to get a graphical and functional view of integration processes. For more information: http://www.talend.com/ Justin-95 wrote: > > > > > > > > > > APseudoUtopia wrote: > > thread, then logs out (intending to read all the other forum threads > at some point in the future when they log in again). If I used a VIEW, > it would automatically consider all those unread forum posts to be > read when the user logs out. > > > That wouldn't work. What if a user logs in, reads only one forum > > > You are keeping a list of all the forums a user has read, i would not > worry about making sure the table tracking user activity has duplicate > key values. The select can be limited to return just on row with the > highest time stamp then compare this result to figure out what forms > the user has not read yet. This eliminates one of problems but creates > a problem where table tracking user activity is going bloat but in low > traffic times delete the duplicate values. > > A similar topic was discussed on the performance mailing list, where > updates are hung for several seconds for a similar tracking table... > http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php > > > > > > -- View this message in context: http://www.nabble.com/Switching-from-MySQL%3A-ON-DUPLICATE-KEY-UPDATE%2C-plpgsql-function-tp24237803p24254206.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--- On Mon, 6/29/09, Tguru <guru@talend.com> wrote: > From: Tguru <guru@talend.com> > Subject: Re: [GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function > To: pgsql-general@postgresql.org > Date: Monday, June 29, 2009, 1:33 PM > > To migrate the site, you can use an open source ETL tool. > > Talend Open Studio is an open source ETL tool for data > integration and > migration experts. It's easy to learn for a non-technical > user. What > distinguishes Talend, when it comes to business users, is > the tMap > component. It allows the user to get a graphical and > functional view of > integration processes. > For more information: http://www.talend.com/ > > Justin-95 wrote: > > > > > > APseudoUtopia wrote: > > > > thread, then logs out (intending to > read all the other forum threads > > at some point in the future when they log in again). > If I used a VIEW, > > it would automatically consider all those unread forum > posts to be > > read when the user logs out. > > > > > > That wouldn't work. What if a user logs in, reads only > one forum > > > > > > You are keeping a list of all the forums a user has > read, i would not > > worry about making sure the table tracking user > activity has duplicate > > key values. The select can be limited to return just > on row with the > > highest time stamp then compare this result to figure > out what forms > > the user has not read yet. This eliminates one of > problems but creates > > a problem where table tracking user activity is going > bloat but in low > > traffic times delete the duplicate values. > > > > A similar topic was discussed on the performance > mailing list, where > > updates are hung for several seconds for a similar > tracking table... > > http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php > > > > > > > > > > > > > another option is Pentaho, is good and easy too http://kettle.pentaho.org/
On Sat, Jun 27, 2009 at 08:23:26PM -0400, APseudoUtopia wrote: - Hey list, - - I'm migrating my site away from MySQL to PostgreSQL. So far, it's been - going great. However, there's one problem I've been having trouble - solving. - - I have a query which allows users to "Catch up" on read posts on the - forum. It works by either updating or inserting the "last post read" - number from every forum thread into the readposts table (for that - userid and threadid combination, of course). Here's the table - structure: - - CREATE TABLE "forums_readposts" ( - "userid" INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE, - "threadid" INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON - DELETE CASCADE, - "lastpostread" INTEGER NOT NULL CHECK ("lastpostread" >= 0), - PRIMARY KEY ("userid", "threadid") - ); - - Here's the original MySQL query that I have (db_string is a php - function that escapes the string): - - INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread") - SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id", - "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread" - = "lastpost"; So regardless of other design issues. (i.e., assuming what you have was working in MySQL). Wouldn't you just be looking for something like: BEGIN; EXECUTE 'insert into forums_readposts values ('...')'; EXCEPTION when unique_violation THEN EXECUTE 'update forums_readposts set lastpostread = '...' '; END; The logic as i read your post is. If the user's never done a "catchup" operation before, this will create the record. If he has, then it will update this record to reflect the new transid. Dave
On Mon, Jun 29, 2009 at 2:26 PM, David Kerr<dmk@mr-paradox.net> wrote: > On Sat, Jun 27, 2009 at 08:23:26PM -0400, APseudoUtopia wrote: > - Hey list, > - > - I'm migrating my site away from MySQL to PostgreSQL. So far, it's been > - going great. However, there's one problem I've been having trouble > - solving. > - > - I have a query which allows users to "Catch up" on read posts on the > - forum. It works by either updating or inserting the "last post read" > - number from every forum thread into the readposts table (for that > - userid and threadid combination, of course). Here's the table > - structure: > - > - CREATE TABLE "forums_readposts" ( > - "userid" INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE, > - "threadid" INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON > - DELETE CASCADE, > - "lastpostread" INTEGER NOT NULL CHECK ("lastpostread" >= 0), > - PRIMARY KEY ("userid", "threadid") > - ); > - > - Here's the original MySQL query that I have (db_string is a php > - function that escapes the string): > - > - INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread") > - SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id", > - "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread" > - = "lastpost"; > > So regardless of other design issues. (i.e., assuming what you have was working > in MySQL). > > Wouldn't you just be looking for something like: > > BEGIN; > EXECUTE 'insert into forums_readposts values ('...')'; > EXCEPTION when unique_violation THEN > EXECUTE 'update forums_readposts set lastpostread = '...' '; > END; > > The logic as i read your post is. If the user's never done a "catchup" operation > before, this will create the record. If he has, then it will update this record > to reflect the new transid. > > Dave > Hmm, yeah. I'm new to PostgreSQL, and have little experience with MySQL other than basic queries and joins. I didn't get to the part in the docs that covers EXECUTE yet, haha. That looks like it would do exactly what I want it to. I appreciate it.
> > > > Wouldn't you just be looking for something like: > > > > BEGIN; > > EXECUTE 'insert into forums_readposts values ('...')'; > > EXCEPTION when unique_violation THEN > > EXECUTE 'update forums_readposts set lastpostread = > '...' '; > > END; > > The logic as i read your post is. If the user's never done a > "catchup" operation > > before, this will create the record. If he has, then it will update > this record > > to reflect the new transid. > > > > Dave > > > > Hmm, yeah. I'm new to PostgreSQL, and have little experience with > MySQL other than basic queries and joins. I didn't get to the part in > the docs that covers EXECUTE yet, haha. That looks like it would do > exactly what I want it to. [Spotts, Christopher] There's a good example and some documentation of what Dave said in the manual at the bottom of this page. http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html You should really only need to "EXECUTE" here if you have some dynamically determined columns or tables. The "magic" hereis a combination of "IF found" and "EXCEPTION WHEN unique violation"