Thread: Simple Atomic Relationship Insert
Let's say I have two tables like this (I'm leaving stuff out for simplicity):
CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);
CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);
CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);
CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);
The hometowns table is populate as users are created. For example, a client may submit {"name":"Robert", "hometown":"Portland"}.
The hometowns table will never be updated, only either queries or inserted.
So given this I need to INSERT a row into "users" and either SELECT the hometowns.id that matches "Portland" or if it doesn't exist I INSERT it returning the hometowns.id".
Normally I would do by first doing a SELECT on hometown. If I don't get anything I do an INSERT into hometown RETURNING the id. If THAT throws an error then I do the SELECT again. Now I'm finally ready to INSERT into users using the hometowns.id from the above steps.
But wow, that seems like a lot of code for a simple "Add if doesn't exist" foreign key relationship -- but this is how I've always done.
So my question. Is there a simpler, more performant, or thread-safe way to do this?
Thanks!
Let's say I have two tables like this (I'm leaving stuff out for simplicity):
CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);
CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);The hometowns table is populate as users are created. For example, a client may submit {"name":"Robert", "hometown":"Portland"}.The hometowns table will never be updated, only either queries or inserted.So given this I need to INSERT a row into "users" and either SELECT the hometowns.id that matches "Portland" or if it doesn't exist I INSERT it returning the hometowns.id".Normally I would do by first doing a SELECT on hometown. If I don't get anything I do an INSERT into hometown RETURNING the id. If THAT throws an error then I do the SELECT again. Now I'm finally ready to INSERT into users using the hometowns.id from the above steps.But wow, that seems like a lot of code for a simple "Add if doesn't exist" foreign key relationship -- but this is how I've always done.So my question. Is there a simpler, more performant, or thread-safe way to do this?Thanks!
What occurs to me is to simply do an INSERT into the "hometowns" table and just ignore the "already exists" return indication. Then do a SELECT to get the hometowns id which now exists, then INSERT the users. but I could easily be overlooking some reason why this wouldn't work properly.
While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity. In other words, eschew obfuscation.111,111,111 x 111,111,111 = 12,345,678,987,654,321
John McKown
Thanks John. I've been seeing a lot of examples like this lately. Does the following approach have any advantages over traditional approaches?
WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)
WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)
VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);
On Tue, Jan 13, 2015 at 8:50 AM, John McKown <john.archie.mckown@gmail.com> wrote:
Let's say I have two tables like this (I'm leaving stuff out for simplicity):
CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);
CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);The hometowns table is populate as users are created. For example, a client may submit {"name":"Robert", "hometown":"Portland"}.The hometowns table will never be updated, only either queries or inserted.So given this I need to INSERT a row into "users" and either SELECT the hometowns.id that matches "Portland" or if it doesn't exist I INSERT it returning the hometowns.id".Normally I would do by first doing a SELECT on hometown. If I don't get anything I do an INSERT into hometown RETURNING the id. If THAT throws an error then I do the SELECT again. Now I'm finally ready to INSERT into users using the hometowns.id from the above steps.But wow, that seems like a lot of code for a simple "Add if doesn't exist" foreign key relationship -- but this is how I've always done.So my question. Is there a simpler, more performant, or thread-safe way to do this?Thanks!What occurs to me is to simply do an INSERT into the "hometowns" table and just ignore the "already exists" return indication. Then do a SELECT to get the hometowns id which now exists, then INSERT the users. but I could easily be overlooking some reason why this wouldn't work properly.--While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity. In other words, eschew obfuscation.111,111,111 x 111,111,111 = 12,345,678,987,654,321Maranatha! <><
John McKown
Thanks John. I've been seeing a lot of examples like this lately. Does the following approach have any advantages over traditional approaches?
WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);
Oh, that is very clever. I've not see such a thing before. Thanks.
I've added it to my stable of "tricks". Which aren't really tricks, just really nice new methods to do something.
The main advantage that I can see is that it is a single SQL statement to send to the server. That makes it "self contained" so that it would be more difficult for someone to accidentally mess it up. On the other hand, CTEs are still a bit new (at least to me) and so the "why it works" might not be very obvious to other programmers who might need to maintain the application. To many this "lack of obviousness" is a detriment. To me, it means "update your knowledge". But then, I am sometimes a arrogant BOFH. Add that to my being an surly old curmudgeon, and you can end up with some bad advice when in a "corporate" environment. The minus, at present, is that it is "clever" and so may violate corporate coding standards due to "complexity". Or maybe I just work for a staid company.
While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity. In other words, eschew obfuscation.111,111,111 x 111,111,111 = 12,345,678,987,654,321
John McKown
This CTE approach doesn't appear to play well with multiple concurrent transactions/connections.
On Tue, Jan 13, 2015 at 10:05 AM, John McKown <john.archie.mckown@gmail.com> wrote:
Thanks John. I've been seeing a lot of examples like this lately. Does the following approach have any advantages over traditional approaches?
WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);Oh, that is very clever. I've not see such a thing before. Thanks.I've added it to my stable of "tricks". Which aren't really tricks, just really nice new methods to do something.The main advantage that I can see is that it is a single SQL statement to send to the server. That makes it "self contained" so that it would be more difficult for someone to accidentally mess it up. On the other hand, CTEs are still a bit new (at least to me) and so the "why it works" might not be very obvious to other programmers who might need to maintain the application. To many this "lack of obviousness" is a detriment. To me, it means "update your knowledge". But then, I am sometimes a arrogant BOFH. Add that to my being an surly old curmudgeon, and you can end up with some bad advice when in a "corporate" environment. The minus, at present, is that it is "clever" and so may violate corporate coding standards due to "complexity". Or maybe I just work for a staid company.--While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity. In other words, eschew obfuscation.111,111,111 x 111,111,111 = 12,345,678,987,654,321Maranatha! <><
John McKown
What issue are you having? I'd imagine you have a race condition on the insert into hometowns, but you'd have that same race condition in your app code using a more traditional 3 query version as well. I often use CTEs like this to make things atomic. It allows me to remove transactional code out of the app and also to increase performance by reducing the back-and-forth to the db. http://omniti.com/seeds/writable-ctes-improve-performance On Tue, Jan 13, 2015 at 4:21 PM, Robert DiFalco <robert.difalco@gmail.com> wrote: > This CTE approach doesn't appear to play well with multiple concurrent > transactions/connections. > > On Tue, Jan 13, 2015 at 10:05 AM, John McKown <john.archie.mckown@gmail.com> > wrote: >> >> On Tue, Jan 13, 2015 at 11:45 AM, Robert DiFalco >> <robert.difalco@gmail.com> wrote: >>> >>> Thanks John. I've been seeing a lot of examples like this lately. Does >>> the following approach have any advantages over traditional approaches? >>> >>> WITH sel AS ( >>> SELECT id FROM hometowns WHERE name = 'Portland' >>> ), ins AS ( >>> INSERT INTO hometowns(name) >>> SELECT 'Portland' >>> WHERE NOT EXISTS (SELECT 1 FROM sel) >>> RETURNING id >>> ) >>> INSERT INTO users(name, hometown_id) >>> VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel); >>> >>> >> >> Oh, that is very clever. I've not see such a thing before. Thanks. >> >> I've added it to my stable of "tricks". Which aren't really tricks, just >> really nice new methods to do something. >> >> The main advantage that I can see is that it is a single SQL statement to >> send to the server. That makes it "self contained" so that it would be more >> difficult for someone to accidentally mess it up. On the other hand, CTEs >> are still a bit new (at least to me) and so the "why it works" might not be >> very obvious to other programmers who might need to maintain the >> application. To many this "lack of obviousness" is a detriment. To me, it >> means "update your knowledge". But then, I am sometimes a arrogant BOFH. Add >> that to my being an surly old curmudgeon, and you can end up with some bad >> advice when in a "corporate" environment. The minus, at present, is that it >> is "clever" and so may violate corporate coding standards due to >> "complexity". Or maybe I just work for a staid company. >> >> -- >> While a transcendent vocabulary is laudable, one must be eternally careful >> so that the calculated objective of communication does not become ensconced >> in obscurity. In other words, eschew obfuscation. >> >> 111,111,111 x 111,111,111 = 12,345,678,987,654,321 >> >> Maranatha! <>< >> John McKown > >
Well, traditionally I would create a LOOP where I tried the SELECT, if there was nothing I did the INSERT, if that raised an exception I would repeat the LOOP.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
END;
$ LANGUAGE plpgsql;
What's the best way to do it with the CTE? Currently I have the following which gives me Duplicate Key Exceptions when two sessions try to insert the same record at the same time.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
END;
$ LANGUAGE plpgsql;
And that is no bueno. Should I just put the whole thing in a LOOP?
With the single CTE I don't believe you can do a full upsert loop. If you're doing this inside of a postgres function, your changes are already atomic, so I don't believe by switching you are buying yourself much (if anything) by using a CTE query instead of something more traditional here. The advantages of switching to a CTE would be if this code was all being done inside of the app code with multiple queries. On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco <robert.difalco@gmail.com> wrote: > Well, traditionally I would create a LOOP where I tried the SELECT, if there > was nothing I did the INSERT, if that raised an exception I would repeat the > LOOP. > > What's the best way to do it with the CTE? Currently I have the following > which gives me Duplicate Key Exceptions when two sessions try to insert the > same record at the same time. > > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS > INTEGER AS $ > DECLARE hometown_id INTEGER; > BEGIN > WITH sel AS ( > SELECT id FROM hometowns WHERE name = hometown_name > ), ins AS ( > INSERT INTO hometowns (name) > SELECT hometown_name > WHERE NOT EXISTS(SELECT 1 FROM sel) > RETURNING id > ) > SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel; > RETURN hometown_id; > END; > $ LANGUAGE plpgsql; > > And that is no bueno. Should I just put the whole thing in a LOOP?
This seems to get rid of the INSERT race condition.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;
On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote:
With the single CTE I don't believe you can do a full upsert loop. If
you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of something
more traditional here.
The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.
On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:
> Well, traditionally I would create a LOOP where I tried the SELECT, if there
> was nothing I did the INSERT, if that raised an exception I would repeat the
> LOOP.
>
> What's the best way to do it with the CTE? Currently I have the following
> which gives me Duplicate Key Exceptions when two sessions try to insert the
> same record at the same time.
>
> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
> INTEGER AS $
> DECLARE hometown_id INTEGER;
> BEGIN
> WITH sel AS (
> SELECT id FROM hometowns WHERE name = hometown_name
> ), ins AS (
> INSERT INTO hometowns (name)
> SELECT hometown_name
> WHERE NOT EXISTS(SELECT 1 FROM sel)
> RETURNING id
> )
> SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> RETURN hometown_id;
> END;
> $ LANGUAGE plpgsql;
>
> And that is no bueno. Should I just put the whole thing in a LOOP?
The loop to run it twice handles that yes. I don't think that buys you anything over a more traditional non-cte method though. I'd run them a few thousand times to see if there's any difference in runtimes but my guess is the CTE version would be slightly slower here. v_id integer; BEGIN; select id into v_id from hometowns where name = hometown_name; BEGIN insert into hometowns (name) select hometown_name where v_id is null returning id into v_id; EXCEPTION WHEN unique_violation THEN select id into v_id from hometowns where name = hometown_name; END; insert into users (name, hometown_id) values ('Robert', v_id); END; On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco <robert.difalco@gmail.com> wrote: > This seems to get rid of the INSERT race condition. > > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS > INTEGER AS $ > DECLARE hometown_id INTEGER; > BEGIN > LOOP > BEGIN > WITH sel AS ( > SELECT id FROM hometowns WHERE name = hometown_name > ), ins AS ( > INSERT INTO hometowns (name) > SELECT hometown_name > WHERE NOT EXISTS(SELECT 1 FROM sel) > RETURNING id > ) > SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel; > RETURN hometown_id; > > EXCEPTION WHEN unique_violation > THEN > END; > END LOOP; > END; > $ LANGUAGE plpgsql; > > > On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote: >> >> With the single CTE I don't believe you can do a full upsert loop. If >> you're doing this inside of a postgres function, your changes are >> already atomic, so I don't believe by switching you are buying >> yourself much (if anything) by using a CTE query instead of something >> more traditional here. >> >> The advantages of switching to a CTE would be if this code was all >> being done inside of the app code with multiple queries. >> >> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco >> <robert.difalco@gmail.com> wrote: >> > Well, traditionally I would create a LOOP where I tried the SELECT, if >> > there >> > was nothing I did the INSERT, if that raised an exception I would repeat >> > the >> > LOOP. >> > >> > What's the best way to do it with the CTE? Currently I have the >> > following >> > which gives me Duplicate Key Exceptions when two sessions try to insert >> > the >> > same record at the same time. >> > >> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) >> > RETURNS >> > INTEGER AS $ >> > DECLARE hometown_id INTEGER; >> > BEGIN >> > WITH sel AS ( >> > SELECT id FROM hometowns WHERE name = hometown_name >> > ), ins AS ( >> > INSERT INTO hometowns (name) >> > SELECT hometown_name >> > WHERE NOT EXISTS(SELECT 1 FROM sel) >> > RETURNING id >> > ) >> > SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel; >> > RETURN hometown_id; >> > END; >> > $ LANGUAGE plpgsql; >> > >> > And that is no bueno. Should I just put the whole thing in a LOOP? > >
Brian Dunavant wrote on 13.01.2015 22:33: > What issue are you having? I'd imagine you have a race condition on > the insert into hometowns, but you'd have that same race condition in > your app code using a more traditional 3 query version as well. > > I often use CTEs like this to make things atomic. It allows me to > remove transactional code out of the app and also to increase > performance by reducing the back-and-forth to the db. > http://omniti.com/seeds/writable-ctes-improve-performance > Craig Ringer explained some of the pitfalls of this approach here: http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates which is a follow up question based on this: http://stackoverflow.com/a/8702291/330315 Thomas
A very good point, but it does not apply as here (and in my article) we are not using updates, only insert and select. On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: > Brian Dunavant wrote on 13.01.2015 22:33: >> >> What issue are you having? I'd imagine you have a race condition on >> the insert into hometowns, but you'd have that same race condition in >> your app code using a more traditional 3 query version as well. >> >> I often use CTEs like this to make things atomic. It allows me to >> remove transactional code out of the app and also to increase >> performance by reducing the back-and-forth to the db. >> http://omniti.com/seeds/writable-ctes-improve-performance >> > > Craig Ringer explained some of the pitfalls of this approach here: > > http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates > > which is a follow up question based on this: > http://stackoverflow.com/a/8702291/330315 > > Thomas > > > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Good points. I guess my feeling is that if there can be a race condition on INSERT then the CTE version is not truly atomic, hence the LOOP.
On Tue, Jan 13, 2015 at 3:11 PM, Brian Dunavant <brian@omniti.com> wrote:
A very good point, but it does not apply as here (and in my article)
we are not using updates, only insert and select.
On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Brian Dunavant wrote on 13.01.2015 22:33:
>>
>> What issue are you having? I'd imagine you have a race condition on
>> the insert into hometowns, but you'd have that same race condition in
>> your app code using a more traditional 3 query version as well.
>>
>> I often use CTEs like this to make things atomic. It allows me to
>> remove transactional code out of the app and also to increase
>> performance by reducing the back-and-forth to the db.
>> http://omniti.com/seeds/writable-ctes-improve-performance
>>
>
> Craig Ringer explained some of the pitfalls of this approach here:
>
> http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates
>
> which is a follow up question based on this:
> http://stackoverflow.com/a/8702291/330315
>
> Thomas
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
John McKown wrote: > On Tue, Jan 13, 2015 at 10:42 AM, Robert DiFalco > <robert.difalco@gmail.com <mailto:robert.difalco@gmail.com>>wrote: > > Let's say I have two tables like this (I'm leaving stuff out for > simplicity): > > CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1; > CREATE TABLE hometowns ( > id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'), > name VARCHAR, > PRIMARY KEY (id), > UNIQUE(name) > ); > > CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1; > CREATE TABLE users ( > id BIGINT DEFAULT nextval('USER_SEQ_GEN'), > hometown_id INTEGER, > name VARCHAR NOT NULL, > PRIMARY KEY (id), > FOREIGN KEY (hometown_id) REFERENCES hometowns(id) > ); > > The hometowns table is populate as users are created. For example, > a client may submit {"name":"Robert", "hometown":"Portland"}. > > The hometowns table will never be updated, only either queries or > inserted. > > So given this I need to INSERT a row into "users" and either SELECT > the hometowns.id <http://hometowns.id> that matches "Portland" or if > it doesn't exist I INSERT it returning the hometowns.id > <http://hometowns.id>". > > Normally I would do by first doing a SELECT on hometown. If I don't > get anything I do an INSERT into hometown RETURNING the id. If THAT > throws an error then I do the SELECT again. Now I'm finally ready to > INSERT into users using the hometowns.id <http://hometowns.id> from > the above steps. > > But wow, that seems like a lot of code for a simple "Add if doesn't > exist" foreign key relationship -- but this is how I've always done. > > So my question. Is there a simpler, more performant, or thread-safe > way to do this? > > Thanks! > > > What occurs to me is to simply do an INSERT into the "hometowns" table > and just ignore the "already exists" return indication. Then do a SELECT > to get the hometowns id which now exists, then INSERT the users. but I > could easily be overlooking some reason why this wouldn't work properly. And one more approach: CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1; CREATE TABLE hometowns ( id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'), name VARCHAR, PRIMARY KEY (id), UNIQUE(name) ); CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1; CREATE TABLE users ( id BIGINT DEFAULT nextval('USER_SEQ_GEN'), hometown_id INTEGER, name VARCHAR NOT NULL, PRIMARY KEY (id), FOREIGN KEY (hometown_id) REFERENCES hometowns(id) ); create or replace view user_town as select users.name as username, hometowns.name as hometownname from users join hometowns on hometowns.id = users.hometown_id; create rule user_town_exists as on insert to user_town where exists(select id from hometowns where (hometowns.name = new.hometownname)) do insert into users (name, hometown_id) values (new.username, (select id from hometowns where (hometowns.name = new.hometownname))); create rule user_town_not_exists as on insert to user_town where not exists(select id from hometowns where (hometowns.name = new.hometownname)) do ( insert into hometowns (name) values (new.hometownname); insert into users (name, hometown_id) values (new.username, (select id from hometowns where (hometowns.name = new.hometownname))); ); create rule user_town_nothing as on insert to user_town do instead nothing; --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com
I don't like loops to catch "failure" condition... can you possibly fail to stop?
In a stored procedure (or with auto-commit turned off in any transaction)... You can avoid any race condition by using a semaphore (e.g. you lock "something" for the duration of the critical part of your processing so that anything that "would" update that data just waits).
[ http://www.postgresql.org/docs/9.4/static/applevel-consistency.html#NON-SERIALIZABLE-CONSISTENCY ]
When you have a sequence of steps that need to be serialized across processes, choose or even create a table to use for locking. SELECT FOR UPDATE "a" row in that table. Every process trying to access that row now waits until the first transaction to get the exclusive lock commits/rollbacks (no actual update need be executed). How atomic the lock is depends on what you use for your lock. (poor levels of atomicity will lead to performance problems that are hard to diagnose)
For the most recent version of this I've done... we used the unique business key in an project based audit table. (the table was totally unrelated to the work being done other than it had the same business key values and locking wouldn't interfere with other processing.) So if you had to "create" something to lock...
-- create something to lock... uniqueness is not required but allowed for if you need it
INSERT INTO LOCKIT (table, biz_key) VALUES ('hometowns',hometown_name);
insert into users(name, hometown_id) VALUES ('Robert', SELECT select_hometown_id(hometown_name));
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
SELECT FOR UPDATE from LOCKIT where table = 'hometowns' and biz_key = hometown_name; -- "wait"
END;
$ LANGUAGE plpgsql;
Only one process will be able to execute the CTE at a time - always - and no looping required.
Roxanne
On 1/13/2015 6:52 PM, Robert DiFalco wrote:
In a stored procedure (or with auto-commit turned off in any transaction)... You can avoid any race condition by using a semaphore (e.g. you lock "something" for the duration of the critical part of your processing so that anything that "would" update that data just waits).
[ http://www.postgresql.org/docs/9.4/static/applevel-consistency.html#NON-SERIALIZABLE-CONSISTENCY ]
When you have a sequence of steps that need to be serialized across processes, choose or even create a table to use for locking. SELECT FOR UPDATE "a" row in that table. Every process trying to access that row now waits until the first transaction to get the exclusive lock commits/rollbacks (no actual update need be executed). How atomic the lock is depends on what you use for your lock. (poor levels of atomicity will lead to performance problems that are hard to diagnose)
For the most recent version of this I've done... we used the unique business key in an project based audit table. (the table was totally unrelated to the work being done other than it had the same business key values and locking wouldn't interfere with other processing.) So if you had to "create" something to lock...
-- create something to lock... uniqueness is not required but allowed for if you need it
INSERT INTO LOCKIT (table, biz_key) VALUES ('hometowns',hometown_name);
insert into users(name, hometown_id) VALUES ('Robert', SELECT select_hometown_id(hometown_name));
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
SELECT FOR UPDATE from LOCKIT where table = 'hometowns' and biz_key = hometown_name; -- "wait"
WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;RETURN hometown_id;
END;
$ LANGUAGE plpgsql;
Only one process will be able to execute the CTE at a time - always - and no looping required.
Roxanne
On 1/13/2015 6:52 PM, Robert DiFalco wrote:
Good points. I guess my feeling is that if there can be a race condition on INSERT then the CTE version is not truly atomic, hence the LOOP.On Tue, Jan 13, 2015 at 3:11 PM, Brian Dunavant <brian@omniti.com> wrote:A very good point, but it does not apply as here (and in my article)
we are not using updates, only insert and select.
On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Brian Dunavant wrote on 13.01.2015 22:33:
>>
>> What issue are you having? I'd imagine you have a race condition on
>> the insert into hometowns, but you'd have that same race condition in
>> your app code using a more traditional 3 query version as well.
>>
>> I often use CTEs like this to make things atomic. It allows me to
>> remove transactional code out of the app and also to increase
>> performance by reducing the back-and-forth to the db.
>> http://omniti.com/seeds/writable-ctes-improve-performance
>>
>
> Craig Ringer explained some of the pitfalls of this approach here:
>
> http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates
>
> which is a follow up question based on this:
> http://stackoverflow.com/a/8702291/330315
>
> Thomas
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-- [At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science. Donald Knuth
Roxanne Reid-Bennett wrote: > When you have a sequence of steps that need to be serialized across > processes, choose or even create a table to use for locking This can also be done with an advisory lock, presumably faster: http://www.postgresql.org/docs/current/static/explicit-locking.html DECLARE lock_key int := hashtext(hometown_name); BEGIN SELECT pg_advisory_xact_lock(lock_key,0); // check for existence and insert if it doesn't exist END; When several sessions try to insert the same town (or strictly speaking, with the same hash), only one of them will be allowed to proceed, the others being put to wait until the first one commits or rollbacks, and so on until every session gets through. The lock is automatically released at the end of the transaction. It makes no difference if the "check and insert" is crammed into a single SQL statement or several statements in procedural style. This technique is intended to work with the default "Read Committed" isolation level, which allows the concurrent transactions to "see" the new row inserted by the single other transaction that got the "it does not yet exist" result in the check for existence, and proceeded to insert and eventually commit. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
I must be doing something wrong because both of these approaches are giving me deadlock exceptions.
On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant <brian@omniti.com> wrote:
The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.
v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;
On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco<robert.difalco@gmail.com> wrote:
> This seems to get rid of the INSERT race condition.
>
> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
> INTEGER AS $
> DECLARE hometown_id INTEGER;
> BEGIN
> LOOP
> BEGIN
> WITH sel AS (
> SELECT id FROM hometowns WHERE name = hometown_name
> ), ins AS (
> INSERT INTO hometowns (name)
> SELECT hometown_name
> WHERE NOT EXISTS(SELECT 1 FROM sel)
> RETURNING id
> )
> SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> RETURN hometown_id;
>
> EXCEPTION WHEN unique_violation
> THEN
> END;
> END LOOP;
> END;
> $ LANGUAGE plpgsql;
>
>
> On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote:
>>
>> With the single CTE I don't believe you can do a full upsert loop. If
>> you're doing this inside of a postgres function, your changes are
>> already atomic, so I don't believe by switching you are buying
>> yourself much (if anything) by using a CTE query instead of something
>> more traditional here.
>>
>> The advantages of switching to a CTE would be if this code was all
>> being done inside of the app code with multiple queries.
>>
>> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
>> <robert.difalco@gmail.com> wrote:
>> > Well, traditionally I would create a LOOP where I tried the SELECT, if
>> > there
>> > was nothing I did the INSERT, if that raised an exception I would repeat
>> > the
>> > LOOP.
>> >
>> > What's the best way to do it with the CTE? Currently I have the
>> > following
>> > which gives me Duplicate Key Exceptions when two sessions try to insert
>> > the
>> > same record at the same time.
>> >
>> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
>> > RETURNS
>> > INTEGER AS $
>> > DECLARE hometown_id INTEGER;
>> > BEGIN
>> > WITH sel AS (
>> > SELECT id FROM hometowns WHERE name = hometown_name
>> > ), ins AS (
>> > INSERT INTO hometowns (name)
>> > SELECT hometown_name
>> > WHERE NOT EXISTS(SELECT 1 FROM sel)
>> > RETURNING id
>> > )
>> > SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
>> > RETURN hometown_id;
>> > END;
>> > $ LANGUAGE plpgsql;
>> >
>> > And that is no bueno. Should I just put the whole thing in a LOOP?
>
>
FWIW I was using the select_hometown_id FUNCTION like this:
INSERT INTO users(...) values(..., select_hometown_id('Portland, OR'));
On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant <brian@omniti.com> wrote:
The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.
v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;
On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco<robert.difalco@gmail.com> wrote:
> This seems to get rid of the INSERT race condition.
>
> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
> INTEGER AS $
> DECLARE hometown_id INTEGER;
> BEGIN
> LOOP
> BEGIN
> WITH sel AS (
> SELECT id FROM hometowns WHERE name = hometown_name
> ), ins AS (
> INSERT INTO hometowns (name)
> SELECT hometown_name
> WHERE NOT EXISTS(SELECT 1 FROM sel)
> RETURNING id
> )
> SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> RETURN hometown_id;
>
> EXCEPTION WHEN unique_violation
> THEN
> END;
> END LOOP;
> END;
> $ LANGUAGE plpgsql;
>
>
> On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote:
>>
>> With the single CTE I don't believe you can do a full upsert loop. If
>> you're doing this inside of a postgres function, your changes are
>> already atomic, so I don't believe by switching you are buying
>> yourself much (if anything) by using a CTE query instead of something
>> more traditional here.
>>
>> The advantages of switching to a CTE would be if this code was all
>> being done inside of the app code with multiple queries.
>>
>> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
>> <robert.difalco@gmail.com> wrote:
>> > Well, traditionally I would create a LOOP where I tried the SELECT, if
>> > there
>> > was nothing I did the INSERT, if that raised an exception I would repeat
>> > the
>> > LOOP.
>> >
>> > What's the best way to do it with the CTE? Currently I have the
>> > following
>> > which gives me Duplicate Key Exceptions when two sessions try to insert
>> > the
>> > same record at the same time.
>> >
>> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
>> > RETURNS
>> > INTEGER AS $
>> > DECLARE hometown_id INTEGER;
>> > BEGIN
>> > WITH sel AS (
>> > SELECT id FROM hometowns WHERE name = hometown_name
>> > ), ins AS (
>> > INSERT INTO hometowns (name)
>> > SELECT hometown_name
>> > WHERE NOT EXISTS(SELECT 1 FROM sel)
>> > RETURNING id
>> > )
>> > SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
>> > RETURN hometown_id;
>> > END;
>> > $ LANGUAGE plpgsql;
>> >
>> > And that is no bueno. Should I just put the whole thing in a LOOP?
>
>
On 1/14/15 8:28 AM, Daniel Verite wrote: > Roxanne Reid-Bennett wrote: > >> >When you have a sequence of steps that need to be serialized across >> >processes, choose or even create a table to use for locking > This can also be done with an advisory lock, presumably faster: > http://www.postgresql.org/docs/current/static/explicit-locking.html > > DECLARE > lock_key int := hashtext(hometown_name); > BEGIN > SELECT pg_advisory_xact_lock(lock_key,0); > // check for existence and insert if it doesn't exist > END; I doubt that's going to be any faster than the preferred approach, which is documented in the ERROR TRAPPING section of theplpgsql docs (Example 40-2): http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/15/2015 6:12 PM, Robert DiFalco wrote:
FWIW I was using the select_hometown_id FUNCTION like this:INSERT INTO users(...) values(..., select_hometown_id('Portland, OR'));
try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first - because it's the more restrictive lock.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns where name = hometown_name)
returning id into v_id;
IF (v_id IS NULL) THEN
select id into v_id from hometowns where name = hometown_name;
END IF;
return v_id;
EXCEPTION
WHEN OTHERS THEN
-- choose your poison, this really shouldn't get here
END;
return null;
END;
$BODY$
LANGUAGE plpgsql;
On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant <brian@omniti.com> wrote:The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.
v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;
On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco<robert.difalco@gmail.com> wrote:
> This seems to get rid of the INSERT race condition.
>
> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
> INTEGER AS $
> DECLARE hometown_id INTEGER;
> BEGIN
> LOOP
> BEGIN
> WITH sel AS (
> SELECT id FROM hometowns WHERE name = hometown_name
> ), ins AS (
> INSERT INTO hometowns (name)
> SELECT hometown_name
> WHERE NOT EXISTS(SELECT 1 FROM sel)
> RETURNING id
> )
> SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> RETURN hometown_id;
>
> EXCEPTION WHEN unique_violation
> THEN
> END;
> END LOOP;
> END;
> $ LANGUAGE plpgsql;
>
>
> On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote:
>>
>> With the single CTE I don't believe you can do a full upsert loop. If
>> you're doing this inside of a postgres function, your changes are
>> already atomic, so I don't believe by switching you are buying
>> yourself much (if anything) by using a CTE query instead of something
>> more traditional here.
>>
>> The advantages of switching to a CTE would be if this code was all
>> being done inside of the app code with multiple queries.
>>
>> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
>> <robert.difalco@gmail.com> wrote:
>> > Well, traditionally I would create a LOOP where I tried the SELECT, if
>> > there
>> > was nothing I did the INSERT, if that raised an exception I would repeat
>> > the
>> > LOOP.
>> >
>> > What's the best way to do it with the CTE? Currently I have the
>> > following
>> > which gives me Duplicate Key Exceptions when two sessions try to insert
>> > the
>> > same record at the same time.
>> >
>> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
>> > RETURNS
>> > INTEGER AS $
>> > DECLARE hometown_id INTEGER;
>> > BEGIN
>> > WITH sel AS (
>> > SELECT id FROM hometowns WHERE name = hometown_name
>> > ), ins AS (
>> > INSERT INTO hometowns (name)
>> > SELECT hometown_name
>> > WHERE NOT EXISTS(SELECT 1 FROM sel)
>> > RETURNING id
>> > )
>> > SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
>> > RETURN hometown_id;
>> > END;
>> > $ LANGUAGE plpgsql;
>> >
>> > And that is no bueno. Should I just put the whole thing in a LOOP?
>
>
-- [At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science. Donald Knuth
On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote: > > try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again) > Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first- because it's the more restrictive lock. > > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS > INTEGER AS > $BODY$ > DECLARE > v_id integer; > BEGIN > -- perform pg_advisory_xact_lock(hashtext(hometown_name)); > BEGIN > insert into hometowns (name) > select hometown_name where not exists (select id from hometowns where name = hometown_name) > returning id into v_id; That has a race condition. The only safe way to do this (outside of SSI) is using the example code at http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
The code shown in the Doc (I think) will still give you deadlock in the case where you have two sessions concurrently trying to insert the same 'hometown'. For example:
INSERT INTO users VALUES('Tom', select_hometown_id('Portland, OR'));
INSERT INTO users VALUES(''Waits', select_hometown_id('Portland, OR'));
The LOOP removes the race condition BUT with my limited knowledge I can only think of two ways to prevent the deadlock. Some kind of lock (not sure which) or force a separate transaction for the hometowns INSERT (also not sure how to do this in a FUNCTION). Is there an accepted approach here?
On Thu, Jan 15, 2015 at 11:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first - because it's the more restrictive lock.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns where name = hometown_name)
returning id into v_id;
That has a race condition. The only safe way to do this (outside of SSI) is using the example code at http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Robert DiFalco wrote: > I must be doing something wrong because both of these approaches are giving > me deadlock exceptions. Deadlocks are to be expected if the INSERTs are batched within a single transaction and there are several sessions doing this in parallel. Given that there's an unique constraint on hometowns(name), if this sequence happens (not even considering the "users" table to simplify): Session #1: begin; Session #2: begin; Session #1: insert into hometowns(name) values('City1'); Session #2: insert into hometowns(name) values('City2'); Session #1: insert into hometowns(name) values('City2'); => Session #1 is put to wait until Session #2 commits or rollbacks Session #2: insert into hometowns(name) values('City1'); => Session #2 should wait for Session #1 which is already waiting for Session #2: that's a deadlock It does not mean that the code meant to insert one user and the town without race condition is incorrect by itself. The unique_violation handler is not called in this scenario because the candidate row is not yet committed by the other session. This would work in an OLTP scenario when each "user" is commited after processing it. Avoiding deadlocks between parallel batches is a different problem than avoiding race conditions. If you have the above issue, I don't think you may solve it by tweaking the code of an individual process. It needs to be rethought at the level above, the one that initiates this in parallel and dictates the commit strategy. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Is there a way to force a new private transaction in a FUNCTION? That seems like it would be a good solution here because I could simply do the insert in the RAISE within its own private transaction. Then on the next iteration of the loop (as long as I don't have repeatable reads) it should be picked up. And there should only be a quick recoverable deadlock.
On Fri, Jan 16, 2015 at 7:49 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
Robert DiFalco wrote:
> I must be doing something wrong because both of these approaches are giving
> me deadlock exceptions.
Deadlocks are to be expected if the INSERTs are batched within a single
transaction and there are several sessions doing this in parallel.
Given that there's an unique constraint on hometowns(name), if this sequence
happens (not even considering the "users" table to simplify):
Session #1: begin;
Session #2: begin;
Session #1: insert into hometowns(name) values('City1');
Session #2: insert into hometowns(name) values('City2');
Session #1: insert into hometowns(name) values('City2');
=> Session #1 is put to wait until Session #2 commits or rollbacks
Session #2: insert into hometowns(name) values('City1');
=> Session #2 should wait for Session #1 which is already waiting for
Session #2: that's a deadlock
It does not mean that the code meant to insert one user and the town without
race condition is incorrect by itself. The unique_violation handler is not
called in this scenario because the candidate row is not yet committed by the
other session. This would work in an OLTP scenario when each "user" is
commited after processing it.
Avoiding deadlocks between parallel batches is a different problem than
avoiding race conditions. If you have the above issue, I don't think you may
solve it by tweaking the code of an individual process. It needs to be
rethought at the level above, the one that initiates this in parallel and
dictates the commit strategy.
Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 1/16/2015 2:41 AM, Jim Nasby wrote: > On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote: >> >> try this: (if you still get deadlocks, uncomment the advisory lock >> [thanks Daniel] and try again) >> Logically I suppose it might run faster to do the select, then insert >> "if". I almost always write these as insert first - because it's the >> more restrictive lock. >> >> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) >> RETURNS >> INTEGER AS >> $BODY$ >> DECLARE >> v_id integer; >> BEGIN >> -- perform pg_advisory_xact_lock(hashtext(hometown_name)); >> BEGIN >> insert into hometowns (name) >> select hometown_name where not exists (select id from >> hometowns where name = hometown_name) >> returning id into v_id; > > That has a race condition. The only safe way to do this (outside of > SSI) is using the example code at > http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING And if the advisory lock is used? That presumably creates an exclusive lock on the asset "hometown_name". [in most examples given "Portland, OR".] Would not any other process that runs (this function) on the same asset have to wait for this specific transaction to commit or roll back - blocking the race condition? Roxanne (sorry, I was out of town) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hometowns get selected and possibly inserted in unpredictable ways even from multiple concurrent sessions. The only way I could figure out how to solve it was to force each INSERT hometowns to be in its own transaction.
On Mon, Jan 19, 2015 at 1:56 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
I don't think an advisory lock would remove the deadlock.On Sun, Jan 18, 2015 at 10:33 PM, Roxanne Reid-Bennett <rox@tara-lu.com> wrote:On 1/16/2015 2:41 AM, Jim Nasby wrote:On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first - because it's the more restrictive lock.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns where name = hometown_name)
returning id into v_id;
That has a race condition. The only safe way to do this (outside of SSI) is using the example code at http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
And if the advisory lock is used? That presumably creates an exclusive lock on the asset "hometown_name". [in most examples given "Portland, OR".] Would not any other process that runs (this function) on the same asset have to wait for this specific transaction to commit or roll back - blocking the race condition?
Roxanne
(sorry, I was out of town)
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I don't think an advisory lock would remove the deadlock.
On Sun, Jan 18, 2015 at 10:33 PM, Roxanne Reid-Bennett <rox@tara-lu.com> wrote:
On 1/16/2015 2:41 AM, Jim Nasby wrote:On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first - because it's the more restrictive lock.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns where name = hometown_name)
returning id into v_id;
That has a race condition. The only safe way to do this (outside of SSI) is using the example code at http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
And if the advisory lock is used? That presumably creates an exclusive lock on the asset "hometown_name". [in most examples given "Portland, OR".] Would not any other process that runs (this function) on the same asset have to wait for this specific transaction to commit or roll back - blocking the race condition?
Roxanne
(sorry, I was out of town)
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 1/19/2015 4:58 PM, Robert DiFalco wrote:
Hometowns get selected and possibly inserted in unpredictable ways even from multiple concurrent sessions. The only way I could figure out how to solve it was to force each INSERT hometowns to be in its own transaction.
So you fixed it - good. In our situation we were never inserting more than one hometown** record in the same transaction - so I see now that my scenario was not the same as yours - but yours evolved to that. (**using "hometown" for our record type)
On Mon, Jan 19, 2015 at 1:56 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:I don't think an advisory lock would remove the deadlock.
For 2 or more hometown inserts per transaction - I agree - it won't fix it.
Glad to hear you have it fixed.
Roxanne
On Sun, Jan 18, 2015 at 10:33 PM, Roxanne Reid-Bennett <rox@tara-lu.com> wrote:On 1/16/2015 2:41 AM, Jim Nasby wrote:On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first - because it's the more restrictive lock.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns where name = hometown_name)
returning id into v_id;
That has a race condition. The only safe way to do this (outside of SSI) is using the example code at http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
And if the advisory lock is used? That presumably creates an exclusive lock on the asset "hometown_name". [in most examples given "Portland, OR".] Would not any other process that runs (this function) on the same asset have to wait for this specific transaction to commit or roll back - blocking the race condition?
Roxanne
(sorry, I was out of town)
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-- [At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science. Donald Knuth
On 1/19/2015 4:58 PM, Robert DiFalco wrote:
Hometowns get selected and possibly inserted in unpredictable ways even from multiple concurrent sessions. The only way I could figure out how to solve it was to force each INSERT hometowns to be in its own transaction.
So you fixed it - good. In our situation we were never inserting more than one hometown** record in the same transaction - so I see now that my scenario was not the same as yours - but yours evolved to that. (**using "hometown" for our record type)
On Mon, Jan 19, 2015 at 1:56 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:I don't think an advisory lock would remove the deadlock.
For 2 or more hometown inserts per transaction - I agree - it won't fix it.
Glad to hear you have it fixed.
Roxanne
On Sun, Jan 18, 2015 at 10:33 PM, Roxanne Reid-Bennett <rox@tara-lu.com> wrote:On 1/16/2015 2:41 AM, Jim Nasby wrote:On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
try this: (if you still get deadlocks, uncomment the advisory lock [thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert "if". I almost always write these as insert first - because it's the more restrictive lock.
CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns where name = hometown_name)
returning id into v_id;
That has a race condition. The only safe way to do this (outside of SSI) is using the example code at http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
And if the advisory lock is used? That presumably creates an exclusive lock on the asset "hometown_name". [in most examples given "Portland, OR".] Would not any other process that runs (this function) on the same asset have to wait for this specific transaction to commit or roll back - blocking the race condition?
Roxanne
(sorry, I was out of town)
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-- [At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science. Donald Knuth
On 1/16/2015 2:41 AM, Jim Nasby wrote: > On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote: >> >> try this: (if you still get deadlocks, uncomment the advisory lock >> [thanks Daniel] and try again) >> Logically I suppose it might run faster to do the select, then insert >> "if". I almost always write these as insert first - because it's the >> more restrictive lock. >> >> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) >> RETURNS >> INTEGER AS >> $BODY$ >> DECLARE >> v_id integer; >> BEGIN >> -- perform pg_advisory_xact_lock(hashtext(hometown_name)); >> BEGIN >> insert into hometowns (name) >> select hometown_name where not exists (select id from >> hometowns where name = hometown_name) >> returning id into v_id; > > That has a race condition. The only safe way to do this (outside of > SSI) is using the example code at > http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING And if the advisory lock is used? That presumably creates an exclusive lock on the asset "hometown_name". [in most examples given "Portland, OR".] Would not any other process that runs (this function) on the same asset have to wait for this specific transaction to commit or roll back - blocking the race condition? Roxanne (sorry, I was out of town) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general