Thread: what is the PostgreSQL idiom for "insert or update"?
In my application, I receive large blocks of external data that need to be extracted / translated / loaded into the db, and many of these data are duplicates of what's already there. Consequently, I would like to do efficient "bulk loading" of tables using multi-row INSERT commands, ignoring records that are already present, where 'uniqueness' is defined by key constraints. F'rinstance, assume:

CREATE TABLE "weather_observations" (
    "id" serial primary key,
    "station_id" integer,
    "observation_time" timestamp,
    "temperature_c" float
);

CREATE UNIQUE INDEX "observation_index" ON "weather_observations"
    ("station_id", "observation_time");

Now I'd like to be able to do multi-row inserts, but ignoring duplicate entries (specifically, those that would violate the uniqueness constraint of the index):

INSERT INTO weather_observations (station_id, observation_time, temperature_c) VALUES
    (2257, '2001-01-01', 22.5),
    (2257, '2001-01-02', 25.3);

INSERT INTO weather_observations (station_id, observation_time, temperature_c) VALUES
    (2257, '2001-01-02', 25.5), -- ignored: record already present
    (2257, '2001-01-03', 21.0);

What's the idiom for doing this in PostgreSQL?

[As an aside, in SQLite you can modify an INSERT statement with "OR IGNORE" to achieve this.]

Thanks!
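P.S. For reference, the SQLite form I'm referring to looks like this (sketched against the table above, untested):

INSERT OR IGNORE INTO weather_observations (station_id, observation_time, temperature_c)
VALUES (2257, '2001-01-02', 25.5);

SQLite silently skips any row that would violate a uniqueness constraint. I'm looking for the PostgreSQL equivalent.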
On Wed, Mar 16, 2011 at 07:32:13AM -0700, Robert Poor wrote:
> ...I would like to do efficient "bulk loading" of tables
> using multi-row INSERT commands, ignoring records that are
> already present, where 'uniqueness' is defined by key constraints.
> ...
> What's the idiom for doing this in PostgreSQL?

Here is the currently documented method:

http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html

Regards,
Ken
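P.S. The merge_db() example at the bottom of that page loops, trying an UPDATE first and falling back to an INSERT inside an exception block. Adapted to your weather_observations table it would look roughly like this (a sketch, untested; the function name and underscore-prefixed parameters are mine):

CREATE FUNCTION merge_observation(_station_id integer,
                                  _observation_time timestamp,
                                  _temperature_c float)
RETURNS void AS
$$
BEGIN
    LOOP
        -- first try to update an existing row with the same key
        UPDATE weather_observations
           SET temperature_c = _temperature_c
         WHERE station_id = _station_id
           AND observation_time = _observation_time;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert it; if another transaction
        -- inserts the same key first, we get a unique-key failure
        BEGIN
            INSERT INTO weather_observations (station_id, observation_time, temperature_c)
            VALUES (_station_id, _observation_time, _temperature_c);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

Since you want to ignore duplicates rather than update them, you could drop the UPDATE step entirely and keep just the INSERT with the unique_violation handler.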
On Wed, Mar 16, 2011 at 7:32 AM, Robert Poor <rdpoor@gmail.com> wrote:
> INSERT INTO weather_observations (station_id, observation_time, temperature_c) VALUES
>     (2257, '2001-01-01', 22.5),
>     (2257, '2001-01-02', 25.3);
>
> INSERT INTO weather_observations (station_id, observation_time, temperature_c) VALUES
>     (2257, '2001-01-02', 25.5), -- ignored: record already present
>     (2257, '2001-01-03', 21.0);
>
> What's the idiom for doing this in PostgreSQL?

How about:

INSERT INTO weather_observations (station_id, observation_time, temperature_c)
SELECT A.station_id, A.observation_time, A.temperature_c
FROM ( VALUES (2257, TIMESTAMP '2001-01-01', 22.5),
              (2257, TIMESTAMP '2001-01-02', 25.3)
     ) AS A (station_id, observation_time, temperature_c)
LEFT JOIN weather_observations AS B
       ON (A.station_id, A.observation_time) = (B.station_id, B.observation_time)
WHERE B.station_id IS NULL;

Note the explicit TIMESTAMP casts: untyped literals in a standalone VALUES list default to text, which won't compare against your timestamp column.

--
Regards,
Richard Broersma Jr.
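P.S. One caveat, in case a single incoming batch can repeat a key internally: the LEFT JOIN only filters against rows already in the table, not against duplicates within the VALUES list itself. Something like DISTINCT ON would cover both (a sketch, untested; without an ORDER BY it keeps an arbitrary row from each duplicate group):

INSERT INTO weather_observations (station_id, observation_time, temperature_c)
SELECT DISTINCT ON (A.station_id, A.observation_time)
       A.station_id, A.observation_time, A.temperature_c
FROM ( VALUES (2257, TIMESTAMP '2001-01-02', 25.5),
              (2257, TIMESTAMP '2001-01-02', 25.5),  -- in-batch duplicate
              (2257, TIMESTAMP '2001-01-03', 21.0)
     ) AS A (station_id, observation_time, temperature_c)
LEFT JOIN weather_observations AS B
       ON (A.station_id, A.observation_time) = (B.station_id, B.observation_time)
WHERE B.station_id IS NULL;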
Ken:

On Wed, Mar 16, 2011 at 07:48, Kenneth Marshall <ktm@rice.edu> wrote:
> On Wed, Mar 16, 2011 at 07:32:13AM -0700, Robert Poor wrote:
>> ...I would like to do efficient "bulk loading" of tables
>> using multi-row INSERT commands, ignoring records that are
>> already present, where 'uniqueness' is defined by key constraints.
>
> Here is the currently documented method:
>
> http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html

The construct at the bottom of the page looks promising. I'm loading about 500 records in a single transaction; I'm not clear on how to recast the merge_db() method to handle multiple records. Can you elucidate?

Thanks...
Richard:

On Wed, Mar 16, 2011 at 08:45, Richard Broersma <richard.broersma@gmail.com> wrote:
> How about:
>
> INSERT INTO weather_observations (station_id, observation_time, temperature_c)
> SELECT A.station_id, A.observation_time, A.temperature_c
> FROM ( VALUES (2257, TIMESTAMP '2001-01-01', 22.5),
>               (2257, TIMESTAMP '2001-01-02', 25.3)
>      ) AS A (station_id, observation_time, temperature_c)
> LEFT JOIN weather_observations AS B
>        ON (A.station_id, A.observation_time) = (B.station_id, B.observation_time)
> WHERE B.station_id IS NULL;

That would work, though I was hoping there'd be some mechanism that used the key itself to determine whether an incoming record is unique (rather than writing a custom query). Are there limits to the size of the VALUES sub-query? I'm processing about 500 records at a time, and each record is fairly wide (about 350 characters without the field names).
On Wed, Mar 16, 2011 at 9:31 AM, Robert Poor <rdpoor@gmail.com> wrote:
> Are there limits to the size of the VALUES sub-query?

I'm not too sure. I wonder if any such limit *could* be related to the maximum text field size, which is < 1GB.

--
Regards,
Richard Broersma Jr.