Thread: what is the PostgreSQL idiom for "insert or update"?

what is the PostgreSQL idiom for "insert or update"?

From
Robert Poor
Date:
In my application, I receive large blocks of external data that needs
to be extracted / translated / loaded into the db, and many of these
data are duplicates of what's already there.

Consequently, I would like to do efficient "bulk loading" of tables
using multi-row INSERT commands, ignoring unique records that are
already present, where 'uniqueness' is defined by key constraints.

F'rinstance, assume:

CREATE TABLE "weather_observations" ("id" serial primary key,
"station_id" integer, "observation_time" timestamp, "temperature_c"
float)
CREATE UNIQUE INDEX "observation_index" ON "weather_observations"
("station_id", "observation_time")

Now I'd like to be able to do multi-row inserts, but ignoring
duplicate entries (specifically, those that would violate uniqueness
constraint of the index):

INSERT INTO weather (station_id, date, temperature) VALUES
  (2257, '2001-01-01', 22.5),
  (2257, '2001-01-02', 25.3);

INSERT INTO weather (station_id, date, temperature) VALUES
  (2257, '2001-01-02', 25.5),                 -- ignored: record already present
  (2257, '2001-01-03', 21.0);

What's the idiom for doing this in PostgreSQL?

[As an aside, in SQLite, you can modify an INSERT statement with "OR
IGNORE" to achieve this.]

Thanks!

Re: what is the PostgreSQL idiom for "insert or update"?

From
Kenneth Marshall
Date:
On Wed, Mar 16, 2011 at 07:32:13AM -0700, Robert Poor wrote:
> In my application, I receive large blocks of external data that needs
> to be extracted / translated / loaded into the db, and many of these
> data are duplicates of what's already there.
>
> Consequently, I would like to do efficient "bulk loading" of tables
> using multi-row INSERT commands, ignoring unique records that are
> already present, where 'uniqueness' is defined by key constraints.
>
> F'rinstance, assume:
>
> CREATE TABLE "weather_observations" ("id" serial primary key,
> "station_id" integer, "observation_time" timestamp, "temperature_c"
> float)
> CREATE UNIQUE INDEX "observation_index" ON "weather_observations"
> ("station_id", "observation_time")
>
> Now I'd like to be able to do multi-row inserts, but ignoring
> duplicate entries (specifically, those that would violate uniqueness
> constraint of the index):
>
> INSERT INTO weather (station_id, date, temperature) VALUES
>   (2257, '2001-01-01', 22.5),
>   (2257, '2001-01-02', 25.3);
>
> INSERT INTO weather (station_id, date, temperature) VALUES
>   (2257, '2001-01-02', 25.5),                 -- ignored: record already present
>   (2257, '2001-01-03', 21.0);
>
> What's the idiom for doing this in PostgreSQL?
>
> [As an aside, in SQLite, you can modify an INSERT statement with "OR
> IGNORE" to achieve this.]
>
> Thanks!
>

Here is the current documented method:

http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html

Regards,
Ken

Re: what is the PostgreSQL idiom for "insert or update"?

From
Richard Broersma
Date:
On Wed, Mar 16, 2011 at 7:32 AM, Robert Poor <rdpoor@gmail.com> wrote:


> INSERT INTO weather (station_id, date, temperature) VALUES
>  (2257, '2001-01-01', 22.5),
>  (2257, '2001-01-02', 25.3);
>
> INSERT INTO weather (station_id, date, temperature) VALUES
>  (2257, '2001-01-02', 25.5),                 -- ignored: record already present
>  (2257, '2001-01-03', 21.0);
>
> What's the idiom for doing this in PostgreSQL?

How about:

INSERT INTO weather (station_id, date, temperature )
  SELECT A.station_id, A.date, A.temperature
   FROM ( VALUES(2257, '2001-01-01', 22.5),
              (2257, '2001-01-02', 25.3) ) AS A ( station_id, date, temperature)
LEFT JOIN weather AS B
    ON ( A.station_id, A.date ) = ( B.station_id, B.date )
WHERE B.station_id IS NULL;


--
Regards,
Richard Broersma Jr.

Re: what is the PostgreSQL idiom for "insert or update"?

From
Robert Poor
Date:
Ken:

On Wed, Mar 16, 2011 at 07:48, Kenneth Marshall <ktm@rice.edu> wrote:
> On Wed, Mar 16, 2011 at 07:32:13AM -0700, Robert Poor wrote:
>> ...I would like to do efficient "bulk loading" of tables
>> using multi-row INSERT commands, ignoring unique records that are
>> already present, where 'uniqueness' is defined by key constraints.
>
> Here is the current documented method:
>
> http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html

The construct at the bottom of the page looks promising.  I'm loading
about 500 records in a single transaction; I'm not clear on how to
recast the merge_db() method to handle multiple records.  Can you
elucidate?  Thanks...

Re: what is the PostgreSQL idiom for "insert or update"?

From
Robert Poor
Date:
Richard:

On Wed, Mar 16, 2011 at 08:45, Richard Broersma
<richard.broersma@gmail.com> wrote:
> How about:
>
> INSERT INTO weather (station_id, date, temperature )
>  SELECT A.station_id, A.date, A.temperature
>   FROM ( VALUES(2257, '2001-01-01', 22.5),
>              (2257, '2001-01-02', 25.3) ) AS A ( station_id, date, temperature)
> LEFT JOIN weather AS B
>    ON ( A.station_id, A.date ) = ( B.station_id, B.date )
> WHERE B.station_id IS NULL;

That would work, though I was hoping there'd be some mechanism that
used the key to determine if the incoming record was unique or not
(rather than writing a custom query).

Are there limits to the size of the VALUES sub-query?  I'm processing
about 500 records at a time, and each record is fairly wide (about 350
characters without the field names).

Re: what is the PostgreSQL idiom for "insert or update"?

From
Richard Broersma
Date:
On Wed, Mar 16, 2011 at 9:31 AM, Robert Poor <rdpoor@gmail.com> wrote:

> Are there limits to the size of the VALUES sub-query?

I'm not too sure.  I wonder if this limitation *could* be related to
the maximum text field size which is < 1GB.




--
Regards,
Richard Broersma Jr.