Thread: upsert doesn't seem to work..

upsert doesn't seem to work..

From
Bert
Date:
Hello,

I hope someone her can help me.

We continuously load data from flat files in our database.
We first insert the data into unlogged tables (in the loadoltp schema), and then we use the 'upsert' statement to transfer the data from the load table into the tables we are going to use.

The load tables are unlogged, and don't have indexes / pk's on them. All our 'real tables', which contains the data, always have a pk consisting out of 2 fields. In the example those are 'tick_server_id' and 'item_id'.

At first everything seems to run ok, however it seems that new fields aren't always inserted as desired.


This is an example query which causes troubles:

WITH UPSERT AS
  (UPDATE oltp.ST_ITEM ET
   SET (tick_server_id,
        item_id,
        item_desc,
        item_code,
        item_date,
        item_starttime,
        item_endtime,
        item_startsaledate,
        item_endsaledate,
        replev_id,
        evt_id,
        tkl_id,
        plan_id,
        itemtyp_id,
        item_accountcode,
        itemstat_id,
        item_seattotal_count,
        item_seatsold_count,
        tl_id,
        item_hotsales,
        item_showplan,
        item_sms_code,
        datetyp_id,
        item_start,
        item_end,
        item_validfrom,
        item_validuntil,
        item_count_sale,
        bartyp_id,
        item_scanning,
        isabo,
        etl_run_id) = (E.tick_server_id,
                       E.item_id,
                       E.item_desc,
                       E.item_code,
                       E.item_date,
                       E.item_starttime,
                       E.item_endtime,
                       E.item_startsaledate,
                       E.item_endsaledate,
                       E.replev_id,
                       E.evt_id,
                       E.tkl_id,
                       E.plan_id,
                       E.itemtyp_id,
                       E.item_accountcode,
                       E.itemstat_id,
                       E.item_seattotal_count,
                       E.item_seatsold_count,
                       E.tl_id,
                       E.item_hotsales,
                       E.item_showplan,
                       E.item_sms_code,
                       E.datetyp_id,
                       E.item_start,
                       E.item_end,
                       E.item_validfrom,
                       E.item_validuntil,
                       E.item_count_sale,
                       E.bartyp_id,
                       E.item_scanning,
                       E.isabo,
                       E.etl_run_id)
   FROM
     (SELECT *
      FROM loadoltp.ST_ITEM_INSERT
      WHERE LOADTABLETIME = '2013-02-12 10:23:51.110877') AS E
   WHERE et.tick_server_id = e.tick_server_id
     AND et.item_id = e.item_id returning ET.*)
INSERT INTO oltp.ST_ITEM
SELECT tick_server_id,
       item_id,
       item_desc,
       item_code,
       item_date,
       item_starttime,
       item_endtime,
       item_startsaledate,
       item_endsaledate,
       replev_id,
       evt_id,
       tkl_id,
       plan_id,
       itemtyp_id,
       item_accountcode,
       itemstat_id,
       item_seattotal_count,
       item_seatsold_count,
       tl_id,
       item_hotsales,
       item_showplan,
       item_sms_code,
       datetyp_id,
       item_start,
       item_end,
       item_validfrom,
       item_validuntil,
       item_count_sale,
       bartyp_id,
       item_scanning,
       isabo,
       etl_run_id
FROM
  (SELECT *
   FROM loadoltp.ST_ITEM_INSERT
   WHERE LOADTABLETIME = '2013-02-12 10:23:51.110877') AS ET
WHERE ET.tick_server_id NOT IN
    (SELECT ET.tick_server_id
     FROM upsert b)
  AND ET.item_id NOT IN
    (SELECT ET.item_id
     FROM upsert b)


this is the query plan:
"Insert on oltp.st_item  (cost=776.69..1123.53 rows=93 width=419)"
"  CTE upsert"
"    ->  Update on oltp.st_item et  (cost=23.26..776.69 rows=39 width=431)"
"          Output: et.tick_server_id, et.item_id, et.item_desc, et.item_code, et.item_date, et.item_starttime, et.item_endtime, et.item_startsaledate, et.item_endsaledate, et.replev_id, et.evt_id, et.tkl_id, et.plan_id, et.itemtyp_id, et.item_accountcode, et.itemstat_id, et.item_seattotal_count, et.item_seatsold_count, et.tl_id, et.item_hotsales, et.item_showplan, et.item_sms_code, et.datetyp_id, et.item_start, et.item_end, et.item_validfrom, et.item_validuntil, et.item_count_sale, et.bartyp_id, et.item_scanning, et.isabo, et.etl_run_id"
"          ->  Hash Join  (cost=23.26..776.69 rows=39 width=431)"
"                Output: loadoltp.st_item_insert.tick_server_id, loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc, loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date, loadoltp.st_item_insert.item_starttime, loadoltp.st_item_insert.item_endtime, loadoltp.st_item_insert.item_startsaledate, loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id, loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id, loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id, loadoltp.st_item_insert.item_accountcode, loadoltp.st_item_insert.itemstat_id, loadoltp.st_item_insert.item_seattotal_count, loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id, loadoltp.st_item_insert.item_hotsales, loadoltp.st_item_insert.item_showplan, loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id, loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end, loadoltp.st_item_insert.item_validfrom, loadoltp.st_item_insert.item_validuntil, loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id, loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo, loadoltp.st_item_insert.etl_run_id, et.ctid, loadoltp.st_item_insert.ctid"
"                Hash Cond: ((et.tick_server_id = loadoltp.st_item_insert.tick_server_id) AND (et.item_id = loadoltp.st_item_insert.item_id))"
"                ->  Seq Scan on oltp.st_item et  (cost=0.00..670.74 rows=10974 width=14)"
"                      Output: et.ctid, et.tick_server_id, et.item_id"
"                ->  Hash  (cost=17.66..17.66 rows=373 width=425)"
"                      Output: loadoltp.st_item_insert.tick_server_id, loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc, loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date, loadoltp.st_item_insert.item_starttime, loadoltp.st_item_insert.item_endtime, loadoltp.st_item_insert.item_startsaledate, loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id, loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id, loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id, loadoltp.st_item_insert.item_accountcode, loadoltp.st_item_insert.itemstat_id, loadoltp.st_item_insert.item_seattotal_count, loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id, loadoltp.st_item_insert.item_hotsales, loadoltp.st_item_insert.item_showplan, loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id, loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end, loadoltp.st_item_insert.item_validfrom, loadoltp.st_item_insert.item_validuntil, loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id, loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo, loadoltp.st_item_insert.etl_run_id, loadoltp.st_item_insert.ctid"
"                      ->  Seq Scan on loadoltp.st_item_insert  (cost=0.00..17.66 rows=373 width=425)"
"                            Output: loadoltp.st_item_insert.tick_server_id, loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc, loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date, loadoltp.st_item_insert.item_starttime, loadoltp.st_item_insert.item_endtime, loadoltp.st_item_insert.item_startsaledate, loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id, loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id, loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id, loadoltp.st_item_insert.item_accountcode, loadoltp.st_item_insert.itemstat_id, loadoltp.st_item_insert.item_seattotal_count, loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id, loadoltp.st_item_insert.item_hotsales, loadoltp.st_item_insert.item_showplan, loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id, loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end, loadoltp.st_item_insert.item_validfrom, loadoltp.st_item_insert.item_validuntil, loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id, loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo, loadoltp.st_item_insert.etl_run_id, loadoltp.st_item_insert.ctid"
"                            Filter: (loadoltp.st_item_insert.loadtabletime = '2013-02-12 10:23:51.110877'::timestamp without time zone)"
"  ->  Seq Scan on loadoltp.st_item_insert  (cost=0.00..346.83 rows=93 width=419)"
"        Output: loadoltp.st_item_insert.tick_server_id, loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc, loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date, loadoltp.st_item_insert.item_starttime, loadoltp.st_item_insert.item_endtime, loadoltp.st_item_insert.item_startsaledate, loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id, loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id, loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id, loadoltp.st_item_insert.item_accountcode, loadoltp.st_item_insert.itemstat_id, loadoltp.st_item_insert.item_seattotal_count, loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id, loadoltp.st_item_insert.item_hotsales, loadoltp.st_item_insert.item_showplan, loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id, loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end, loadoltp.st_item_insert.item_validfrom, loadoltp.st_item_insert.item_validuntil, loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id, loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo, loadoltp.st_item_insert.etl_run_id"
"        Filter: ((loadoltp.st_item_insert.loadtabletime = '2013-02-12 10:23:51.110877'::timestamp without time zone) AND (NOT (SubPlan 2)) AND (NOT (SubPlan 3)))"
"        SubPlan 2"
"          ->  CTE Scan on upsert b  (cost=0.00..0.78 rows=39 width=0)"
"                Output: loadoltp.st_item_insert.tick_server_id"
"        SubPlan 3"
"          ->  CTE Scan on upsert b  (cost=0.00..0.78 rows=39 width=0)"
"                Output: loadoltp.st_item_insert.item_id"

Can anyone see what I'm doing wrong?

wkr,
Bert

Re: upsert doesn't seem to work..

From
Sergey Konoplev
Date:
On Tue, Feb 12, 2013 at 1:38 AM, Bert <biertie@gmail.com> wrote:
> At first everything seems to run ok, however it seems that new fields aren't
> always inserted as desired.

Could you please explain how are you expecting the new fields to be
inserted and what exactly made you think they were inserted wrong?

>
>
> This is an example query which causes troubles:
>
> WITH UPSERT AS
>   (UPDATE oltp.ST_ITEM ET
>    SET (tick_server_id,
>         item_id,
>         item_desc,
>         item_code,
>         item_date,
>         item_starttime,
>         item_endtime,
>         item_startsaledate,
>         item_endsaledate,
>         replev_id,
>         evt_id,
>         tkl_id,
>         plan_id,
>         itemtyp_id,
>         item_accountcode,
>         itemstat_id,
>         item_seattotal_count,
>         item_seatsold_count,
>         tl_id,
>         item_hotsales,
>         item_showplan,
>         item_sms_code,
>         datetyp_id,
>         item_start,
>         item_end,
>         item_validfrom,
>         item_validuntil,
>         item_count_sale,
>         bartyp_id,
>         item_scanning,
>         isabo,
>         etl_run_id) = (E.tick_server_id,
>                        E.item_id,
>                        E.item_desc,
>                        E.item_code,
>                        E.item_date,
>                        E.item_starttime,
>                        E.item_endtime,
>                        E.item_startsaledate,
>                        E.item_endsaledate,
>                        E.replev_id,
>                        E.evt_id,
>                        E.tkl_id,
>                        E.plan_id,
>                        E.itemtyp_id,
>                        E.item_accountcode,
>                        E.itemstat_id,
>                        E.item_seattotal_count,
>                        E.item_seatsold_count,
>                        E.tl_id,
>                        E.item_hotsales,
>                        E.item_showplan,
>                        E.item_sms_code,
>                        E.datetyp_id,
>                        E.item_start,
>                        E.item_end,
>                        E.item_validfrom,
>                        E.item_validuntil,
>                        E.item_count_sale,
>                        E.bartyp_id,
>                        E.item_scanning,
>                        E.isabo,
>                        E.etl_run_id)
>    FROM
>      (SELECT *
>       FROM loadoltp.ST_ITEM_INSERT
>       WHERE LOADTABLETIME = '2013-02-12 10:23:51.110877') AS E
>    WHERE et.tick_server_id = e.tick_server_id
>      AND et.item_id = e.item_id returning ET.*)
> INSERT INTO oltp.ST_ITEM
> SELECT tick_server_id,
>        item_id,
>        item_desc,
>        item_code,
>        item_date,
>        item_starttime,
>        item_endtime,
>        item_startsaledate,
>        item_endsaledate,
>        replev_id,
>        evt_id,
>        tkl_id,
>        plan_id,
>        itemtyp_id,
>        item_accountcode,
>        itemstat_id,
>        item_seattotal_count,
>        item_seatsold_count,
>        tl_id,
>        item_hotsales,
>        item_showplan,
>        item_sms_code,
>        datetyp_id,
>        item_start,
>        item_end,
>        item_validfrom,
>        item_validuntil,
>        item_count_sale,
>        bartyp_id,
>        item_scanning,
>        isabo,
>        etl_run_id
> FROM
>   (SELECT *
>    FROM loadoltp.ST_ITEM_INSERT
>    WHERE LOADTABLETIME = '2013-02-12 10:23:51.110877') AS ET
> WHERE ET.tick_server_id NOT IN
>     (SELECT ET.tick_server_id
>      FROM upsert b)
>   AND ET.item_id NOT IN
>     (SELECT ET.item_id
>      FROM upsert b)
>
>
> this is the query plan:
> "Insert on oltp.st_item  (cost=776.69..1123.53 rows=93 width=419)"
> "  CTE upsert"
> "    ->  Update on oltp.st_item et  (cost=23.26..776.69 rows=39 width=431)"
> "          Output: et.tick_server_id, et.item_id, et.item_desc,
> et.item_code, et.item_date, et.item_starttime, et.item_endtime,
> et.item_startsaledate, et.item_endsaledate, et.replev_id, et.evt_id,
> et.tkl_id, et.plan_id, et.itemtyp_id, et.item_accountcode, et.itemstat_id,
> et.item_seattotal_count, et.item_seatsold_count, et.tl_id, et.item_hotsales,
> et.item_showplan, et.item_sms_code, et.datetyp_id, et.item_start,
> et.item_end, et.item_validfrom, et.item_validuntil, et.item_count_sale,
> et.bartyp_id, et.item_scanning, et.isabo, et.etl_run_id"
> "          ->  Hash Join  (cost=23.26..776.69 rows=39 width=431)"
> "                Output: loadoltp.st_item_insert.tick_server_id,
> loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc,
> loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date,
> loadoltp.st_item_insert.item_starttime,
> loadoltp.st_item_insert.item_endtime,
> loadoltp.st_item_insert.item_startsaledate,
> loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id,
> loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id,
> loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id,
> loadoltp.st_item_insert.item_accountcode,
> loadoltp.st_item_insert.itemstat_id,
> loadoltp.st_item_insert.item_seattotal_count,
> loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id,
> loadoltp.st_item_insert.item_hotsales,
> loadoltp.st_item_insert.item_showplan,
> loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id,
> loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end,
> loadoltp.st_item_insert.item_validfrom,
> loadoltp.st_item_insert.item_validuntil,
> loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id,
> loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo,
> loadoltp.st_item_insert.etl_run_id, et.ctid, loadoltp.st_item_insert.ctid"
> "                Hash Cond: ((et.tick_server_id =
> loadoltp.st_item_insert.tick_server_id) AND (et.item_id =
> loadoltp.st_item_insert.item_id))"
> "                ->  Seq Scan on oltp.st_item et  (cost=0.00..670.74
> rows=10974 width=14)"
> "                      Output: et.ctid, et.tick_server_id, et.item_id"
> "                ->  Hash  (cost=17.66..17.66 rows=373 width=425)"
> "                      Output: loadoltp.st_item_insert.tick_server_id,
> loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc,
> loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date,
> loadoltp.st_item_insert.item_starttime,
> loadoltp.st_item_insert.item_endtime,
> loadoltp.st_item_insert.item_startsaledate,
> loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id,
> loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id,
> loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id,
> loadoltp.st_item_insert.item_accountcode,
> loadoltp.st_item_insert.itemstat_id,
> loadoltp.st_item_insert.item_seattotal_count,
> loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id,
> loadoltp.st_item_insert.item_hotsales,
> loadoltp.st_item_insert.item_showplan,
> loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id,
> loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end,
> loadoltp.st_item_insert.item_validfrom,
> loadoltp.st_item_insert.item_validuntil,
> loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id,
> loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo,
> loadoltp.st_item_insert.etl_run_id, loadoltp.st_item_insert.ctid"
> "                      ->  Seq Scan on loadoltp.st_item_insert
> (cost=0.00..17.66 rows=373 width=425)"
> "                            Output: loadoltp.st_item_insert.tick_server_id,
> loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc,
> loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date,
> loadoltp.st_item_insert.item_starttime,
> loadoltp.st_item_insert.item_endtime,
> loadoltp.st_item_insert.item_startsaledate,
> loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id,
> loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id,
> loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id,
> loadoltp.st_item_insert.item_accountcode,
> loadoltp.st_item_insert.itemstat_id,
> loadoltp.st_item_insert.item_seattotal_count,
> loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id,
> loadoltp.st_item_insert.item_hotsales,
> loadoltp.st_item_insert.item_showplan,
> loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id,
> loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end,
> loadoltp.st_item_insert.item_validfrom,
> loadoltp.st_item_insert.item_validuntil,
> loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id,
> loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo,
> loadoltp.st_item_insert.etl_run_id, loadoltp.st_item_insert.ctid"
> "                            Filter: (loadoltp.st_item_insert.loadtabletime
> = '2013-02-12 10:23:51.110877'::timestamp without time zone)"
> "  ->  Seq Scan on loadoltp.st_item_insert  (cost=0.00..346.83 rows=93
> width=419)"
> "        Output: loadoltp.st_item_insert.tick_server_id,
> loadoltp.st_item_insert.item_id, loadoltp.st_item_insert.item_desc,
> loadoltp.st_item_insert.item_code, loadoltp.st_item_insert.item_date,
> loadoltp.st_item_insert.item_starttime,
> loadoltp.st_item_insert.item_endtime,
> loadoltp.st_item_insert.item_startsaledate,
> loadoltp.st_item_insert.item_endsaledate, loadoltp.st_item_insert.replev_id,
> loadoltp.st_item_insert.evt_id, loadoltp.st_item_insert.tkl_id,
> loadoltp.st_item_insert.plan_id, loadoltp.st_item_insert.itemtyp_id,
> loadoltp.st_item_insert.item_accountcode,
> loadoltp.st_item_insert.itemstat_id,
> loadoltp.st_item_insert.item_seattotal_count,
> loadoltp.st_item_insert.item_seatsold_count, loadoltp.st_item_insert.tl_id,
> loadoltp.st_item_insert.item_hotsales,
> loadoltp.st_item_insert.item_showplan,
> loadoltp.st_item_insert.item_sms_code, loadoltp.st_item_insert.datetyp_id,
> loadoltp.st_item_insert.item_start, loadoltp.st_item_insert.item_end,
> loadoltp.st_item_insert.item_validfrom,
> loadoltp.st_item_insert.item_validuntil,
> loadoltp.st_item_insert.item_count_sale, loadoltp.st_item_insert.bartyp_id,
> loadoltp.st_item_insert.item_scanning, loadoltp.st_item_insert.isabo,
> loadoltp.st_item_insert.etl_run_id"
> "        Filter: ((loadoltp.st_item_insert.loadtabletime = '2013-02-12
> 10:23:51.110877'::timestamp without time zone) AND (NOT (SubPlan 2)) AND
> (NOT (SubPlan 3)))"
> "        SubPlan 2"
> "          ->  CTE Scan on upsert b  (cost=0.00..0.78 rows=39 width=0)"
> "                Output: loadoltp.st_item_insert.tick_server_id"
> "        SubPlan 3"
> "          ->  CTE Scan on upsert b  (cost=0.00..0.78 rows=39 width=0)"
> "                Output: loadoltp.st_item_insert.item_id"
>
> Can anyone see what I'm doing wrong?
>
> wkr,
> Bert



-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com



Re: upsert doesn't seem to work..

From
Ben Morrow
Date:
Quoth biertie@gmail.com (Bert):
> 
> We continuously load data from flat files in our database.
> We first insert the data into unlogged tables (in the loadoltp schema), and
> then we use the 'upsert' statement to transfer the data from the load table
> into the tables we are going to use.
> 
> The load tables are unlogged, and don't have indexes / pk's on them. All
> our 'real tables', which contains the data, always have a pk consisting out
> of 2 fields. In the example those are 'tick_server_id' and 'item_id'.
> 
> At first everything seems to run ok, however it seems that new fields
> aren't always inserted as desired.
> 
> 
> This is an example query which causes troubles:

That query is basically equivalent to something like
   create table "st_item" (       server_id   integer,       item_id     integer,       item_desc   text,       primary
key(server_id, item_id)   );   create table "st_item_insert" (       server_id   integer,       item_id     integer,
  item_desc   text   );
 
   with "upsert" as (       update "st_item" et       set "item_desc" = e.item_desc       from "st_item_insert" e
whereet.server_id = e.server_id           and et.item_id = e.item_id       returning et.server_id, et.item_id   )
insertinto "st_item"        ("server_id", "item_id", "item_desc")   select et.server_id, et.item_id, et.item_desc
from"st_item_insert" et   where et.server_id not in (           select et.server_id           from "upsert" b)
andet.item_id not in (           select et.item_id           from "upsert" b)
 

There are three problems here. The first is that the NOT IN subselect
selects from et instead of from b. In the context of this subselect "et"
is a table reference from outside the subselect, so it's treated as a
constant for each run of the subselect. That means that the subselect
will return the value you are testing against for every row in "upsert",
so if there were any updates at all you will make no insertions.

The second is that you are making two separate subselects. This means
that a row in st_item_insert will not be inserted if there is a row in
"upsert" with a matching server_id and a row in "upsert" with a matching
item_id, *even if they are different rows*. For instance, suppose
st_item_insert has
   2   1   foo   1   2   bar   2   2   baz

and the 'foo' and 'bar' entries get updated. The 'baz' entry will then
not get inserted, because the first subselect will find the 'foo' row
and the second will find the 'bar' row. What you need is a single row
subselect, like this:
   where (et.server_id, et.item_id) not in (       select "server_id", "item_id"       from "upsert")

The third is that upsert is not as simple as you think. It isn't
possible (at least, not in Postgres) to take a lock on a row which
doesn't exist, so it's possible that a concurrent transaction could
insert a row with a conflicting key between the time the UPDATE runs and
the time the INSERT runs. You need to either lock the whole table or use
the retry strategy documented in the 'Trapping Errors' section of the
PL/pgSQL documentation. Annoyingly, even 9's serializable transactions
don't seem to help here, at least not by my experiments.

Ben