Thread: Could be improved point of UPSERT

Could be improved point of UPSERT

From
Yourfriend
Date:
Hi, Hackers,

The feature of UPSERT  was my most interested one of 9.5, I really like need it.

I have test the different usages for this features like one record input, multi records input,
and also more than 10,000 records upserting, all look great, thanks for your work.

When I checked my records from these tests, I found that there was one result that might be 
improved, the issue is,  although there is no new records added to the table when conflict happens,
but the sequence for this table had been touched, so when a new record is actually  added after that,
the sequence will skip the numbers when it was touched, then we get a not reasonable result (from my opinion). The scenario is as following:

1, System:  PostgreSQL 9.5 Alpha + Win7 X64
2, Enter pgadmin:
3, create table foobar (
        sysid serial,
        theprovince varchar(20),
        themonth  varchar(7),
        therevenue integer default 0,
        primary key (sysid),
        unique (theprovince,themonth)
)

4, insert into foobar values 
      ('BeiJing','2015-01',1234),
      ('ShangHai','2015-01',1927) 
5, select * from foobar ; 
    sysid   theprovince  themonth   therevenue
       1          Beijing       2015-01     1234
        2         ShangHai   2015-01     1927

6, insert into foobar values 
      ('BeiJing','2015-01',1999),
      ('ShangHai','2015-01',1988) 
     on conflict (theprovince,themonth) do update set therevenue=excluded.therevenue; 

7, select * from foobar ; 
    sysid   theprovince  themonth   therevenue
       1          Beijing       2015-01     1999
       2         ShangHai   2015-01     1988
8, insert into foobar values 
      ('TianJing','2015-01',1888)

9, select * from foobar ; 
    sysid   theprovince  themonth   therevenue
       1          Beijing       2015-01     1999
       2         ShangHai   2015-01     1988
       5          TiangJing   2015-01     1888
---------------------------------------------------------------------------
Not reasonable result or issue:   
   The third record of "TianJing" in the table gets the SYSID of 5, as the sequence was accessed
twice by the step of 6. 

Suggestion:  When a conflict was found for UPSERT, don't access the sequence, so users can have a reasonable list of ID.


Regards,

Daojing Zhou.

Re: Could be improved point of UPSERT

From
Peter Geoghegan
Date:
On Sun, Jul 12, 2015 at 4:09 AM, Yourfriend <doudou586@gmail.com> wrote:
> Suggestion:  When a conflict was found for UPSERT, don't access the
> sequence, so users can have a reasonable list of ID.

This is not technically feasible. What if the arbiter index is a serial PK?

The same thing can happen when a transaction is aborted. SERIAL is not
guaranteed to be gapless.
-- 
Peter Geoghegan



Re: Could be improved point of UPSERT

From
Gianni
Date:
On Tuesday 14 July 2015 11:33:34 Peter Geoghegan wrote:
> On Sun, Jul 12, 2015 at 4:09 AM, Yourfriend <doudou586@gmail.com> wrote:
> > Suggestion:  When a conflict was found for UPSERT, don't access the
> > sequence, so users can have a reasonable list of ID.
> 
> This is not technically feasible. What if the arbiter index is a serial PK?
> 
> The same thing can happen when a transaction is aborted. SERIAL is not
> guaranteed to be gapless.

Could there be a version of UPSERT where an update is tried, and if 0 records 
are modified, an insert is done?

Just wondering, I haven't got am use-case for that.  I don't mid gaps in 
sequences.



Re: Could be improved point of UPSERT

From
Peter Geoghegan
Date:
On Tue, Jul 14, 2015 at 11:40 AM, Gianni <nasus.maximos@gmail.com> wrote:
> Could there be a version of UPSERT where an update is tried, and if 0 records
> are modified, an insert is done?
>
> Just wondering, I haven't got am use-case for that.  I don't mid gaps in
> sequences.

Perhaps, if you don't mind having a severely restricted set of
qualifications in the UPDATE, which the existing command effectively
has anyway. That would be a very odd thing.

-- 
Peter Geoghegan



Re: Could be improved point of UPSERT

From
Yourfriend
Date:
In my example, I just give each record a different ID to access it efficiently.

In our business cases, some times, we also use some prefix letter like 'SO', "PO' combining with the current year, month and then a sequence  
to make a invoice ID,

for example, SO201507_1001, PO201503_1280, etc.

As these IDs would be the most important attribute to the business, so, we hope there is no gap for the IDs.

Regards,

Daojing Zhou.



On Wed, Jul 15, 2015 at 2:33 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Sun, Jul 12, 2015 at 4:09 AM, Yourfriend <doudou586@gmail.com> wrote:
> Suggestion:  When a conflict was found for UPSERT, don't access the
> sequence, so users can have a reasonable list of ID.

This is not technically feasible. What if the arbiter index is a serial PK?

The same thing can happen when a transaction is aborted. SERIAL is not
guaranteed to be gapless.
--
Peter Geoghegan

Re: Could be improved point of UPSERT

From
Peter Geoghegan
Date:
On Wed, Jul 15, 2015 at 12:01 AM, Yourfriend <doudou586@gmail.com> wrote:
> for example, SO201507_1001, PO201503_1280, etc.
>
> As these IDs would be the most important attribute to the business, so, we
> hope there is no gap for the IDs.

That's a requirement I've heard a number of times before. If you're
relying on a sequence for this purpose, your application is already
broken [1]. UPSERT need not be involved at all.

[1] http://www.varlena.com/GeneralBits/130.php
-- 
Peter Geoghegan



Re: Could be improved point of UPSERT

From
Yourfriend
Date:

For the most cases I mentioned, we don't request a strict gapless sequence for the Invoice ID, the essential requirement is unique.
We just hope that there is no obviously gap in most situations. 
From the test of UPSERT, there are quite a few chances to generate a big gap when UPSERT multi records. 
However, the result of UPSERT is acceptable, and I do love this function. So, it's a suggestion only.

Anyway, thanks a lot for the detail explanation. 

Regards,

Daojing Zhou.




On Wed, Jul 15, 2015 at 3:23 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Jul 15, 2015 at 12:01 AM, Yourfriend <doudou586@gmail.com> wrote:
> for example, SO201507_1001, PO201503_1280, etc.
>
> As these IDs would be the most important attribute to the business, so, we
> hope there is no gap for the IDs.

That's a requirement I've heard a number of times before. If you're
relying on a sequence for this purpose, your application is already
broken [1]. UPSERT need not be involved at all.

[1] http://www.varlena.com/GeneralBits/130.php
--
Peter Geoghegan