Re: INSERT ... ON CONFLICT DO UPDATE - Mailing list pgsql-general

From Daniel Verite
Subject Re: INSERT ... ON CONFLICT DO UPDATE
Date
Msg-id 928925e4-d55b-4144-b994-5004e4b5a0e3@mm
Whole thread Raw
In response to INSERT ... ON CONFLICT DO UPDATE  (Rafal Pietrak <rafal@ztk-rp.eu>)
Responses Re: INSERT ... ON CONFLICT DO UPDATE  (Rafal Pietrak <rafal@ztk-rp.eu>)
List pgsql-general
    Rafal Pietrak wrote:

> CREATE TABLE vouchers (a_voucher bigint PRIMARY KEY default
> (random()*1000000000)::bigint, issued date default now(), .....);

Generators of truly unique pseudo-random values provide a
better ground for this. Consider for example:

https://wiki.postgresql.org/wiki/Pseudo_encrypt

You may replace the round function with your own secret function,
so you'll get the required randomness, secrecy and uniqueness.
No need to deal with collisions on insertion as there are none.

> 2. with current (as of 9.5) implementation I think I can always "ON
> CONFLICT DO NOTHING", and retry the INSERT from application level.

Yes, but retrying is now easy, let's not underappreciate that.

As a test, with 9.5alpha1, I create a table with 100k unique
random numbers:

  CREATE TABLE vouchers(id int primary key);

Then try to populate it immediately with 100k rows:

INSERT INTO vouchers
   select (random()*1000000000)::int from
   generate_series(1,100000)
 ON CONFLICT DO NOTHING;

psql result: INSERT 0 99995

Note how 5 values conflicted right from the beginning,
even though we're claiming only 10^5 out of the 10^9 output
range (or 0.01%).
The probability of at least one collision is pretty high,
see the "birthday paradox" for the theory on that.

Anyway the collisions got eliminated without any effort
from me and that's quite useful already.

Now trying to insert 10k rows at a time:

  INSERT INTO vouchers
      SELECT (random()*1000000000)::int
      FROM generate_series(1,10000)
      ON CONFLICT DO NOTHING
      RETURNING id;

when run repeatedly, it tends to return between 9995 and 10000 values.
If we want exactly N rows and we get back N-epsilon, then we need to
re-ask for epsilon rows, but this will converge fast  to completion.
(that is, until you have enough values that the birthday paradox effect
really kicks in).

My point is that we can now achieve that without any exception handling
or transaction retry, and no plpgsql function to create, so it's really
a significant improvement in ease of use. And presumably in
performance too.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @ManitouMail


pgsql-general by date:

Previous
From: Amitabh Kant
Date:
Subject: Upgrade postgres cluster on FreeBSD using pg_upgrade
Next
From: "Daniel Verite"
Date:
Subject: Re: INSERT ... ON CONFLICT DO UPDATE