Re: Unique index on hash of jsonb value - correct solution? - Mailing list pgsql-general

From Laurenz Albe
Subject Re: Unique index on hash of jsonb value - correct solution?
Date
Msg-id 3dde4e83382ccd371216003f5d486ef49bc3d08c.camel@cybertec.at
Whole thread Raw
In response to Unique index on hash of jsonb value - correct solution?  (Albrecht Dreß <albrecht.dress@arcor.de>)
Responses Re: Unique index on hash of jsonb value - correct solution?
List pgsql-general
On Mon, 2020-05-18 at 18:43 +0200, Albrecht Dreß wrote:
> in a database I have a table with a text, a jsonb and a bytea column, which
> together shall be unique, like:
> 
> <snip>
>   Column  |  Type  | Collation | Nullable |               Default
> ---------+--------+-----------+----------+-------------------------------------
>   id      | bigint |           | not null | nextval('demotab_id_seq'::regclass)
>   textval | text   |           | not null |
>   jsonval | jsonb  |           | not null |
>   blobval | bytea  |           | not null |
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, jsonval, blobval)
> </snip>
> 
> This seems to work just fine in most cases, but adding a bigger json value (the text
> and the bytea columns are always short) results in a “index row size 2840 exceeds
> maximum 2712 for index "…"” error.  Following the hint in the error message,
> I replaced the index by
> 
> <snip>
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, md5(jsonval::text), blobval)
> </snip>
> 
> which seems to solve the issue.
> 
> My question: is this approach (a) correct and (b) still safe if the items in the
> jsonb (always a dict in my case) are re-ordered?  I tested a few cases, and trying
> to insert something like e.g. '{"a":1,"b":2}' and '{"b":2,"a":1}' actually does
> produce the same hash (i.e. the 2nd insert attempt is rejected due to the index),
> but is this guaranteed by design for every case?  Or is there a better solution
> for this use case?

"jsonb" uses an internal binary representation that reorders the attributes in
a deterministic fashin, so yes, that is guaranteed.

I would use an "md5" hash for the "bytea" column as well to keep the index smaller.

There can be collisions with an md5 hash, so it is possible for duplicates to
creep in.  Besides, if you have an auto-generated "bigint" column, you should
make that the primary key.  Perhaps then the uniqueness condition is no longer
necessary.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: template0 needing vacuum freeze?
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Hot and PITR backups in same server