Home > mailing lists

Re: Unique index on hash of jsonb value - correct solution? - Mailing list pgsql-general

From	Laurenz Albe
Subject	Re: Unique index on hash of jsonb value - correct solution?
Date	May 19, 2020 06:05:00
Msg-id	3dde4e83382ccd371216003f5d486ef49bc3d08c.camel@cybertec.at Whole thread Raw
In response to	Unique index on hash of jsonb value - correct solution? (Albrecht Dreß <albrecht.dress@arcor.de>)
Responses	Re: Unique index on hash of jsonb value - correct solution?
List	pgsql-general

Tree view

On Mon, 2020-05-18 at 18:43 +0200, Albrecht Dreß wrote:
> in a database I have a table with a text, a jsonb and a bytea column, which
> together shall be unique, like:
> 
> <snip>
>   Column  |  Type  | Collation | Nullable |               Default
> ---------+--------+-----------+----------+-------------------------------------
>   id      | bigint |           | not null | nextval('demotab_id_seq'::regclass)
>   textval | text   |           | not null |
>   jsonval | jsonb  |           | not null |
>   blobval | bytea  |           | not null |
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, jsonval, blobval)
> </snip>
> 
> This seems to work just fine in most cases, but adding a bigger json value (the text
> and the bytea columns are always short) results in a “index row size 2840 exceeds
> maximum 2712 for index "…"” error.  Following the hint in the error message,
> I replaced the index by
> 
> <snip>
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, md5(jsonval::text), blobval)
> </snip>
> 
> which seems to solve the issue.
> 
> My question: is this approach (a) correct and (b) still safe if the items in the
> jsonb (always a dict in my case) are re-ordered?  I tested a few cases, and trying
> to insert something like e.g. '{"a":1,"b":2}' and '{"b":2,"a":1}' actually does
> produce the same hash (i.e. the 2nd insert attempt is rejected due to the index),
> but is this guaranteed by design for every case?  Or is there a better solution
> for this use case?

"jsonb" uses an internal binary representation that reorders the attributes in
a deterministic fashin, so yes, that is guaranteed.

I would use an "md5" hash for the "bytea" column as well to keep the index smaller.

There can be collisions with an md5 hash, so it is possible for duplicates to
creep in.  Besides, if you have an auto-generated "bigint" column, you should
make that the primary key.  Perhaps then the uniqueness condition is no longer
necessary.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

pgsql-general by date:

From: Adrian Klaver
Date: 19 May 2020, 02:20:27
Subject: Re: template0 needing vacuum freeze?

From: Kyotaro Horiguchi
Date: 19 May 2020, 07:14:43
Subject: Re: Hot and PITR backups in same server

Re: Unique index on hash of jsonb value - correct solution? - Mailing list pgsql-general

Previous

Next