Thread: JSONB Array of Strings (with GIN index) versus Split Rows (B-TreeIndex)

JSONB Array of Strings (with GIN index) versus Split Rows (B-TreeIndex)

From

Syed Jafri

Date:

04 February 2019, 08:34:57

I have a database which stores receiver to indicate which account the data relates to. This has led to tons of duplication of data, as one set of data may create 3 separate rows, where the only difference is the receiver column.

|---------------------|------------------|---------------------|------------------|

|---------------------|------------------|---------------------|------------------|

| Alpha | 3 | 12 | USA |

|---------------------|------------------|---------------------|------------------|

| Bravo | 3 | 12 | USA |

|---------------------|------------------|---------------------|------------------|

| Charlie | 3 | 12 | USA |

|---------------------|------------------|---------------------|------------------|

While redesigning the database, I have considered using an array with a GIN index instead of the current B-Tree index on receiver. My proposed new table would look like this:

|-------------------------------|--------------|------------|-------------------|

|-------------------------------|--------------|------------|-------------------|

| ["Alpha", "Bravo", "Charlie"] | 3 | 12 | USA |

|-------------------------------|--------------|------------|-------------------|

More Information:

· Receiver names are of the type (a-z, 1-5, .)

· 95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb;

· The table currently contains over 4 billion rows (with duplication) and the new proposed schema would cut it down to under 2 billion rows.

Question:

1. Does it make more sense to use Postgres Native Text Array?

2. Would a jsonb_path_ops GIN index on receivers make sense here?

3. Which option is more efficient? Which is faster?

Re: JSONB Array of Strings (with GIN index) versus Split Rows(B-Tree Index)

From

Ian Zimmerman

Date:

05 February 2019, 00:44:25

On 2019-02-04 05:34, Syed Jafri wrote:

> |---------------------|------------------|---------------------|------------------|
> |      Receiver       |       Event      |         Date        |      Location    |
> |---------------------|------------------|---------------------|------------------|
> |       Alpha         |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|
> |       Bravo         |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|
> |       Charlie       |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|

Please forgive a naive question, but doesn't this simply cry out for a
normalization step?

EventID   Date   Location
3         12     USA

Receiver  EventID
Alpha     3
Bravo     3
Charlie   3

If you want to use JSON by any means necessary, perhaps a simpler
key/value store is more appropriate than a relational DB like Postgres?

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.

Re: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

From

"David G. Johnston"

Date:

05 February 2019, 01:02:44

On Sun, Feb 3, 2019 at 10:35 PM Syed Jafri <syed.jafri2@ucalgary.ca> wrote:
> ·         Receiver names are of the type (a-z, 1-5, .)
>
> ·         95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new
formatthis would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb; 
> 3.      Which option is more efficient? Which is faster?

I'd probably develop a performance test for the model and planned
queries and experiment with:

1. text arrays
2. jsonb arrays
3. partitioning

David J.