Thread: JSONB Array of Strings (with GIN index) versus Split Rows (B-TreeIndex)

JSONB Array of Strings (with GIN index) versus Split Rows (B-TreeIndex)

From
Syed Jafri
Date:

I have a database which stores receiver to indicate which account the data relates to. This has led to tons of duplication of data, as one set of data may create 3 separate rows, where the only difference is the receiver column.

 

|---------------------|------------------|---------------------|------------------|

|      Receiver       |       Event      |         Date        |      Location    |

|---------------------|------------------|---------------------|------------------|

|       Alpha         |         3        |          12         |         USA      |

|---------------------|------------------|---------------------|------------------|

|       Bravo         |         3        |          12         |         USA      |

|---------------------|------------------|---------------------|------------------|

|       Charlie       |         3        |          12         |         USA      |

|---------------------|------------------|---------------------|------------------|

 

While redesigning the database, I have considered using an array with a GIN index instead of the current B-Tree index on receiver. My proposed new table would look like this:

|-------------------------------|--------------|------------|-------------------|

|           Receivers           |     Event    |    Date    |     Location      |

|-------------------------------|--------------|------------|-------------------|

| ["Alpha", "Bravo", "Charlie"] |       3      |     12     |         USA       |

|-------------------------------|--------------|------------|-------------------|

 

More Information:

·         Receiver names are of the type (a-z, 1-5, .)

·         95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb;

·         The table currently contains over 4 billion rows (with duplication) and the new proposed schema would cut it down to under 2 billion rows.

·          

Question:

1.      Does it make more sense to use Postgres Native Text Array?

2.      Would a jsonb_path_ops GIN index on receivers make sense here?

3.      Which option is more efficient? Which is faster?

 

Re: JSONB Array of Strings (with GIN index) versus Split Rows(B-Tree Index)

From
Ian Zimmerman
Date:
On 2019-02-04 05:34, Syed Jafri wrote:

> |---------------------|------------------|---------------------|------------------|
> |      Receiver       |       Event      |         Date        |      Location    |
> |---------------------|------------------|---------------------|------------------|
> |       Alpha         |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|
> |       Bravo         |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|
> |       Charlie       |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|

Please forgive a naive question, but doesn't this simply cry out for a
normalization step?

EventID   Date   Location
3         12     USA

Receiver  EventID
Alpha     3
Bravo     3
Charlie   3

If you want to use JSON by any means necessary, perhaps a simpler
key/value store is more appropriate than a relational DB like Postgres?

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.


Re: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

From
"David G. Johnston"
Date:
On Sun, Feb 3, 2019 at 10:35 PM Syed Jafri <syed.jafri2@ucalgary.ca> wrote:
> ·         Receiver names are of the type (a-z, 1-5, .)
>
> ·         95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new
formatthis would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb; 
> 3.      Which option is more efficient? Which is faster?

I'd probably develop a performance test for the model and planned
queries and experiment with:

1. text arrays
2. jsonb arrays
3. partitioning

David J.