Using postgresql and JSONB on very large datasets - Mailing list pgsql-general

From Sankar P
Subject Using postgresql and JSONB on very large datasets
Date
Msg-id CAMSEaH6oxC2xXVjJ5-bnRB4E-FNQq4rgU6gBapWDjXSZDSrc+g@mail.gmail.com
Whole thread Raw
List pgsql-general
Hi

I have a table t, with just two columns, an `id SERIAL` and a `record
JSONB`. I have about 3 million records where I save some JSON data in
each record.

Each `record` column jsonb has about 40 json key-values, of which
about 10 are nested fields.

Now, if I run a query, like:

```
select distinct record -> 'destinationServiceName' from t;
```

it takes about 1 minute or so to return the list of unique values for
this column, to search across the 3 million records. The number of
records in my setup may go up to a few tens of millions.

I want these queries to be faster. I also intend to do a few more
aggregation queries, like, "Give the maximum responseTime for each
distinct destinationServiceName" etc. in future.

I have used Elasticsearch and kibana in the past and I am planning to
migrate to postgres jsonb field (not as individual columns), as
vertical scaling is preferred for my case and the schema could not be
fixed.

I assumed that if I add more indexes, these types of queries would
become faster. So I tried to create an index using the command:

```
CREATE INDEX idx_records ON t USING GIN (record);
```

But this index creation fails now as my postbird client timesout.

So my questions are:
1) Is it a good idea to use postgres as a JSON database, if I have
tens of millions of records ? If you run such a setup, can you share
any words of advice on Dos and Don'ts ?
2) Is there a better way to do the index creation, reliably, for a
table with a few million records already ?
3) There are various types of indexes. I am using GIN based on some
short-term googling. Are there any good blog posts, links, tutorials,
courses etc. that you have found useful in understanding the postgres
jsonb performance tuning ?

Thanks.

-- 
Sankar P
http://psankar.blogspot.com



pgsql-general by date:

Previous
From: Sankar P
Date:
Subject: Re: Importing a Large .ndjson file
Next
From: Praveen Kumar K S
Date:
Subject: [HELP] General query - performance metrics