Thread: Postgresql jsonb

Postgresql jsonb

From: Deepak Balasubramanyam
Hi,

I have a table (20 million rows) in PostgreSQL 9.4 that contains a bigint id as the primary key and another column that contains jsonb data. Queries run on this table look like so...

------------
## Query
------------
select ... from table
WHERE table.column ->'item'->> 'name' = 'value'
------------

I'd like to get PostgreSQL to keep all of the data in this table, and any indexes on it, in memory. This would ensure that sequential or index scans on the data stay fairly fast.

Research into this problem indicates that there is no reliable way to get PostgreSQL to run entirely from RAM (http://stackoverflow.com/a/24235439/830964). Assuming the table and its indexes amount to 15 GB of data on disk, and the machine has 64 GB of RAM with shared_buffers set anywhere from 16 to 24 GB, here are my questions...

1. When PostgreSQL returns data from this query, how can I tell how much of the data was cached in memory?

2. I'm aware that I can increase shared_buffers so that more data is cached. Is there a way to monitor this setting for its effectiveness?

3. Is there a reliable way or calculation (or close to it) to determine the point after which PostgreSQL will read data from disk rather than from the caches?

Thank you for taking the time to read my question.

- Deepak

Re: Postgresql jsonb

From: Bill Moran
On Fri, 14 Aug 2015 17:39:49 +0530
Deepak Balasubramanyam <deepak.balu@gmail.com> wrote:
>
> I have a table (20 million rows) in Postgresql 9.4 that contains a bigint
> id as the primary key and another column that contains jsonb data. Queries
> run on this table look like so...
>
> ------------
> ## Query
> ------------
> select ... from table
> WHERE table.column ->'item'->> 'name' = 'value'
> ------------
>
> I'd like to make an effort to get Postgresql to keep all data available in
> this table and any index on this table in memory. This would ensure that
> sequence or index scans made on the data are fairly fast.
>
> Research into this problem indicates that there is no reliable way to get
> Postgresql to run off of RAM memory completely (
> http://stackoverflow.com/a/24235439/830964). Assuming the table and its
> indexes amount to 15 gb of data  on the disk and the machine contains 64GB
> of RAM with shared buffers placed at anywhere from 16-24 GB, here are my
> questions...
>
> 1. When postgresql returns data from this query, how can I tell how much of
> the data was cached in memory?

I'm not aware of any way to do that on a per-query basis.

> 2. I'm aware that I can tweak the shared buffer so that more data is
> cached. Is there a way to monitor this value for its effectiveness?

Install the pg_buffercache extension and read up on what it provides. It
gives a pretty good view into what PostgreSQL is keeping in memory.
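
For example, something along these lines (essentially the sample query from the pg_buffercache documentation, assuming the default 8 kB block size) shows how much of each relation is currently sitting in shared_buffers:

-- pg_buffercache ships with the contrib modules; enable it once per database.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count the shared buffers (8 kB each by default) held per relation
-- in the current database.
SELECT c.relname,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

Run against your table and its indexes, that tells you how much of the 15 GB is actually resident in shared buffers at any given moment.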

> 3. Is there a reliable way / calculation (or close to it), to determine a
> point after which Postgresql will ask the disk for data Vs the caches?

It will ask the disk for data if the data is not in memory. As long as the
data it needs is in memory, it will never talk to the disk unless it needs
to write data back.
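
One rough way to watch that boundary over time is the statistics collector: the pg_statio_user_tables view keeps cumulative counters of how often a table's blocks were found in shared buffers versus requested from the kernel. (The table name below is a placeholder.)

-- Cumulative since the statistics were last reset. "hit" means the block was
-- found in shared_buffers; "read" means it was requested from the kernel,
-- which may have served it from the OS cache rather than the disk.
SELECT relname,
       heap_blks_hit, heap_blks_read,
       idx_blks_hit,  idx_blks_read,
       round(100.0 * (heap_blks_hit + idx_blks_hit)
             / nullif(heap_blks_hit + heap_blks_read
                      + idx_blks_hit + idx_blks_read, 0), 2) AS hit_pct
FROM pg_statio_user_tables
WHERE relname = 'your_table';   -- placeholder name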

The cache is a cache. So there are only 2 reasons your data wouldn't all be
in memory all the time:

1) It doesn't all fit
2) Some of that memory is needed by other tables/indexes/etc

As far as when things get evicted from memory, you'll have to look at the
source code, but it's your typical "keep the most commonly needed data in
memory" algorithm.

What problem are you seeing? What is your performance requirement, and what
is the observed performance? I ask because it's unlikely that you really
need to dig into these details like you are, and most people who ask
questions like this are misguided in some way.

--
Bill Moran


Re: Postgresql jsonb

From: David Rowley
On 15 August 2015 at 00:09, Deepak Balasubramanyam <deepak.balu@gmail.com> wrote:
> Hi,
>
> I have a table (20 million rows) in Postgresql 9.4 that contains a bigint id as the primary key and another column that contains jsonb data. Queries run on this table look like so...
>
> ------------
> ## Query
> ------------
> select ... from table
> WHERE table.column ->'item'->> 'name' = 'value'
> ------------
>
> I'd like to make an effort to get Postgresql to keep all data available in this table and any index on this table in memory. This would ensure that sequence or index scans made on the data are fairly fast.
>
> Research into this problem indicates that there is no reliable way to get Postgresql to run off of RAM memory completely (http://stackoverflow.com/a/24235439/830964). Assuming the table and its indexes amount to 15 gb of data on the disk and the machine contains 64GB of RAM with shared buffers placed at anywhere from 16-24 GB, here are my questions...
>
> 1. When postgresql returns data from this query, how can I tell how much of the data was cached in memory?


It depends on which memory you're talking about. If you mean pages that are in shared buffers, then you can just run:

EXPLAIN (ANALYZE, BUFFERS) select ... from table;

You'll see "Buffers: shared read=N" if any buffers were "read from disk", but keep in mind they still might not be coming from disk; they could be cached in memory by the operating system.
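
For instance, with placeholder names standing in for the real table and jsonb column from the original query:

-- "items" and "doc" are placeholder names for the table and jsonb column.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM items
WHERE doc -> 'item' ->> 'name' = 'value';

-- Each plan node then carries a line such as
--   Buffers: shared hit=... read=...
-- where "hit" pages came from shared_buffers and "read" pages were requested
-- from the kernel (OS cache or disk).

If track_io_timing is enabled, the plan also shows I/O Timings per node, which helps separate OS-cache hits from genuine disk reads.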

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgresql jsonb

From: Deepak Balasubramanyam
Thank you Bill and David. I'll take a look at `pg_buffercache` and EXPLAIN with BUFFERS.

---------------
>>> What problem are you seeing? 
---------------

I don't have a problem at the moment.

---------------
>>> What is your performance requirement, and what is the observed performance?
---------------

The observed performance is within my requirement. My question was aimed at getting it to stay that way and your answers have helped.

Thanks again
-Deepak

