Thread: To keep indexes in memory, is large enough effective_cache_size enough?

To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Hi!

Related to my other email (size of index in memory),

Other questions,
Q: To keep _index(es)_ in memory, is large enough effective_cache_size enough?
Q: Size of shared_buffers does not matter regarding keeping index in memory?

Or have I missed something, does it matter (to keep indexes in memory)?

Background info: I have plans to use hash indexes: very large amount of data in db tables, but (e.g. hash) indexes could be kept in memory.

I am using PostgreSQL 10. I could start to use PostgreSQL 11, after it has been released.

Best Regards, Sam

Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
Sergei Kornilov
Date:
Hi

effective_cache_size is not a cache. It is just an approximate value for the query planner: how much data can be
expected to be found in RAM (both in shared_buffers and the OS page cache).
 

> Q: Size of shared_buffers does not matter regarding keeping index in memory?
shared_buffers is a cache for both table and index pages. All data in tables and indexes is split into pages of 8 kB
each (usually 8 kB; the page size can be redefined when compiling from source).
 
The shared buffers cache is fully automatic: actively used pages are kept in memory, less used pages may be evicted.
You cannot pin any table or index to shared buffers.
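
For example (the values here are only illustrative, not recommendations):

    ALTER SYSTEM SET effective_cache_size = '48GB';  -- planner hint only, allocates no memory
    ALTER SYSTEM SET shared_buffers = '16GB';        -- the actual cache, allocated at server start
    -- effective_cache_size takes effect on reload; shared_buffers requires a restart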
 

regards, Sergei


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Hi!

Is it possible to force PostgreSQL to keep an index in memory? The data in db table columns is not needed to be kept in memory, only the index. (hash index.)

It would sound optimal in our scenario.
I think Oracle has capability to keep index in memory (in-memory db functionality). But does PostgreSQL have such a functionality? (I keep searching.)

I have read about effective_cache_size and shared_buffers, and I have seen responses to related questions.

Should I actually set shared_buffers to tens of gigabytes also, if I want to keep one very big index in memory?

I am also reading a PG book.

Best Regards, Sam




On Wednesday, September 19, 2018 11:40 AM, Sam R. <samruohola@yahoo.com> wrote:


Hi!

Related to my other email (size of index in memory),

Other questions,
Q: To keep _index(es)_ in memory, is large enough effective_cache_size enough?
Q: Size of shared_buffers does not matter regarding keeping index in memory?

Or have I missed something, does it matter (to keep indexes in memory)?

Background info: I have plans to use hash indexes: very large amount of data in db tables, but (e.g. hash) indexes could be kept in memory.

I am using PostgreSQL 10. I could start to use PostgreSQL 11, after it has been released.

Best Regards, Sam


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Sergei wrote:
> You can not pin any table or index to shared buffers.

Thanks, this is the answer to my other question!

In our case, this might be an important feature.
(Index in memory, other data / columns not.)
> shared_buffers is a cache for both table and index pages.

Ok. So, we should set also shared_buffers big.

BR Sam


On Wednesday, September 19, 2018 12:10 PM, Sergei Kornilov <sk@zsrv.org> wrote:


Hi

effective_cache_size is not a cache. It is just an approximate value for the query planner: how much data can be expected to be found in RAM (both in shared_buffers and the OS page cache).


> Q: Size of shared_buffers does not matter regarding keeping index in memory?

shared_buffers is a cache for both table and index pages. All data in tables and indexes is split into pages of 8 kB each (usually 8 kB; the page size can be redefined when compiling from source).
The shared buffers cache is fully automatic: actively used pages are kept in memory, less used pages may be evicted. You cannot pin any table or index to shared buffers.

regards, Sergei



Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
David Rowley
Date:
On 19 September 2018 at 21:18, Sam R. <samruohola@yahoo.com> wrote:
> Ok. So, we should set also shared_buffers big.

It might not be quite as beneficial as you might think. If your
database is larger than RAM often having a smaller shared_buffers
setting yields better performance.  The reason is that if you have a
very large shared_buffers, the same buffers can end up cached in both
the kernel page cache and shared buffers. If you have a smaller shared
buffers setting then the chances of that double buffering are reduced
and the chances of finding a page cached somewhere increase.

However, if your database is quite small and you can afford to fit all
your data in shared buffers, with enough free RAM for everything else,
then you might benefit from a large shared buffers, but it's important
to also consider that some operations, such as DROP TABLE, can become
slow if shared buffers is very large.

You might get more specific recommendations if you mention how much
RAM the server has and how big the data is now and will be in the
future.
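
As a starting point, something like this shows the current database
size and the largest relations (only a sketch, names are generic):

    -- total size of the current database
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- largest tables and indexes
    SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
    FROM pg_class
    WHERE relkind IN ('r', 'i')
    ORDER BY pg_relation_size(oid) DESC
    LIMIT 10;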

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
Kaixi Luo
Date:
Does a large shared_buffers impact checkpoint performance negatively? I was under the impression that everything inside shared_buffers must be written during a checkpoint.

Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
David Rowley
Date:
On 19 September 2018 at 22:12, Kaixi Luo <kaixiluo@gmail.com> wrote:
> Does a large shared_buffers impact checkpoint performance negatively? I was
> under the impression that everything inside shared_buffers must be written
> during a checkpoint.

Only the dirty buffers get written.

Also having too small a shared buffers can mean that buffers must be
written more than they'd otherwise need to be. If a buffer must be
evicted from shared buffers to make way for a new buffer then the
chances of having to evict a dirty buffer increase with smaller
shared buffers. Obviously, this dirty buffer needs to be written out
before the new buffer can be loaded in. In a worst-case scenario, a
backend performing a query would have to do this.  pg_stat_bgwriter is
your friend.
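
For example, this shows how much writing is done by the checkpointer,
the background writer and the backends themselves:

    -- buffers_checkpoint: written by the checkpointer
    -- buffers_clean:      written by the background writer
    -- buffers_backend:    written directly by backends (the bad case above)
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_clean, buffers_backend,
           maxwritten_clean, buffers_backend_fsync
    FROM pg_stat_bgwriter;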

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Hi!

Thanks for all of the comments!

David wrote:
> if you mention
> how much RAM the server has and how big the data is now

Let's say for example:
RAM: 64 GB
Data: 500 GB - 1.5 TB, for example.

( RAM: Less would of course be better, e.g. 32 GB, but we could maybe go for a somewhat bigger value than 64 GB, if needed. )

BR Sam

On Wednesday, September 19, 2018 1:11 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:


On 19 September 2018 at 21:18, Sam R. <samruohola@yahoo.com> wrote:

> Ok. So, we should set also shared_buffers big.


It might not be quite as beneficial as you might think. If your
database is larger than RAM often having a smaller shared_buffers
setting yields better performance.  The reason is that if you have a
very large shared_buffers, the same buffers can end up cached in both
the kernel page cache and shared buffers. If you have a smaller shared
buffers setting then the chances of that double buffering are reduced
and the chances of finding a page cached somewhere increase.

However, if your database is quite small and you can afford to fit all
your data in shared buffers, with enough free RAM for everything else,
then you might benefit from a large shared buffers, but it's important
to also consider that some operations, such as DROP TABLE, can become
slow if shared buffers is very large.

You might get more specific recommendations if you mention how much
RAM the server has and how big the data is now and will be in the
future.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
The size of the index of one huge table has been e.g. 16-20 GB (after REINDEX).

Such an index is quite big.
 
BR Samuli

On Wednesday, September 19, 2018 2:01 PM, Sam R. <samruohola@yahoo.com> wrote:


Hi!

Thanks for all of the comments!

David wrote:
> if you mention
> how much RAM the server has and how big the data is now

Let's say for example:
RAM: 64 GB
Data: 500 GB - 1.5 TB, for example.

( RAM: Less would of course be better, e.g. 32 GB, but we could maybe go for a somewhat bigger value than 64 GB, if needed. )

BR Sam

On Wednesday, September 19, 2018 1:11 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:

...






Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
Jeff Janes
Date:
On Wed, Sep 19, 2018 at 5:19 AM Sam R. <samruohola@yahoo.com> wrote:
Hi!

Is it possible to force PostgreSQL to keep an index in memory?

It might be possible to put the indexes in a separate tablespace, then do something at the file-system level to force the OS cache to keep pages for that FS in memory.
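
Roughly along these lines (the tablespace path and index name here are made up):

    -- the directory must exist and be owned by the postgres OS user
    CREATE TABLESPACE fast_index_space LOCATION '/mnt/fast_fs/pg_indexes';
    ALTER INDEX my_big_hash_idx SET TABLESPACE fast_index_space;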

 
The data in db table columns is not needed to be kept in memory, only the index. (hash index.)

This sounds like speculation.  Do you have hard evidence that this is actually the case?
 

It would sound optimal in our scenario.
I think Oracle has capability to keep index in memory (in-memory db functionality). But does PostgreSQL have such a functionality? (I keep searching.)

There are a lot of Oracle capabilities which encourage people to micromanage the server in ways that are almost never actually productive.

Should I actually set shared_buffers to tens of gigabytes also, if I want to keep one very big index in memory?

If your entire database fits in RAM, then it could be useful to set shared_buffers high enough to fit the entire database.

If fitting the entire database in RAM is hopeless, 10s of gigabytes is probably too much, unless you have 100s of GB of RAM. PostgreSQL doesn't do direct IO, but rather uses the OS file cache extensively.  This leads to double-buffering, where a page is read from disk and stored in the OS file cache, then handed over to PostgreSQL where it is also stored in shared_buffers. That means that 1/2 of RAM is often the worst value for shared_buffers.  You would want it to be either something like 1/20 to 1/10 of RAM, or something like 9/10 or 19/20 of RAM, so that you concentrate pages into one of the caches or the other.  The low fraction of RAM is the more generally useful option.  The high fraction of RAM is useful when you have very high write loads, particularly intensive index updating--and in that case you probably need someone to intensively monitor and baby-sit the database.
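
For a 64 GB server like the one mentioned earlier in the thread, the two regimes would look roughly like this (numbers purely illustrative; shared_buffers changes need a restart):

    -- low fraction: rely mostly on the OS page cache
    ALTER SYSTEM SET shared_buffers = '4GB';
    -- ...or high fraction: keep nearly everything in shared_buffers
    -- ALTER SYSTEM SET shared_buffers = '58GB';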

Cheers,

Jeff

Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Thanks for the comments!

Sam wrote:
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)

Jeff Janes wrote:
> This sounds like speculation.  Do you have hard evidence that this is actually the case?

In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.

In generation phase, it is a very large random number. Our application may not even generate the random ID.

We use hash index over the ID.

At the moment, in "pure theory", we will read randomly through the hash index.
So, no one will be able to know what part of the data (from the table) should be kept in memory.

Side note: Of course there may be (even many) use cases, where same data is read again and again. Still: I am thinking now from a very theoretical point of view (which we may still apply in practice).

In general:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty.  (This is one reason for these discussions.)
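
For illustration, the setup is roughly of this shape (table and column names are made up for this example):

    CREATE TABLE items (id bigint NOT NULL, payload bytea);
    CREATE INDEX items_id_hash_idx ON items USING hash (id);

    -- lookups are always by the random id
    SELECT payload FROM items WHERE id = 1234567890123456789;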

BR Sam


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
Wei Shan
Date:
I believe you can use pg_prewarm to pin index or table to cache.


On Wed, 19 Sep 2018 at 22:50, Sam R. <samruohola@yahoo.com> wrote:
Thanks for the comments!

Sam wrote:
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)

Jeff Janes wrote:
> This sounds like speculation.  Do you have hard evidence that this is actually the case?

In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.

In generation phase, it is a very large random number. Our application may not even generate the random ID.

We use hash index over the ID.

At the moment, in "pure theory", we will read randomly through the hash index.
So, no one will be able to know what part of the data (from the table) should be kept in memory.

Side note: Of course there may be (even many) use cases, where same data is read again and again. Still: I am thinking now from a very theoretical point of view (which we may still apply in practice).

In general:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty.  (This is one reason for these discussions.)

BR Sam




--
Regards,
Ang Wei Shan

Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
David Rowley
Date:
On 20 September 2018 at 15:19, Wei Shan <weishan.ang@gmail.com> wrote:
> I believe you can use pg_prewarm to pin index or table to cache.
>
> https://www.postgresql.org/docs/current/static/pgprewarm.html

I think the key sentence in the document you linked to is:

"Prewarmed data also enjoys no special protection from cache
evictions, so it is possible that other system activity may evict the
newly prewarmed blocks shortly after they are read"

So this is not pinning. It's merely loading buffers into shared
buffers in the hope that they might be around long enough for you to
make the most of that effort.
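
For reference, usage looks roughly like this (index name made up); it
is a one-shot load, not a pin:

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('my_big_hash_idx');              -- default 'buffer' mode: load into shared_buffers
    SELECT pg_prewarm('my_big_hash_idx', 'prefetch');  -- only ask the OS to pull it into the kernel page cache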

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Hi!

"Index in memory" topic:

After read operations start, it seems that a big part of the index gets loaded into memory quite quickly: a lot of IDs fit in one 8 KB page in PostgreSQL, so the index pages are pulled into memory fast once reading begins.

So, this "feature" / PostgreSQL may work well in our very long "random" IDs cases still.

It is not needed / not possible to keep the whole index in memory: e.g. if there is not enough memory (the index is bigger than available memory), it is not even possible to keep the whole index in memory.
( Regarding in-memory DB functionalities: I do not know whether an "in-memory" index / db would work in such situations, when the index does not fit in memory. We would like to keep most of the index in memory, but not necessarily the whole index in all cases, e.g. when there is not enough memory available. )

So, again, maybe PostgreSQL works well in our case.
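
( One way to check this, I think, is the pg_buffercache extension; the index name below is just an example: )

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- how many 8 KB buffers of the index are currently in shared_buffers
    SELECT count(*) AS buffers, pg_size_pretty(count(*) * 8192) AS in_shared_buffers
    FROM pg_buffercache b
    WHERE b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
      AND b.relfilenode = pg_relation_filenode('items_id_hash_idx');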

Regarding double buffering: I do not know how much double buffering would slow down operations.
It could also be possible to turn off kernel page cache on our DB server, to avoid double buffering. Although, we may still keep it in use.

BR Sam

On Wednesday, September 19, 2018 5:50 PM, Sam R. <samruohola@yahoo.com> wrote:


Thanks for the comments!

Sam wrote:
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)

Jeff Janes wrote:

> This sounds like speculation.  Do you have hard evidence that this is actually the case?

In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.

In generation phase, it is a very large random number. Our application may not even generate the random ID.

We use hash index over the ID.

At the moment, in "pure theory", we will read randomly through the hash index.
So, no one will be able to know what part of the data (from the table) should be kept in memory.

Side note: Of course there may be (even many) use cases, where same data is read again and again. Still: I am thinking now from a very theoretical point of view (which we may still apply in practice).

In general:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty.  (This is one reason for these discussions.)

BR Sam





Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
David Rowley
Date:
On Tue, 25 Sep 2018 at 18:36, Sam R. <samruohola@yahoo.com> wrote:
> Regarding double buffering: I do not know how much double buffering would slow down operations.
> It could also be possible to turn off kernel page cache on our DB server, to avoid double buffering. Although, we may
> still keep it in use.
 

I think you've misunderstood double buffering.  The double buffering
itself does not slow anything down. If the buffer is in shared buffers
already then it does not need to look any further for it. Double
buffering only becomes an issue when buffers existing 2 times in
memory causes other useful buffers to appear 0 times.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
"Sam R."
Date:
Hi!

> The double buffering
> itself does not slow anything down. 

That was what I was suspecting a little. Double buffering may not matter in our case, because the whole server is meant for PostgreSQL only.

In our case, we can e.g. reserve almost "all memory" for PostgreSQL (shared buffers etc.).

Please correct me if I am wrong.

BR Sam



On ti, syysk. 25, 2018 at 23:55, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On Tue, 25 Sep 2018 at 18:36, Sam R. <samruohola@yahoo.com> wrote:

> Regarding double buffering: I do not know how much double buffering would slow down operations.
> It could also be possible to turn off kernel page cache on our DB server, to avoid double buffering. Although, we may still keep it in use.


I think you've misunderstood double buffering.  The double buffering
itself does not slow anything down. If the buffer is in shared buffers
already then it does not need to look any further for it. Double
buffering only becomes an issue when buffers existing 2 times in
memory causes other useful buffers to appear 0 times.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: To keep indexes in memory, is large enough effective_cache_size enough?

From
David Rowley
Date:
On 28 September 2018 at 16:45, Sam R. <samruohola@yahoo.com> wrote:
> That was what I was suspecting a little. Double buffering may not matter in
> our case, because the whole server is meant for PostgreSQL only.
>
> In our case, we can e.g. reserve almost "all memory" for PostgreSQL (shared
> buffers etc.).
>
> Please correct me if I am wrong.

You mentioned above:

> RAM: 64 GB
> Data: 500 GB - 1.5 TB, for example.

If most of that data just sits on disk and is never read then you
might be right, but if the working set of the data is larger than RAM
then you might find you get better performance from smaller shared
buffers.

I think the best thing you can go and do is to go and test this. Write
some code that mocks up a realistic production workload and see where
you get the best performance.
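
For example (just a sketch), a custom pgbench script that does random
lookups by ID, run with something like
"pgbench -n -f random_lookup.sql -c 16 -T 300 yourdb":

    -- random_lookup.sql (table and column names are placeholders)
    \set id random(1, 1000000000)
    SELECT payload FROM items WHERE id = :id;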

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services