Thread: To keep indexes in memory, is large enough effective_cache_size enough?
Hi!
Related to my other email (size of index in memory),
Other questions,
Q: To keep _index(es)_ in memory, is a large enough effective_cache_size enough?
Q: Size of shared_buffers does not matter regarding keeping index in memory?
Or have I missed something, does it matter (to keep indexes in memory)?
Background info: I plan to use hash indexes: the amount of data in the db tables is very large, but the (e.g. hash) indexes could be kept in memory.
I am using PostgreSQL 10. I could start to use PostgreSQL 11 after it has been released.
Best Regards, Sam
Re: To keep indexes in memory, is large enough effective_cache_size enough?
From
Sergei Kornilov
Date:
Hi

effective_cache_size is not a cache. It is just an approximate value for the query planner: how much data can be found in RAM (both in shared_buffers and the OS page cache).

> Q: Size of shared_buffers does not matter regarding keeping index in memory?

shared_buffers is a cache for both table and index pages. All data in tables and indexes is split into chunks called pages, usually 8 kB each (the size can be redefined during source compilation). The shared buffers cache is fully automatic: actively used pages are kept in memory, less used pages may be evicted. You cannot pin any table or index to shared buffers.

regards, Sergei
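[Editorially inserted illustration of the two settings Sergei distinguishes; the values below are placeholders, not recommendations:]

```sql
-- effective_cache_size is only a planner hint; it allocates nothing.
ALTER SYSTEM SET effective_cache_size = '48GB';

-- shared_buffers is the actual PostgreSQL buffer cache;
-- changing it requires a server restart.
ALTER SYSTEM SET shared_buffers = '8GB';

-- After reload/restart, verify:
SHOW effective_cache_size;
SHOW shared_buffers;
```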
Hi!
Is it possible to force PostgreSQL to keep an index in memory? The data in the db table columns does not need to be kept in memory, only the index. (A hash index.)
It would sound optimal in our scenario.
I think Oracle has the capability to keep an index in memory (in-memory db functionality). But does PostgreSQL have such functionality? (I keep searching.)
I have read:
(effective_cache_size, shared_buffers)
I have seen responses to:
Should I actually set shared_buffers to tens of gigabytes as well, if I want to keep one very big index in memory?
I am also reading a PG book.
Best Regards, Sam
On Wednesday, September 19, 2018 11:40 AM, Sam R. <samruohola@yahoo.com> wrote:
Hi!
Related to my other email (size of index in memory),
Other questions,
Q: To keep _index(es)_ in memory, is large enough effective_cache_size enough?
Q: Size of shared_buffers does not matter regarding keeping index in memory?
Or have I missed something, does it matter (to keep indexes in memory)?
Background info: I have plans to use hash indexes: very large amount of data in db tables, but (e.g. hash) indexes could be kept in memory.
I am using PostgreSQL 10. I could start to use PostgreSQL 11, after it has been released.
Best Regards, Sam
Sergei wrote:
> You can not pin any table or index to shared buffers.
Thanks, this is the answer to my other question!
In our case, this might be an important feature.
(Index in memory, other data / columns not.)
> shared_buffers is cache for both tables and indexes pages.
Ok. So, we should set also shared_buffers big.
BR Sam
On Wednesday, September 19, 2018 12:10 PM, Sergei Kornilov <sk@zsrv.org> wrote:
Hi
effective_cache_size is not cache. It is just approx value for query planner: how many data can be found in RAM (both in shared_buffers and OS page cache)
> Q: Size of shared_buffers does not matter regarding keeping index in memory?
shared_buffers is cache for both tables and indexes pages. All data in tables and indexes are split to chunks 8 kb each - pages (usually 8kb, it can be redefined during source compilation).
Shared buffers cache is fully automatic, active used pages keeps in memory, lower used pages may be evicted. You can not pin any table or index to shared buffers.
regards, Sergei
On 19 September 2018 at 21:18, Sam R. <samruohola@yahoo.com> wrote:
> Ok. So, we should set also shared_buffers big.

It might not be quite as beneficial as you might think. If your database is larger than RAM, often having a smaller shared_buffers setting yields better performance. The reason is that with a very large shared_buffers, the same buffers can end up cached in both the kernel page cache and shared buffers. If you have a smaller shared buffers setting then the chances of that double buffering are reduced and the chances of finding a page cached somewhere increase.

However, if your database is quite small and you can afford to fit all your data in shared buffers, with enough free RAM for everything else, then you might benefit from a large shared buffers, but it's important to also consider that some operations, such as DROP TABLE, can become slow if shared buffers is very large.

You might get more specific recommendations if you mention how much RAM the server has and how big the data is now and will be in the future.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Does a large shared_buffers impact checkpoint performance negatively? I was under the impression that everything inside shared_buffers must be written during a checkpoint.
On 19 September 2018 at 22:12, Kaixi Luo <kaixiluo@gmail.com> wrote:
> Does a large shared_buffers impact checkpoint performance negatively? I was
> under the impression that everything inside shared_buffers must be written
> during a checkpoint.

Only the dirty buffers get written.

Also, having too small a shared buffers can mean that buffers must be written more than they'd otherwise need to be. If a buffer must be evicted from shared buffers to make way for a new buffer then the chances of having to evict a dirty buffer increase with smaller shared buffers. Obviously, this dirty buffer needs to be written out before the new buffer can be loaded in. In a worst-case scenario, a backend performing a query would have to do this.

pg_stat_bgwriter is your friend.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
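[Editorially inserted sketch of reading pg_stat_bgwriter for this; column names are as in PostgreSQL 10. A high buffers_backend relative to buffers_checkpoint and buffers_clean suggests backends are writing out dirty buffers themselves, which can indicate an undersized shared_buffers:]

```sql
SELECT checkpoints_timed,    -- checkpoints triggered by timeout
       checkpoints_req,      -- checkpoints triggered by WAL volume
       buffers_checkpoint,   -- buffers written during checkpoints
       buffers_clean,        -- buffers written by the bgwriter
       buffers_backend,      -- buffers written directly by backends
       buffers_alloc         -- buffers allocated
FROM pg_stat_bgwriter;
```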
Hi!
Thanks for all of the comments!
David wrote:
> if you mention
> how much RAM the server has and how big the data is now
Let's say for example:
RAM: 64 GB
Data: 500 GB - 1.5 TB, for example.
( RAM: Less would of course be better, e.g. 32 GB, but we could maybe go for an even slightly bigger value than 64 GB if needed. )
BR Sam
On Wednesday, September 19, 2018 1:11 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 19 September 2018 at 21:18, Sam R. <samruohola@yahoo.com> wrote:
> Ok. So, we should set also shared_buffers big.
It might not be quite as beneficial as you might think. If your
database is larger than RAM often having a smaller shared_buffers
setting yields better performance. The reason is that if you have a
very large shared_buffers that the same buffers can end up cached in
the kernel page cache and shared buffers. If you have a smaller shared
buffers setting then the chances of that double buffering are reduced
and the chances of finding a page cached somewhere increases.
However, if your database is quite small and you can afford to fit all
your data in shared buffers, with enough free RAM for everything else,
then you might benefit from a large shared buffers, but it's important
to also consider that some operations, such as DROP TABLE can become
slow if shared buffers is very large.
You might get more specific recommendations if you mention how much
RAM the server has and how big the data is now and will be in the
future.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
The index of one huge table has been e.g. 16-20 GB (after REINDEX).
Such an index is quite big.
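[Editorially inserted note: index sizes like this can be checked directly; the table and index names below are placeholders:]

```sql
-- Size of a single index:
SELECT pg_size_pretty(pg_relation_size('my_table_id_hash_idx'));

-- Total size of all indexes on a table:
SELECT pg_size_pretty(pg_indexes_size('my_table'));
```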
BR Samuli
On Wednesday, September 19, 2018 2:01 PM, Sam R. <samruohola@yahoo.com> wrote:
Hi!
Thanks for all of the comments!
David wrote:
> if you mention
> how much RAM the server has and how big the data is now
Let's say for example:
RAM: 64 GB
Data: 500 GB - 1.5 TB, for example.
( RAM: Less would of course be better, e.g. 32 GB, but we could maybe go for an even little bit bigger value than 64 GB, if needed to. )
BR Sam
On Wednesday, September 19, 2018 1:11 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
...
On Wed, Sep 19, 2018 at 5:19 AM Sam R. <samruohola@yahoo.com> wrote:
Hi!
Is it possible to force PostgreSQL to keep an index in memory?
It might be possible to put the indexes in a separate tablespace, then do something at the file-system level to to force the OS cache to keep pages for that FS in memory.
The data in db table columns is not needed to be kept in memory, only the index. (hash index.)
This sounds like speculation. Do you have hard evidence that this is actually the case?
It would sound optimal in our scenario.
I think Oracle has capability to keep index in memory (in-memory db functionality). But does PostgreSQL have such a functionality? (I keep searching.)
There are a lot of Oracle capabilities which encourage people to micromanage the server in ways that are almost never actually productive.
Should I actually set shared_buffers to tens of gigabytes also, if I want to keep one very big index in memory?
If your entire database fits in RAM, then it could be useful to set shared_buffers high enough to fit the entire database.
If fitting the entire database in RAM is hopeless, 10s of gigabytes is probably too much, unless you have 100s of GB of RAM. PostgreSQL doesn't do direct IO, but rather uses the OS file cache extensively. This leads to double-buffering, where a page is read from disk and stored in the OS file cache, then handed over to PostgreSQL where it is also stored in shared_buffers.

That means that 1/2 of RAM is often the worst value for shared_buffers. You would want it to be either something like 1/20 to 1/10 of RAM, or something like 9/10 or 19/20 of RAM, so that you concentrate pages into one of the caches or the other. The low fraction of RAM is the more generally useful option. The high fraction of RAM is useful when you have very high write loads, particularly intensive index updating -- and in that case you probably need someone to intensively monitor and baby-sit the database.
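[Editorially inserted illustration: applied to the 64 GB server mentioned earlier in the thread, Jeff's fractions would translate to something like the following. Values are illustrative only, not recommendations:]

```sql
-- Low-fraction option (~1/20 to 1/10 of 64 GB): rely on the OS cache.
ALTER SYSTEM SET shared_buffers = '4GB';

-- High-fraction option (~9/10 to 19/20 of 64 GB): rarely appropriate,
-- only for very write-heavy loads with careful monitoring.
-- ALTER SYSTEM SET shared_buffers = '58GB';
```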
Cheers,
Jeff
Thanks for the comments!
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)
Jeff Janes wrote:
> This sounds like speculation. Do you have hard evidence that this is actually the case?
In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.
In generation phase, it is a very large random number. Our application may not even generate the random ID.
We use hash index over the ID.
At the moment, in "pure theory", we will read randomly through the hash index.
So, no one will be able to know what part of the data (from the table) should be kept in memory.
Side note: Of course there may be (even many) use cases, where same data is read again and again. Still: I am thinking now from a very theoretical point of view (which we may still apply in practice).
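[Editorially inserted sketch of the setup described above: a hash index over a large random ID. The table and column names are hypothetical:]

```sql
-- Hypothetical table with a very large, randomly generated ID.
CREATE TABLE events (
    id      numeric NOT NULL,  -- very large random number
    payload bytea
);

-- Hash index: supports equality lookups only; each entry stores a
-- fixed-size hash code, so the index stays compact even for wide keys.
CREATE INDEX events_id_hash_idx ON events USING hash (id);

-- Equality lookup through the hash index:
SELECT payload FROM events WHERE id = 12345;
```

Note that hash indexes only became crash-safe (WAL-logged) in PostgreSQL 10, which matches the version mentioned earlier in the thread.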
In generic:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty. (This is one reason for these discussions.)
BR Sam
I believe you can use pg_prewarm to pin index or table to cache.
On Wed, 19 Sep 2018 at 22:50, Sam R. <samruohola@yahoo.com> wrote:
Thanks for the comments!
Sam wrote:
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)
Jeff Janes wrote:
> This sounds like speculation. Do you have hard evidence that this is actually the case?
In our case the "ID" is randomly generated random number. (Large ID.)
It is not a "sequential" number, but random. In generation phase, it is a very large random number. Our application may not even generate the random ID. We use hash index over the ID. At the moment, in "pure theory", we will read randomly through the hash index. So, no one will be able to know what part of the data (from the table) should be kept in memory. Side note: Of course there may be (even many) use cases, where same data is read again and again. Still: I am thinking now from a very theoretical point of view (which we may still apply in practice). In generic:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty. (This is one reason for these discussions.)
BR Sam
Regards,
Ang Wei Shan
On 20 September 2018 at 15:19, Wei Shan <weishan.ang@gmail.com> wrote:
> I believe you can use pg_prewarm to pin index or table to cache.
>
> https://www.postgresql.org/docs/current/static/pgprewarm.html

I think the key sentence in the document you linked to is:

"Prewarmed data also enjoys no special protection from cache evictions, so it is possible that other system activity may evict the newly prewarmed blocks shortly after they are read"

So this is not pinning. It's merely loading buffers into shared buffers in the hope that they might be around long enough for you to make the most of that effort.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
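[Editorially inserted reference: pg_prewarm usage looks like this; the index name is a placeholder. As noted above, this only warms the cache once -- the blocks can be evicted again at any time:]

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load an index into shared_buffers ('buffer' is the default mode);
-- returns the number of blocks read.
SELECT pg_prewarm('events_id_hash_idx', 'buffer');
```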
Hi!
"Index in memory" topic:
After the read operations start, it seems that a big part of the index gets loaded into memory quite quickly: a lot of IDs fit into one 8 kB page in PostgreSQL, so index pages are pulled into memory fast once reading begins.
So, this behaviour of PostgreSQL may still work well in our case with very long "random" IDs.
It is not needed / not possible to keep the whole index in memory anyway: e.g. if the index is bigger than the available memory, keeping the whole index in memory is simply not possible.
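[Editorially inserted back-of-envelope check of "a lot of IDs fit to one 8 kB page". In a hash index, each entry holds a fixed-size 4-byte hash code plus tuple header and line-pointer overhead, roughly 20 bytes per entry, regardless of how wide the ID itself is. The sizes are approximate:]

```sql
-- index tuple header (8 B) + 4-byte hash code, padded to ~16 B,
-- plus a 4-byte line pointer => roughly 20 B per entry,
-- so on the order of a few hundred entries per 8 kB page:
SELECT 8192 / 20 AS approx_entries_per_page;
```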
( Regarding in-memory DB functionalities: I do not know whether an "in-memory" index / db would work in such situations, where the index does not fit in memory. We would like to keep most of the index in memory, but not necessarily the whole index, e.g. when there is not enough memory available. )
So, again, maybe PostgreSQL works well in our case.
Regarding double buffering: I do not know how much double buffering would slow down operations.
It could also be possible to turn off kernel page cache on our DB server, to avoid double buffering. Although, we may still keep it in use.
BR Sam
On Wednesday, September 19, 2018 5:50 PM, Sam R. <samruohola@yahoo.com> wrote:
Thanks for the comments!
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)
Jeff Janes wrote:
> This sounds like speculation. Do you have hard evidence that this is actually the case?
In our case the "ID" is randomly generated random number. (Large ID.)
It is not a "sequential" number, but random.
In generation phase, it is a very large random number. Our application may not even generate the random ID.
We use hash index over the ID.
At the moment, in "pure theory", we will read randomly through the hash index.
So, no one will be able to know what part of the data (from the table) should be kept in memory.
Side note: Of course there may be (even many) use cases, where same data is read again and again. Still: I am thinking now from a very theoretical point of view (which we may still apply in practice).
In generic:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty. (This is one reason for these discussions.)
BR Sam
On Tue, 25 Sep 2018 at 18:36, Sam R. <samruohola@yahoo.com> wrote:
> Regarding double buffering: I do not know how much double buffering would slow down operations.
> It could also be possible to turn off kernel page cache on our DB server, to avoid double buffering. Although, we may still keep it in use.

I think you've misunderstood double buffering. The double buffering itself does not slow anything down. If the buffer is in shared buffers already then it does not need to look any further for it. Double buffering only becomes an issue when buffers existing 2 times in memory causes other useful buffers to appear 0 times.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi!
> The double buffering
> itself does not slow anything down.
That was what I was suspecting a little. Double buffering may not matter in our case, because the whole server is meant for PostgreSQL only.
In our case, we can e.g. reserve almost "all memory" for PostgreSQL (shared buffers etc.).
Please correct me if I am wrong.
BR Sam
On Tue, Sept 25, 2018 at 23:55, David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 25 Sep 2018 at 18:36, Sam R. <samruohola@yahoo.com> wrote:
> Regarding double buffering: I do not know how much double buffering would slow down operations.
> It could also be possible to turn off kernel page cache on our DB server, to avoid double buffering. Although, we may still keep it in use.
I think you've misunderstood double buffering. The double buffering
itself does not slow anything down. If the buffer is in shared buffers
already then it does not need to look any further for it. Double
buffering only becomes an issue when buffers existing 2 times in
memory causes other useful buffers to appear 0 times.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28 September 2018 at 16:45, Sam R. <samruohola@yahoo.com> wrote:
> That was what I was suspecting a little. Double buffering may not matter in
> our case, because the whole server is meant for PostgreSQL only.
>
> In our case, we can e.g. reserve almost "all memory" for PostgreSQL (shared
> buffers etc.).
>
> Please correct me if I am wrong.

You mentioned above:

> RAM: 64 GB
> Data: 500 GB - 1.5 TB, for example.

If most of that data just sits on disk and is never read then you might be right, but if the working set of the data is larger than RAM then you might find you get better performance from smaller shared buffers.

I think the best thing you can go and do is to go and test this. Write some code that mocks up a realistic production workload and see where you get the best performance.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
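[Editorially inserted sketch: a mock workload along the lines David suggests could be driven with pgbench and a custom script. The script below is hypothetical, matching the random-ID lookup pattern described earlier in the thread; the `events` table and ID range are assumptions:]

```sql
-- random_lookup.sql: pgbench custom script. Run with e.g.
--   pgbench -f random_lookup.sql -c 16 -T 600 mydb
-- and compare transactions/sec across different shared_buffers settings.
\set id random(1, 100000000)
SELECT payload FROM events WHERE id = :id;
```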