Thread: How to keep a table in memory?

How to keep a table in memory?

From
adrobj
Date:
I have a pretty small table (~20MB) that is accessed very frequently and
randomly, so I want to make sure it's 100% in memory all the time. There is
a lot of other stuff that also gets accessed frequently, so I don't want
to just hope that the Linux file cache will do the right thing for me.

Is there any way to do that?

One idea I have in my head is to start a process that does mmap() and
mlock() with the table file. Will it work? If so, are there any potential
problems?



Re: How to keep a table in memory?

From
Heikki Linnakangas
Date:
adrobj wrote:
> I have a pretty small table (~20MB) that is accessed very frequently and
> randomly, so I want to make sure it's 100% in memory all the time. There is
> a lot of other stuff that also gets accessed frequently, so I don't want
> to just hope that the Linux file cache will do the right thing for me.
> 
> Is there any way to do that?
> 
> One idea I have in my head is to start a process that does mmap() and
> mlock() with the table file. Will it work? If so, are there any potential
> problems?

Just leave it to the cache management algorithms in Postgres and Linux. 
If it really is frequently accessed, it should stay in Postgres shared 
buffers.

You can use the pg_buffercache contrib module to see what's in cache.
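
For instance, something close to the example in the module's documentation
shows which relations occupy the most buffers (run as superuser, with the
module installed in the database you connect to):

    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
         JOIN pg_class c ON b.relfilenode = c.relfilenode
    WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                                WHERE datname = current_database()))
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;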

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: How to keep a table in memory?

From
Alex Drobychev
Date:
Hi Heikki,
 
Thanks for the response!
 
I understand that relying on cache management would be the easiest solution. However, I had a similar issue with another RDBMS (MSSQL, to be specific) in the past and observed a lot of disk activity until the table was pinned in memory (fortunately MSSQL has 'dbcc pintable' for that).
 
Basically, this is all about a high-traffic website, where virtually _all_ data in the DB get accessed frequently - so it's not obvious which DB pages are going to win the eviction war. However, the overall cost of access is different for different tables - for the table in question it may very well take ~20 disk seeks per webpage view, so a very high cache hit rate (ideally 100%) has to be assured.
 
So - will the 'mlock' hack work? Or any other ideas for "pinning" a table in memory?
 
- Alex

Heikki Linnakangas <heikki@enterprisedb.com> wrote:
adrobj wrote:
> I have a pretty small table (~20MB) that is accessed very frequently and
> randomly, so I want to make sure it's 100% in memory all the time. There is
> a lot of other stuff that also gets accessed frequently, so I don't want
> to just hope that the Linux file cache will do the right thing for me.
>
> Is there any way to do that?
>
> One idea I have in my head is to start a process that does mmap() and
> mlock() with the table file. Will it work? If so, are there any potential
> problems?

Just leave it to the cache management algorithms in Postgres and Linux.
If it really is frequently accessed, it should stay in Postgres shared
buffers.

You can use the pg_buffercache contrib module to see what's in cache.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com



Re: How to keep a table in memory?

From
Andrew Dunstan
Date:


Alex Drobychev wrote:
> Hi Heikki,
>  
> Thanks for the response!
>  
> I understand that relying on cache management would be the easiest 
> solution. However, I had a similar issue with another RDBMS (MSSQL, to 
> be specific) in the past and observed a lot of disk activity until the 
> table was pinned in memory (fortunately MSSQL has 'dbcc pintable' for 
> that).
>  
> Basically, this is all about a high-traffic website, where virtually 
> _all_ data in the DB get accessed frequently - so it's not obvious 
> which DB pages are going to win the eviction war. However, the overall 
> cost of access is different for different tables - for the table in 
> question it may very well take ~20 disk seeks per webpage view, so a very 
> high cache hit rate (ideally 100%) has to be assured.
>  
> So - will the 'mlock' hack work? Or any other ideas for "pinning" a 
> table in memory?
>  
> - Alex
>
> Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>
>     adrobj wrote:
>     > I have a pretty small table (~20MB) that is accessed very frequently and
>     > randomly, so I want to make sure it's 100% in memory all the time. There is
>     > a lot of other stuff that also gets accessed frequently, so I don't want
>     > to just hope that the Linux file cache will do the right thing for me.
>     >
>     > Is there any way to do that?
>     >
>     > One idea I have in my head is to start a process that does mmap() and
>     > mlock() with the table file. Will it work? If so, are there any potential
>     > problems?
>
>     Just leave it to the cache management algorithms in Postgres and Linux.
>     If it really is frequently accessed, it should stay in Postgres shared
>     buffers.
>
>     You can use the pg_buffercache contrib module to see what's in cache.
>


1. when someone replies to your post at the bottom, please don't put 
your reply at the top. It makes everything totally unreadable.

2. you should investigate one or more of: pg_memcache, solid state disk.

FYI, Postgres is known to be used successfully on some *extremely* heavy 
websites, without using tables pinned in memory.

cheers

andrew



Re: How to keep a table in memory?

From
Robert Treat
Date:
On Monday 12 November 2007 18:31, Andrew Dunstan wrote:
> 1. when someone replies to your post at the bottom, please don't put
> your reply at the top. It makes everything totally unreadable.
>

+1

> 2. you should investigate one or more of: pg_memcache, solid state disk.
>

you might also consider creating a tablespace on tmpfs or ramfs or something 
like pramfs
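
For the record, a minimal sketch of the tmpfs variant (mount point, size, and
names here are made up; anything on tmpfs disappears at reboot, so this only
suits data you can rebuild):

    -- shell, as root:
    --   mount -t tmpfs -o size=64m tmpfs /mnt/pgram
    --   chown postgres /mnt/pgram
    CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';
    ALTER TABLE hot_table SET TABLESPACE ramspace;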

> FYI, Postgres is known to be used successfully on some *extremely* heavy
> websites, without using tables pinned in memory.
>

+1

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: How to keep a table in memory?

From
"Joshua D. Drake"
Date:
Robert Treat wrote:
> On Monday 12 November 2007 18:31, Andrew Dunstan wrote:
>> 1. when someone replies to your post at the bottom, please don't put
>> your reply at the top. It makes everything totally unreadable.
>>
> 
> +1
> 
>> 2. you should investigate one or more of: pg_memcache, solid state disk.
>>
> 
> you might also consider creating a tablespace on tmpfs or ramfs or something 
> like pramfs
> 
>> FYI, Postgres is known to be used successfully on some *extremely* heavy
>> websites, without using tables pinned in memory.
>>
> 
> +1

I give this a +/- 1. Yes extremely heavy websites can do this *but* they 
require extremely expensive hardware to do so.

Joshua D. Drake




Re: How to keep a table in memory?

From
Devrim GÜNDÜZ
Date:
Hi,

On Mon, 2007-11-12 at 21:15 -0500, Robert Treat wrote:
> > 2. you should investigate one or more of: pg_memcache, solid state
> > disk.
>
> you might also consider creating a tablespace on tmpfs or ramfs or
> something like pramfs

IIRC, ramfs is not that good for database use: if you want to extend
its size, you have to stop the database instance -- which is not
considered good.

Regards,
--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

Re: How to keep a table in memory?

From
Andrew Dunstan
Date:

Joshua D. Drake wrote:
>>
>>> FYI, Postgres is known to be used successfully on some *extremely* heavy
>>> websites, without using tables pinned in memory.
>>>
>>
>> +1
>
> I give this a +/- 1. Yes extremely heavy websites can do this *but* 
> they require extremely expensive hardware to do so.
>
>

I expect extremely heavy websites to require extremely expensive 
equipment regardless of the software they use. Cost was not the issue 
raised by the OP.

cheers

andrew
>
>


Re: How to keep a table in memory?

From
"Joshua D. Drake"
Date:
Devrim GÜNDÜZ wrote:
> Hi,
> 
> On Mon, 2007-11-12 at 21:15 -0500, Robert Treat wrote:
>>> 2. you should investigate one or more of: pg_memcache, solid state
>>> disk.
>> you might also consider creating a tablespace on tmpfs or ramfs or
>> something like pramfs
> 
> IIRC, ramfs are not that good for database use: If you want to extend
> its size, you have to stop the database instance -- which is not
> considered good.

Well, depending on the size you could push the table to another 
tablespace, drop the old tablespace, resize the ramfs, and reverse the 
previous :)
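
Roughly (a sketch with hypothetical names; each move rewrites the table and
locks it for the duration):

    ALTER TABLE hot_table SET TABLESPACE pg_default;  -- off the ramfs
    DROP TABLESPACE ramspace;                         -- must be empty to drop
    -- remount the ramfs with the new size, then:
    CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';
    ALTER TABLE hot_table SET TABLESPACE ramspace;    -- and back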

Joshua D. Drake

> 
> Regards,



Re: How to keep a table in memory?

From
"Joshua D. Drake"
Date:
Andrew Dunstan wrote:
> 
>> I give this a +/- 1. Yes extremely heavy websites can do this *but* 
>> they require extremely expensive hardware to do so.
>>
>>
> 
> I expect extremely heavy websites to require extremely expensive 
> equipment regardless of the software they use. Cost was not the issue 
> raised by the OP.

Cost is always an issue, even if implicit. If the person is so hung up 
on the idea of pushing things into ram there is a pretty good 
possibility they have priced out the 50 and 100 spindle devices needed 
to get the same type of performance.

Sincerely,

Joshua D. Drake


> 
> cheers
> 
> andrew
>>
>>
> 



Re: How to keep a table in memory?

From
Devrim GÜNDÜZ
Date:
Hi,

On Mon, 2007-11-12 at 09:12 +0000, Heikki Linnakangas wrote:

> Just leave it to the cache management algorithms in Postgres and
> Linux.  If it really is frequently accessed, it should stay in
> Postgres shared buffers.

How is "frequently accessed" determined by PostgreSQL?

I mean... You know, OS caches either inodes, or pages. Page caches are
pretty ignorable, since it means the data is already in virtual memory.
So, we have inode caching, and IIRC it results in i/o requests from the
disk -- and sure, it uses i/o scheduler of the kernel (like the all of
the applications running on that machine -- including a basic login
session). *If* the data hadn't been deleted, it returns from i/o
scheduler.

So there is no 100% guarantee that the table is in the memory. If we
could use the ram (some (or a :) ) database(s) can do that IIRC),  we
will avoid i/o scheduler, which will really speed up the process. (Ok,
AFAIK, you can "pin" your objects to memory with Oracle).

... and one more thing with ramfs: Since there is a fs on ramfs, it
passes through VFS -- and goes through kernel schedulers again.

So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
approach for the people who are asking to keep their objects on RAM,
even though I know that there is nothing we can say right now.

Regards,
--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

Re: How to keep a table in memory?

From
Tom Lane
Date:
Devrim GÜNDÜZ <devrim@CommandPrompt.com> writes:
> So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
> approach for the people who are asking to keep their objects on RAM,
> even though I know that there is nothing we can say right now.

Well, nothing is a 100% solution.  But my opinion is that people who
think they are smarter than an LRU caching algorithm are typically
mistaken.  If the table is all that heavily used, it will stay in memory
just fine.  If it's not sufficiently heavily used to stay in memory
according to an LRU algorithm, maybe the memory space really should be
spent on something else.

Now there are certainly cases where a standard caching algorithm falls
down --- the main one I can think of offhand is where you would like to
give one class of queries higher priority than another, and so memory
space should preferentially go to tables that are needed by the first
class.  But if that's your problem, "pin these tables in memory" is
still an awfully crude solution to the problem.  I'd be inclined to
think instead about a scheme that lets references made by
higher-priority queries bump buffers' use-counts by more than 1,
or some other way of making the priority considerations visible to an
automatic cache management algorithm.
        regards, tom lane


Re: How to keep a table in memory?

From
Greg Smith
Date:
On Mon, 12 Nov 2007, Alex Drobychev wrote:

> Or any other ideas for "pinning" a table in memory?

If the table you're worried about is only 20MB, have you considered just 
running something regularly that touches the whole thing?  This may be the 
only time I've ever considered running "select count(*) from x" as a 
productive move.  That would waste some CPU, but it would help those pages 
"win the eviction war" as you say.

You definitely should follow up on the suggestion given to look at the 
pg_buffercache contrib module to get a better idea what's going on under 
the LRU hood.  In fact, you may want to install a tweak that's standard in 
8.3 to show the usage counts in order to better get a feel for what's 
going on; the appendix on my article at 
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm goes 
into this a bit, with the documentation to pg_buffercache having the rest 
of what you'd need.
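
With the 8.3 view (or the patched 8.2 one mentioned above), the usage-count
distribution can be pulled per relation with something along these lines (a
sketch; it only resolves names for relations in the current database):

    SELECT c.relname, b.usagecount, count(*) AS buffers
    FROM pg_buffercache b
         JOIN pg_class c ON b.relfilenode = c.relfilenode
    GROUP BY c.relname, b.usagecount
    ORDER BY c.relname, b.usagecount;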

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: How to keep a table in memory?

From
"Luke Lonergan"
Date:
<p><font size="2">Vacuum is a better thing to run, much less CPU usage.<br /><br /> - Luke<br /><br /> Msg is shrt cuz
mon ma treo<br /><br />  -----Original Message-----<br /> From:   Greg Smith [<a
href="mailto:gsmith@gregsmith.com">mailto:gsmith@gregsmith.com</a>]<br/> Sent:   Monday, November 12, 2007 11:59 PM
EasternStandard Time<br /> To:     Alex Drobychev<br /> Cc:     pgsql-hackers@postgresql.org<br /> Subject:        Re:
[HACKERS]How to keep a table in memory?<br /><br /> On Mon, 12 Nov 2007, Alex Drobychev wrote:<br /><br /> > Or any
otherideas for "pinning" a table in memory?<br /><br /> If the table you're worried about is only 20MB, have you
consideredjust<br /> running something regularly that touches the whole thing?  This may be the<br /> only time I've
everconsidered running "select count(*) from x" as a<br /> productive move.  That would waste some CPU, but it would
helpthose pages<br /> "win the eviction war" as you say.<br /><br /> You definately should follow-up on the suggestion
givento look at the<br /> pg_buffercache contrib module to get a better idea what's going on under<br /> the LRU hood. 
Infact, you may want to install a tweak that's standard in<br /> 8.3 to show the usage counts in order to better get a
feelfor what's<br /> going on; the appendix on my article at<br /><a
href="http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm">http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm</a>
goes<br/> into this a bit, with the documentation to pg_buffercache having the rest<br /> of what you'd need.<br /><br
/>--<br /> * Greg Smith gsmith@gregsmith.com <a href="http://www.gregsmith.com">http://www.gregsmith.com</a> Baltimore,
MD<br/><br /> ---------------------------(end of broadcast)---------------------------<br /> TIP 6: explain analyze is
yourfriend<br /></font> 

Re: How to keep a table in memory?

From
Christopher Browne
Date:
In an attempt to throw the authorities off his trail, jd@commandprompt.com ("Joshua D. Drake") transmitted:
> Andrew Dunstan wrote:
>>> I give this a +/- 1. Yes extremely heavy websites can do this
>>> *but* they require extremely expensive hardware to do so.
>>>
>> I expect extremely heavy websites to require extremely expensive
>> equipment regardless of the software they use. Cost was not the
>> issue raised by the OP.
>
> Cost is always an issue, even if implicit. If the person is so hung
> up on the idea of pushing things into ram there is a pretty good
> possibility they have priced out the 50 and 100 spindle devices
> needed to get the same type of performance.

I dunno; I had a chat about cacheing strategies today where it became
clear to me that when we migrate to 8.3, we'll need to re-examine
things because there has been *so* much change since some of our
present policy was created back in the 7.2 days.

(Pointedly, one of the reasons to want a separate cache DB was to cut
down on XID consumption by read-only processes, and that reason
evaporates in 8.3.)

I have seen enough naive analyses done that I wouldn't be inclined to
assume much of anything.

People can get mighty self-assured about things that they have heard,
whether those things have validity or not.  Few things can get as
badly wrong as bad assumptions made about performance...
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxdatabases.info/info/x.html
"When we write programs that "learn", it turns out that we do and they
don't." -- Alan Perlis


Re: How to keep a table in memory?

From
Christopher Browne
Date:
Quoth tgl@sss.pgh.pa.us (Tom Lane):
> Devrim GÜNDÜZ <devrim@CommandPrompt.com> writes:
>> So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
>> approach for the people who are asking to keep their objects on RAM,
>> even though I know that there is nothing we can say right now.
>
> Well, nothing is a 100% solution.  But my opinion is that people who
> think they are smarter than an LRU caching algorithm are typically
> mistaken.  If the table is all that heavily used, it will stay in memory
> just fine.  If it's not sufficiently heavily used to stay in memory
> according to an LRU algorithm, maybe the memory space really should be
> spent on something else.
>
> Now there are certainly cases where a standard caching algorithm falls
> down --- the main one I can think of offhand is where you would like to
> give one class of queries higher priority than another, and so memory
> space should preferentially go to tables that are needed by the first
> class.  But if that's your problem, "pin these tables in memory" is
> still an awfully crude solution to the problem.  I'd be inclined to
> think instead about a scheme that lets references made by
> higher-priority queries bump buffers' use-counts by more than 1,
> or some other way of making the priority considerations visible to an
> automatic cache management algorithm.

Something I found *really* interesting was that whenever we pushed any
"high traffic" systems onto PostgreSQL 8.1, I kept seeing measurable
performance improvements taking place every day for a week.

Evidently, it took that long for cache to *truly* settle down.

Given that, and given that we've gotten a couple of good steps *more*
sophisticated than mere LRU, I'm fairly willing to go pretty far down
the "trust the shared memory cache" road.

The scenario described certainly warrants doing some benchmarking; it
warrants analyzing the state of the internal buffers over a period of
time to see what is actually in them.

If, after a reasonable period of time (that includes some variations
in system load), a reasonable portion (or perhaps the entirety) of the
Essential Table has consistently resided in buffers, then that should
be pretty decent evidence that cacheing is working the way it should.
-- 
output = ("cbbrowne" "@" "gmail.com")
http://linuxdatabases.info/info/slony.html
A Plateau is the highest form of flattery.


Re: How to keep a table in memory?

From
Heikki Linnakangas
Date:
Luke Lonergan wrote:
> Vacuum is a better thing to run, much less CPU usage.

Vacuum is actually not good for this purpose, because it's been 
special-cased to not bump the usage count.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: How to keep a table in memory?

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I'd be inclined to think instead about a scheme that lets references made by
> higher-priority queries bump buffers' use-counts by more than 1, or some
> other way of making the priority considerations visible to an automatic
> cache management algorithm.

I don't think that really solves the problem.

Consider a case where you have a few dozen queries all of which use indexes to
access only a few pages per call (but spread across a large enough table),
except for just one query which uses a sequential scan of a moderate sized
table.

In such a circumstance the best average performance might be to keep the pages
used by the index scans in memory and force most of the sequential scan to go
to disk. Especially if the sequential scan is fairly rare and especially if
random_page_cost is fairly high.

However if your concern is response time, not average performance, then that
would be disastrous. In exchange for a slight improvement of already fast
queries you would be obtaining an unsatisfactory response time for the
sequential scan.

I'm not sure what the solution is. This scenario is going to be a problem for
any system which tries to judge future usage based on past usage. If the
infrequent query with a strict response time requirement is infrequent enough,
any automatic algorithm will evict the pages it needs.

Some brainstorming ideas: what if a prepared query which previously ran under
some specified response time guarantee didn't bump the usage counts at all?
That way frequently run queries which are fast enough even with disk accesses
don't evict pages needed for slower queries.

Or, better yet, tag a prepared query with the average (or 90th percentile, or
something like that) response time from the past, and tag every buffer it
touches with that response time if it's greater than what the buffer is
already tagged with. When scanning for a page to evict, ignore any buffer
with a response time larger than ours. Ie, queries which respond quickly are
not allowed to evict buffers needed by queries which respond more slowly than
they do. Only a slower or ad-hoc non-prepared query is allowed to evict those
pages.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


Re: How to keep a table in memory?

From
Andrew Sullivan
Date:
On Mon, Nov 12, 2007 at 06:55:09PM -0800, Joshua D. Drake wrote:
> Cost is always an issue, even if implicit. If the person is so hung up 
> on the idea of pushing things into ram there is a pretty good 
> possibility they have priced out the 50 and 100 spindle devices needed 
> to get the same type of performance.

I'm not sure I agree with that.  The OP was claiming that this approach was
what worked for him with MS SQL Server, which makes me think that this is
the usual human habit of generalizing widely from a particular.  That is,
"X was a solution that worked once with another product, so I want to know
how to do X with your product."  We get these questions all the time, partly
because one has to re-learn all sorts of things when moving to PostgreSQL. 
For instance, most of the traditional real database systems don't
collaborate with the OS in memory and cache management.

A

-- 
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke


Re: How to keep a table in memory?

From
Andrew Sullivan
Date:
On Mon, Nov 12, 2007 at 10:54:34PM -0500, Tom Lane wrote:

> class.  But if that's your problem, "pin these tables in memory" is
> still an awfully crude solution to the problem.  I'd be inclined to
> think instead about a scheme that lets references made by
> higher-priority queries bump buffers' use-counts by more than 1,
> or some other way of making the priority considerations visible to an
> automatic cache management algorithm.

While this is true, nobody seems to have those other ways available today. 
If there was a quick and easy way to pin certain tables in memory, I think
that administrators might be well-advised to use that mechanism until such
time as the weighted-priority cacheing or whatever shows up.  (Of course,
AFAICT, there's no easy way to do the pinning, either, so this all seems a
little academic.)

I have to agree with what Tom says, however, about people thinking they're
smarter than the system.  Much of the time, this sort of thumb on the scale
optimisation just moves the cost to some other place, and the admin's
analysis isn't comprehensive enough to turn that up until it's all turned on
in production.

A

-- 
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke


Re: How to keep a table in memory?

From
Josh Berkus
Date:
All,

> I'm not sure what the solution is. This scenario is going to be a problem
> for any system which tries to judge future usage based on past usage. If
> the infrequent query with a strict response time requirement is infrequent
> enough any automatic algorithm will evict it.

The way Greg puts this, it sounds extremely hypothetical, but it's actually 
pretty common.

For example, I had an application which was primarily data entry but 
periodically (once per 10 minutes or so) would run a heavy-duty full-text 
search.  This caused the full-text index to go to disk a lot ... but since the 
data entry was employees and the FTS was for customers, we'd rather have had 
the FTI "pinned" in memory and the data entry be 50% slower.  (In the end, we 
solved the issue with a ramdisk, but that was a bit of a hack and involved 
spending $$$ on RAM.)

Mind you, that's a case of needing to have an *index* pinned, but I think 
those are just as common.  Overall, it's a problem of having applications 
where response time is *not* tied to frequency of usage.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: How to keep a table in memory?

From
Greg Smith
Date:
On Tue, 13 Nov 2007, Andrew Sullivan wrote:

> I have to agree with what Tom says, however, about people thinking 
> they're smarter than the system.  Much of the time, this sort of thumb 
> on the scale optimisation just moves the cost to some other place

Sure, but in this case the reasoning seems sound enough.  The buffer 
eviction policy presumes that all buffers cost an equal amount to read 
back in again.  Here we have an application where it's believed that's not 
true:  the data on disk for this particular table has a large seek 
component to it for some reason, it tends to get read in large chunks (but 
not necessarily frequently), and latency on that read is critical to 
business requirements.  "The system" doesn't know that, and it's 
impractical to make it smart enough to figure it out on its own, so asking 
how to force that is reasonable.

I see this as similar to the old optimizer hint argument, where there 
certainly exist some edge cases where people know something the optimizer 
doesn't which changes the optimal behavior.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: How to keep a table in memory?

From
Andrew Sullivan
Date:
On Tue, Nov 13, 2007 at 02:36:14PM -0500, Greg Smith wrote:
> Sure, but in this case the reasoning seems sound enough.  

Yes.  But. . .

> I see this as similar to the old optimizer hint argument, where there 
> certainly exist some edge cases where people know something the optimizer 
> doesn't which changes the optimal behavior.

. . .the abuse of such hints in applications I have seen is so rampant as to
make me doubt the utility of adding them anyway.  It's true that by adding
hints, you give a facility to a good, competent designer who has a really
peculiar case that no general purpose system is likely to solve well.  In
practice, however, it also seems to mean that every slack-jawed fool with
access to the manual thinks that he or she is going to "fix" the "broken"
query plan by forcing index scans where they're useless (has a week yet gone
by where someone doesn't post to -performance with that problem?).  So I'm
divided on whether actually providing the facility is a good idea, even
though I can think of a handful of cases where I doubt even the smartest
planner will get it right.  (By analogy, pinning in memory, and I'm
similarly divided.)

A

-- 
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke


Re: How to keep a table in memory?

From
"Kevin Grittner"
Date:
>>> On Tue, Nov 13, 2007 at  2:05 PM, in message
<20071113200508.GX11563@crankycanuck.ca>, Andrew Sullivan <ajs@crankycanuck.ca>
wrote:
> On Tue, Nov 13, 2007 at 02:36:14PM -0500, Greg Smith wrote:
>
>> I see this as similar to the old optimizer hint argument, where there
>> certainly exist some edge cases where people know something the optimizer
>> doesn't which changes the optimal behavior.
>
> . . .the abuse of such hints in applications I have seen is so rampant as to
> make me doubt the utility of adding them anyway.  It's true that by adding
> hints, you give a facility to a good, competent designer who has a really
> peculiar case that no general purpose system is likely to solve well.  In
> practice, however, it also seems to mean that every slack-jawed fool with
> access to the manual thinks that he or she is going to "fix" the "broken"
> query plan by forcing index scans where they're useless (has a week yet gone
> by where someone doesn't post to -performance with that problem?).  So I'm
> divided on whether actually providing the facility is a good idea, even
> though I can think of a handful of cases where I doubt even the smartest
> planner will get it right.  (By analogy, pinning in memory, and I'm
> similarly divided.)
I have trouble not seeing the point of any posts in this thread.

Under our old, commercial database product, we had performance
problems we addressed with a "named caches" feature -- you could
declare a named cache of a particular size, and tweak some
characteristics of it, then bind objects to it.  We came up with
several theories of how we could use them to improve on the default
LRU logic, and carefully tested.  About half of these ideas made
things worse; about half made things better.  We used only the ones
that made things better for us with our usage patterns.  Part of
this involved using a cache small enough to fully contain all of
the heavily referenced tables we bound to it.

The proof of the benefit was that occasionally these settings got
lost through errors in machine builds or upgrades.  The users would
start calling immediately, complaining about the performance; they
were happy again when we restored the named cache configurations.

The lack of such tuning knobs made me more than a little nervous as
we moved toward switching to PostgreSQL, and I'm not sure that we
couldn't use them if they were available; but PostgreSQL performs
so much better overall that any loss from their absence is minimal
compared to the improvement we saw switching to PostgreSQL.

This leaves me with sympathy for the concern from the original post,
but feeling that I should join the crowd suggesting that it's best
to proceed on the assumption that such a tuning feature probably
isn't needed: proceed without it and post any actual performance
problems for advice.  If you can kludge heavier caching for the
objects in question and show an improvement in the metric which
matters for your purposes, perhaps you can convince people it's a
feature worth having, but expect that people will want to see
details and explore alternative solutions.
-Kevin



Re: How to keep a table in memory?

From
Ron Mayer
Date:
Heikki Linnakangas wrote:
> Luke Lonergan wrote:
>> Vacuum is a better thing to run, much less CPU usage.
> 
> Vacuum is actually not good for this purpose, because it's been
> special-cased to not bump the usage count.

Though the OS's page cache will still see it as accesses, no?


Re: How to keep a table in memory?

From
Heikki Linnakangas
Date:
Ron Mayer wrote:
> Heikki Linnakangas wrote:
>> Luke Lonergan wrote:
>>> Vacuum is a better thing to run, much less CPU usage.
>> Vacuum is actually not good for this purpose, because it's been
>> special-cased to not bump the usage count.
> 
> Though the OS's page cache will still see it as accesses, no?

Yes, if the pages were not already in shared buffers.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: How to keep a table in memory?

From
"Merlin Moncure"
Date:
On Nov 13, 2007 12:30 AM, Christopher Browne <cbbrowne@acm.org> wrote:
> Something I found *really* interesting was that whenever we pushed any
> "high traffic" systems onto PostgreSQL 8.1, I kept seeing measurable
> performance improvements taking place every day for a week.
>
> Evidently, it took that long for cache to *truly* settle down.
>
> Given that, and given that we've gotten a couple of good steps *more*
> sophisticated than mere LRU, I'm fairly willing to go pretty far down
> the "trust the shared memory cache" road.
>
> The scenario described certainly warrants doing some benchmarking; it
> warrants analyzing the state of the internal buffers over a period of
> time to see what is actually in them.

kinda along those lines I was wondering if you (or anybody else) could
refer me to some recent results demonstrating the good or bad effects
of going with low or high shared buffers settings.  there is a huge
amount of anecdotal lore on the topic that I have found more or less
impossible to measure on production systems, especially considering that a
page fault to disk is much more expensive.

so, I personally configure buffers for what I think the fsm is going
to need plus a fudge, and that's about it...would love to see some
results supporting or refuting that methodology.
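
One coarse number that is at least easy to collect is the shared-buffers hit
ratio from pg_stat_database (a sketch; blks_read includes reads satisfied by
the OS page cache, so it says nothing about actual disk traffic):

    SELECT datname, blks_hit, blks_read,
           round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3)
               AS hit_ratio
    FROM pg_stat_database;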

merlin


Re: How to keep a table in memory?

From
Simon Riggs
Date:
On Tue, 2007-11-13 at 14:36 -0500, Greg Smith wrote:
> On Tue, 13 Nov 2007, Andrew Sullivan wrote:
> 
> > I have to agree with what Tom says, however, about people thinking 
> > they're smarter than the system.  Much of the time, this sort of thumb 
> > on the scale optimisation just moves the cost to some other place
> 
> Sure, but in this case the reasoning seems sound enough.  The buffer 
> eviction policy presumes that all buffers cost an equal amount to read 
> back in again.  Here we have an application where it's believed that's not 
> true:  the data on disk for this particular table has a large seek 
> component to it for some reason, it tends to get read in large chunks (but 
> not necessarily frequently), and latency on that read is critical to 
> business requirements.  "The system" doesn't know that, and it's 
> impractical to make it smart enough to figure it out on its own, so asking 
> how to force that is reasonable.

It seems possible to imagine a different buffer eviction policy based
upon tablespace, block type, peak rather than latest usage pattern, etc.

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com



Re: How to keep a table in memory?

From
"Zeugswetter Andreas ADI SD"
Date:
Kevin Grittner wrote:
> > . . .the abuse of such hints in applications I have seen is so rampant
> > as to make me doubt the utility of adding them anyway.  It's true that
> > by adding hints, you give a facility to a good, competent designer who
> > has a really peculiar case that no general purpose system is likely to
> > solve well.

> I have trouble not seeing the point of any posts in this thread.
> Under our old, commercial database product, we had performance
> problems we addressed with a "named caches" feature -- you could
> declare a named cache of a particular size, and tweak some
> characteristics of it, then bind objects to it.

Seems you simply fall in the competent category :-)

I know that another commercial product introduced a "pin table into memory"
feature a few years ago, but dropped it again in the current release.
It seems the potential for wrongdoing is significant :-(

At least a "lock this table into memory" option must be accompanied by an
"allow a max percentage of buffercache" limit and something that loads the
table on startup. But what do you do if it does not fit?
Caching only parts of the table is useless for the mentioned use case.

One aspect that has not been addressed is whether there is a way to
cluster/partition the table so as to reduce the number of pages that need
to be fetched by these not-frequent-enough but performance-critical
queries.

This may solve the problem with a different approach.
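
For example, something like this (a sketch; the table and index names are
made up, and the pre-8.3 CLUSTER syntax puts the index name first):

    -- rewrite the table in index order, so rows the critical queries
    -- fetch together share as few pages as possible
    CLUSTER hot_table_key_idx ON hot_table;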

Andreas


Re: How to keep a table in memory?

From
Bruce Momjian
Date:
Added to TODO:

>
> * Consider allowing higher priority queries to have referenced buffer
>   cache pages stay in memory longer
>
>   http://archives.postgresql.org/pgsql-hackers/2007-11/msg00562.php

---------------------------------------------------------------------------

Tom Lane wrote:
> Devrim GÜNDÜZ <devrim@CommandPrompt.com> writes:
> > So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
> > approach for the people who are asking to keep their objects on RAM,
> > even though I know that there is nothing we can say right now.
> 
> Well, nothing is a 100% solution.  But my opinion is that people who
> think they are smarter than an LRU caching algorithm are typically
> mistaken.  If the table is all that heavily used, it will stay in memory
> just fine.  If it's not sufficiently heavily used to stay in memory
> according to an LRU algorithm, maybe the memory space really should be
> spent on something else.
> 
> Now there are certainly cases where a standard caching algorithm falls
> down --- the main one I can think of offhand is where you would like to
> give one class of queries higher priority than another, and so memory
> space should preferentially go to tables that are needed by the first
> class.  But if that's your problem, "pin these tables in memory" is
> still an awfully crude solution to the problem.  I'd be inclined to
> think instead about a scheme that lets references made by
> higher-priority queries bump buffers' use-counts by more than 1,
> or some other way of making the priority considerations visible to an
> automatic cache management algorithm.
> 
>             regards, tom lane

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://postgres.enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +