Thread: Is There Any Way ....
... to do the following:
(1) Make a table memory-resident only?
(2) Set up user variables in memory that are persistent across all sessions, for as long as the database is up and running?
(3) Assure that a disk-based table is always in memory (outside of keeping it in memory buffers as a result of frequent activity, which would prevent LRU operations from taking it out)?
1) AFAIK, no. Just in case you are thinking "there should be a way, because I know it will be used all the time", you should know that the PostgreSQL philosophy is "I'm smarter than you": if the table is used all the time, it will be in memory; if not, it won't waste memory.
2) Don't know.
3) See number 1. Of course, you could run into a pathological case where the table is queried just before being taken out of memory. But that would mean the table isn't queried all the time...

Greetings...
Quoting Lane Van Ingen <lvaningen@esncc.com>:
> (1) Make a table memory-resident only?

Put it on a RAM filesystem. On Linux, shmfs. On *BSD, mfs. On Solaris, tmpfs.

> (2) Set up user variables in memory that are persistent across all
> sessions, for as long as the database is up and running?

This sounds like a client thing? Dunno.

> (3) Assure that a disk-based table is always in memory (outside of
> keeping it in memory buffers as a result of frequent activity which
> would prevent LRU operations from taking it out)?

Put it on a RAM fs (as in question 1). Basically, RAM filesystems live in RAM, meaning you need enough physical memory to support them. And of course their contents completely disappear between reboots, so you'll need a way to populate them on bootup and make sure that your updates go to a real nonvolatile storage medium (like disks). You might also get swapping on some types of memory filesystems: Solaris' tmpfs is carved out of virtual memory, which means it will cause swapping if the tmpfs contents plus the rest of your applications exceed physical memory.
On Thu, Sep 29, 2005 at 07:21:08PM -0400, Lane Van Ingen wrote:
> (1) Make a table memory-resident only?

You might want to look into memcached, but it's impossible to say whether it will fit your needs or not without more details.

/* Steinar */
--
Homepage: http://www.sesse.net/
Lane Van Ingen wrote:
> (2) Set up user variables in memory that are persistent across all
> sessions, for as long as the database is up and running?

You could probably write "C" functions (or possibly Perl) to store data in shared memory. Of course you'd have to deal with concurrency issues yourself. Storing the values in a table and having cached access to them during the session is probably your best bet.

--
Richard Huxton
Archonet Ltd
On 2005-09-30 01:21, Lane Van Ingen wrote:
> (3) Assure that a disk-based table is always in memory (outside of
> keeping it in memory buffers as a result of frequent activity which
> would prevent LRU operations from taking it out)?

I was wondering about this too. IMO it would be useful to have a way to tell PG that some tables were needed frequently, and should be cached if possible. This would allow application developers to consider joins with these tables as "cheap", even when querying on columns that are not indexed. I'm thinking about smallish tables like users, groups, *types, etc which would be needed every 2-3 queries, but might be swept out of RAM by one large query in between. Keeping a table like "users" on a RAM fs would not be an option, because the information is not volatile.

cheers,
stefan
Yes, Stefan, the kind of usage you are mentioning is exactly why I was asking.
On Tue, Oct 04, 2005 at 12:31:42PM +0200, Stefan Weiss wrote:
> IMO it would be useful to have a way to tell PG that some tables were
> needed frequently, and should be cached if possible. [...] Keeping a
> table like "users" on a RAM fs would not be an option, because the
> information is not volatile.

Why do you think you'll know better than the database how frequently something is used? At best, your guess will be correct and PostgreSQL (or the kernel) will keep the table in memory. Or, your guess is wrong and you end up wasting memory that could have been used for something else.

It would probably be better if you describe why you want to force this table (or tables) into memory, so we can point you at more appropriate solutions.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Which version of PG are you using? One of the new features for 8.0 was an improved caching algorithm that was smart enough to avoid letting a single big query sweep everything else out of cache.

-- Mark Lewis

On Tue, 2005-10-04 at 10:45 -0400, Lane Van Ingen wrote:
> Yes, Stefan, the kind of usage you are mentioning is exactly why I was
> asking. [...]
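For the curious, the scan-resistance Mark mentions can be sketched as a toy two-queue replacement policy, in the spirit of the 2Q/ARC family of algorithms. This is an illustration of the general technique only, not PostgreSQL's actual buffer-manager code:

```python
# A toy scan-resistant cache: pages seen once go to a small
# probationary queue; only pages referenced again are promoted to the
# main queue. A single large sequential scan therefore churns the
# probationary queue but cannot evict the hot working set.
from collections import OrderedDict

class ScanResistantCache:
    def __init__(self, main_size, probation_size):
        self.main = OrderedDict()        # frequently used pages
        self.probation = OrderedDict()   # pages seen exactly once
        self.main_size = main_size
        self.probation_size = probation_size

    def access(self, page):
        if page in self.main:            # hot page: refresh recency
            self.main.move_to_end(page)
        elif page in self.probation:     # second touch: promote
            del self.probation[page]
            self.main[page] = True
            if len(self.main) > self.main_size:
                self.main.popitem(last=False)   # evict LRU of main
        else:                            # first touch: probation only
            self.probation[page] = True
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)

    def cached(self, page):
        return page in self.main or page in self.probation

cache = ScanResistantCache(main_size=4, probation_size=2)
for p in ["users", "groups", "users", "groups"]:   # hot lookup tables
    cache.access(p)
for p in range(10_000):                  # one big sequential scan
    cache.access(p)
print(cache.cached("users"), cache.cached("groups"))   # True True
```

With a plain single LRU of the same total size, the scan would have pushed "users" and "groups" out; here they survive because scan pages never earn promotion.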
pg is _very_ stupid about caching. Almost all of the caching is left to the OS, and it's that way by design (as post after post by TL has pointed out).

That means pg has almost no ability to take application-domain-specific knowledge into account when deciding what to cache. There are plenty of papers on caching out there showing that context-dependent knowledge leads to more effective caching algorithms than context-independent ones are capable of. (Which means said design choice is a Mistake, but unfortunately one with too much inertia behind it currently to change easily.)

Under these circumstances, it is quite possible that an expert-class human could optimize memory usage better than the OS + pg.

If one is _sure_ they know what they are doing, I'd suggest using tmpfs or the equivalent for critical read-only tables. For "hot" tables that are rarely written to and where data loss would not be a disaster, tmpfs can be combined with an asynchronous writer process to push updates to HD. Just remember that a power hit means that any updates not yet pushed to disk are lost.

The (much) more expensive alternative is to buy SSD(s) and put the critical tables on them at load time.

Ron
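Ron's "RAM-resident table plus asynchronous writer" scheme can be sketched as follows. This is a hedged illustration only (the class name, flush interval, and JSON persistence format are all invented for the example); the point is the shape of the trade-off: reads never touch the disk, and a crash loses whatever the background flusher hadn't written yet.

```python
# Sketch of an in-RAM "table" with an asynchronous writer: reads are
# served from memory, while a background thread periodically flushes
# dirty data to nonvolatile storage. Anything not yet flushed when
# power is lost is gone -- which is why this only suits data whose
# loss is tolerable.
import json, os, tempfile, threading, time

class AsyncPersistedTable:
    def __init__(self, path, flush_interval=0.05):
        self.path = path
        self.data = {}                    # the in-RAM "table"
        self.lock = threading.Lock()
        self.dirty = False
        self._stop = threading.Event()
        self._writer = threading.Thread(target=self._flush_loop,
                                        args=(flush_interval,), daemon=True)
        self._writer.start()

    def put(self, key, value):
        with self.lock:
            self.data[key] = value
            self.dirty = True

    def get(self, key):
        with self.lock:                   # reads never touch the disk
            return self.data[key]

    def _flush_loop(self, interval):
        while not self._stop.is_set():
            self._stop.wait(interval)
            with self.lock:
                if not self.dirty:
                    continue
                snapshot = dict(self.data)
                self.dirty = False
            tmp = self.path + ".tmp"      # write-then-rename: atomic flush
            with open(tmp, "w") as f:
                json.dump(snapshot, f)
            os.replace(tmp, self.path)

    def close(self):
        self._stop.set()
        self._writer.join()

path = os.path.join(tempfile.mkdtemp(), "users.json")
table = AsyncPersistedTable(path)
table.put("alice", {"group": "admins"})
for _ in range(200):                      # wait (bounded) for the flusher
    if os.path.exists(path):
        break
    time.sleep(0.01)
table.close()
with open(path) as f:
    on_disk = json.load(f)
print(on_disk["alice"]["group"])          # "admins"
```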
On Tue, Oct 04, 2005 at 07:33:47PM -0400, Ron Peacetree wrote:
> pg is _very_ stupid about caching. Almost all of the caching is left
> to the OS, and it's that way by design (as post after post by TL has
> pointed out).
> [...]
> Under these circumstances, it is quite possible that an expert class
> human could optimize memory usage better than the OS + pg.

Do you have any examples where this has actually happened? Especially with 8.x, which isn't all that 'stupid' about how it handles buffers?

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Ron Peacetree sounds like someone talking out of his _AZZ_. He can save his unreferenced flapdoodle for his SQL Server clients. Maybe he will post references so that we may all learn at the feet of Master Peacetree. :-)

douglas

On Oct 4, 2005, at 7:33 PM, Ron Peacetree wrote:
> pg is _very_ stupid about caching. Almost all of the caching is left
> to the OS, and it's that way by design (as post after post by TL has
> pointed out). [...]
Douglas J. Trainor wrote:
> Ron Peacetree sounds like someone talking out of his _AZZ_.
> He can save his unreferenced flapdoodle for his SQL Server
> clients. Maybe he will post references so that we may all
> learn at the feet of Master Peacetree. :-)

Although I agree that I would definitely like to see some test cases for what Ron is talking about, I don't think that resorting to insults is going to help the situation. Ron, if you would please provide some test cases for what you are describing, I am sure that anyone would love to see them. We are all for improving PostgreSQL.

Sincerely,
Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Unfortunately, no matter what I say or do, I'm not going to please or convince anyone who has already made their mind up to the extent that they post comments like Mr Trainor's below. His response style pretty much proves my earlier point that this is presently a religious issue within the pg community.

The absolute best proof would be to build a version of pg that does what Oracle and DB2 have done (implement its own DB-specific memory manager) and then compare the performance between the two versions on the same HW, OS, and schema.

The second best proof would be to set up either DB2 or Oracle so that they _don't_ use their memory managers and compare their performance to a setup that _does_ use said memory managers on the same HW, OS, and schema.

I don't currently have the resources for either experiment.

Some might even argue that IBM (where Codd and Date worked) and Oracle just _might_ have had justification for the huge effort they put into developing such infrastructure.

Then there's the large library of research on caching strategies in just about every HW and SW domain, including DB theory, which points out that the more context-dependent (i.e., application- or domain-specific) a caching strategy's awareness is, the better it performs.

Maybe after we do all we can about physical IO and sorting performance I'll take on the religious fanatics on this one. One problem set at a time.

Ron
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.

The OS and FS world is very, very different now than it was when the Oracle and DB2 architectures were being crafted. What may have been an excellent development effort then may not provide such good ROI now.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.
>
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.

Actually, the main "religious fanatic" I've seen recently is yourself. While I have a gut feel that some of the issues you raise could certainly do with further investigation, I'm not seeing much from you other than statements that much of what postgresql does is wrong (not "wrong for Ron's use case", but "wrong in every respect"). A little less arrogance and a little more "here are some possibilities for improvement", "here is an estimate of the amount of effort that might be needed", and "here are some rough benchmarks showing the potential return on that investment" would, at the very least, make the threads far less grating to read.

Cheers,
Steve
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:
> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already made their minds up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.

Religious for some. Conservative for others. Sometimes people need to see the way before they are willing to accept it merely on the say-so of another person. In some circles, it is called the scientific method... :-)

Also, there is a cost to complicated specific optimizations. They can be a real maintenance and portability headache. What is the value ratio of performance to maintenance or portability?

> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement its own DB
> specific memory manager and then compare the performance
> between the two versions on the same HW, OS, and schema.

Not necessarily. Even if a version of PostgreSQL were written to function in this new model, there would be no guarantee that it was written in the most efficient manner possible. Performance could show PostgreSQL using its own caching and disk space management implementation, and performing poorly. The only true and accurate way would be to implement it, and then invest time by those most competent to test and optimize the implementation. At this point, it would become a moving target, as those who believe otherwise would be free to pursue more efficient file systems, or modifications to the operating system to better cooperate with PostgreSQL.

I don't think there can be a true answer to this one. The more innovative and clever people will always be able to make their solution work better. If the difference in performance was really so obvious, there wouldn't be doubters on either side. It would be clear to all. The fact is, there is reason to doubt. Perhaps not doubt that the final solution would be more efficient, but rather reason to doubt that the difference in efficiency would be significant.

> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a set up that _does_ use said memory managers
> on the same HW, OS, and schema.

Same as above. If Oracle was designed to work with the functionality, then disabling the functionality wouldn't prove that an efficient design would perform equally poorly, or even poorly at all. I think it would be obvious that Oracle would have invested most of their dollars into the common execution paths, with the expected functionality present.

> I don't currently have the resources for either experiment.

This is the real problem. :-)

> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.

Or not. They might just have more money to throw at the problem, and be entrenched in their solution to the point that they need to innovate to ensure that their solution appears to be the best.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.

A lot of this is theory. It may be good theory, but there is no guarantee that the variables listed in these theories match, or properly estimate, the issues that would be found in a real implementation.

> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
> One problem set at a time.

In any case, I'm on your side -- in theory.
Intuitively, I don't understand how anybody could claim that a general solution could ever be faster than a specific solution. Anybody who claimed this would go down in my books as a fool. It should be obvious to these people that, as an extreme, the entire operating system caching layer and the entire file system layer could be inlined into PostgreSQL, avoiding many of the expenses involved in switching back and forth between user space and system space, leaving a more efficient, although significantly more complicated, solution.

Whether by luck or by the experience of those involved, I haven't seen any senior PostgreSQL developers actually stating that it couldn't be faster. Instead, I've seen it claimed that the PostgreSQL developers don't have the resources to attack this problem, as there are other far more pressing features, product defects, and more obviously beneficial optimization opportunities to work on. Memory management, or disk management, is "good enough" as provided by decent operating systems, and the itch just isn't bad enough to scratch yet. They remain unconvinced that the gain in performance would be worth the cost of maintaining this extra complexity in the code base.

If you believe the case can be made, it is up to you to make it.

Cheers!
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com ("Neighbourhood Coder")
Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...
http://mark.mielke.cc/
First off, Mr. Trainor's response proves nothing about anyone or anything except Mr. Trainor.

I'm going to offer an opinion on the caching topic. I don't have any benchmarks; I'm offering a general sense of the issue based on decades of experience, so I'll give a short summary of that.

I've been earning my living by working with computers since 1972, and am the architect and primary author of a little-known database product (developed in 1984) which saw tens of thousands of installations in various vertical markets. (Last I checked, a couple years ago, it was still being used statewide by one state government after a multi-million dollar attempt to replace it with a popular commercial database product failed.) I've installed and tuned many other database products over the years. I'm just getting to know PostgreSQL, and am pretty excited about it.

Now on to the meat of it. My experience is that a DBMS can improve performance by caching certain types of data. In the product I developed, we had a fairly small cache which used a weighting algorithm for what to keep (rather than simply relying on LRU). Index pages got higher weight than data pages; the higher in the index, the higher the weight. Recent access got higher weight than older access, although it took quite a while for the older access to age out entirely. This improved performance quite a bit over a generalized caching product alone.

However, there was a point of diminishing return. My sense is that every 10% you add to a "smart" cache yields less benefit at a higher cost, so beyond a certain point, taking RAM from the general cache to expand the smart cache degrades performance. Clever programming techniques can shift the break-even point, but I very much doubt it can be eliminated entirely, unless the ratio of performance between CPU+RAM and persistent storage is much more extreme than I've ever seen.
There is another issue, which has been raised from time to time in these lists, but not enunciated clearly enough in my view. These discussions about caching generally address maximum throughput, while there are times when it is important that certain tables can be queried very quickly, even if it hurts overall throughput.

As an example, there can be tables which are accessed as a user types in a window and tabs around from one GUI control to another. The user perception of the application's performance is going to depend PRIMARILY on how quickly the GUI renders the results of these queries; if the run time for a large report goes up by 10%, they will probably not notice. This is a situation where removing RAM from a generalized cache, or one based on database internals, to create an "application specific" cache can yield big rewards.

One client has addressed this in a commercial product by defining a named cache large enough to hold these tables, and binding those tables to the cache. One side benefit is that such caches can be defined as "relaxed LRU", meaning that they eliminate the overhead of tracking accesses, since they can assume that data will rarely, if ever, be discarded from the cache.

It seems to me that in the PostgreSQL world, this would currently be addressed by binding the tables to a tablespace where the file system, controller, or drive(s) would cache the data, although this is somewhat less flexible than the "named cache" approach -- unless there is a file system that can layer a cache on top of a reference to some other file system's space. (And let's not forget the many OS environments in which people use PostgreSQL.) So I do see that there would be benefit to adding a feature to PostgreSQL to define caches and bind tables or indexes to them.

So I do think that it is SMART of PostgreSQL to rely on the increasingly sophisticated file systems to provide the MAIN cache.
I suspect that a couple of types of smaller "smart" caches in front of this could boost performance, and it might be a significant boost. I'm not sure what the current shared memory is used for; perhaps this is already caching specific types of structures for the DBMS. I'm pretty sure that programmers of GUI apps would appreciate the named cache feature, so they could tune the database for snappy GUI response, even under heavy load.

I realize this is short on specifics -- I'm shooting for perspective. For the record, I don't consider myself particularly religious on the topic, but I do pull back a little at arguments which sound strictly academic; I've found that most of what I've drawn from those circles has needed adjustment in solving real-world problems. (Particularly when algorithms optimize for best worst-case performance. I've found users are much happier with best typical-case performance, as long as the product of worst-case performance and worst-case frequency is low.) Like many others who have posted on the topic, I am quite prepared to alter my views in the face of relevant evidence.

Feel free to laugh at the old fart who decided to sip his Bushmill's while reading through this thread and try to run with the young lions. As someone else recently requested, though, please don't point while you laugh -- that's just rude. :-)

-Kevin

>>> Ron Peacetree <rjpeace@earthlink.net> 10/04/05 10:06 PM >>>
Unfortunately, no matter what I say or do, I'm not going to please or convince anyone who has already made their minds up to the extent that they post comments like Mr Trainor's below. [...]
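The weighting scheme Kevin describes (index pages above data pages, higher index levels weighted more, and recency aging out slowly) can be sketched as a small eviction policy. All weights and the decay rate below are invented for illustration, not taken from any real product:

```python
# Sketch of weighted cache eviction: each cached page gets a score
# from its type (index pages above data pages, higher index levels
# heavier) plus a recency component that decays slowly, so old
# accesses age out gradually rather than being dropped LRU-style.
import itertools

TYPE_WEIGHT = {"data": 1.0, "index": 4.0}   # made-up illustrative weights

class WeightedCache:
    def __init__(self, capacity, decay=0.001):
        self.capacity = capacity
        self.decay = decay                   # small value => slow aging
        self.clock = itertools.count()       # logical access time
        self.pages = {}                      # page -> (kind, level, last_tick)

    def _score(self, kind, level, last_tick, now):
        base = TYPE_WEIGHT[kind] * (1 + level)   # higher in index = heavier
        age = now - last_tick
        return base - self.decay * age           # recency decays slowly

    def access(self, page, kind="data", level=0):
        now = next(self.clock)
        self.pages[page] = (kind, level, now)
        if len(self.pages) > self.capacity:
            # Evict the page with the lowest current score.
            victim = min(
                self.pages,
                key=lambda p: self._score(*self.pages[p], now),
            )
            del self.pages[victim]

cache = WeightedCache(capacity=3)
cache.access("idx-root", kind="index", level=2)  # heavily weighted
cache.access("idx-leaf", kind="index", level=0)
cache.access("data-1", kind="data")
cache.access("data-2", kind="data")              # forces an eviction
print(sorted(cache.pages))   # the index root survives; a data page goes
```

Even in this toy form, the policy shows the behavior Kevin reports: index pages near the root are nearly impossible to evict, while data pages come and go, and the tiny decay term means an old access fades out only after many newer ones.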
Hey, you can say what you want about my style, but you still haven't pointed to even one article from the vast literature that you claim supports your argument. And I did include a smiley. Your original email claiming that PostgreSQL is wrong and that you are right led me to believe that you, like others making such statements, would not post your references. You remind me of Ted Nelson, who wanted the computing center at the University of Illinois at Chicago to change their systems just for him.

BTW, I'm a scientist -- I haven't made my mind up about anything. I really am interested in what you say, if there is any real work backing up your claims such that it would impact average cases. Any app designer can conceive of many ways to game the server to their app's advantage -- I'm not interested in that potboiler.

douglas

On Oct 4, 2005, at 11:06 PM, Ron Peacetree wrote:

> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already made their mind up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.
>
> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement its own DB-
> specific memory manager, and then compare the performance
> between the two versions on the same HW, OS, and schema.
>
> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a setup that _does_ use said memory managers
> on the same HW, OS, and schema.
>
> I don't currently have the resources for either experiment.
>
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.
> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context-dependent (ie application-
> or domain-specific) caching strategies are, the better they are.
>
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
>
> One problem set at a time.
> Ron
I'm sure there will be cases where some human-assisted caching algorithm will perform better than a mathematical, statistics-based design, but it will also depend on the "human". And it will probably make things worse when the workload changes and the human doesn't notice.

It must be considered that, today, hardware is not the 90% of the budget that it used to be. Throwing hardware at the system can be as expensive as throwing certified "IT stuff" at it (just think of the coffee budget! :-) ).

If you need to improve "user perception", you can do other things. Like caching a table in your client: with a trigger on any change to table X updating a table called "timestamp_table_change", plus a small select against that table, you can easily know when you must refresh your client's copy. If it is an application server serving http requests, then "user perception" will be tied to bandwidth AND the application server (some of them have request caches).

FYI, I don't recall a mechanism in MSSQL to pin a table in the buffers. Oracle has some structures to allow that. (you know) It uses its own buffer. Since version 9i, you can set three different data buffers: one (recycle cache) for low-usage tables (I mean tables with blocks which don't have much chance of being queried again, like a very large historical table), one for high-usage tables (keep cache), and the regular one (the difference is in the algorithm). And you must also set a buffer cache size for tablespaces with a different block size. But there is no such thing as "create table x keep entirely in buffer".

And above all things, the Oracle docs always state: "first, tune the design, then tune the queries, then start tuning the engine".

greetings.
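Dario's trigger-based invalidation idea can be sketched in PostgreSQL roughly as follows; the table, function, and trigger names are made up for illustration:

```sql
-- One row per watched table, holding its last-modification time
CREATE TABLE timestamp_table_change (
    table_name text PRIMARY KEY,
    changed_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO timestamp_table_change (table_name) VALUES ('x');

-- Statement-level trigger function: stamp the watched table as changed
CREATE OR REPLACE FUNCTION note_table_change() RETURNS trigger AS $$
BEGIN
    UPDATE timestamp_table_change
       SET changed_at = now()
     WHERE table_name = TG_RELNAME;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER x_changed
    AFTER INSERT OR UPDATE OR DELETE ON x
    FOR EACH STATEMENT EXECUTE PROCEDURE note_table_change();
```

The client then caches table x locally and, before using the cache, runs the cheap one-row query `SELECT changed_at FROM timestamp_table_change WHERE table_name = 'x'` to decide whether a refresh is needed.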
** Low Priority **

Human feedback from testers and users has proven pretty effective at catching errors in the "human assisted" cache configuration. When the people setting up the servers have missed the named cache configuration, and all they had was the single general purpose cache, it has been caught because of user complaints about performance.

There was an attempt made to simulate database queries -- hitting a client side cache on some of the roughly 100 tables (out of 300 in the well normalized schema) which fit this pattern of usage. It didn't prove very cost effective. It just makes more sense to allow the DBAs to tweak database performance through database configuration changes than to jump through that many hoops in application code to try to achieve it where it becomes an issue.

As far as I know, you can't use this technique in Microsoft SQL Server or Oracle. They are using Sybase Adaptive Server Enterprise (ASE). I believe named caches were added in version 12.0, long after Microsoft split off with their separate code stream based on the Sybase effort.

-Kevin

>>> "Dario" <dario_d_s@unitech.com.ar> 10/05/05 6:16 AM >>>
[quoted message snipped]
On Tue, 4 Oct 2005 23:06:54 -0400 (EDT)
Ron Peacetree <rjpeace@earthlink.net> wrote:

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context-dependent (ie application-
> or domain-specific) caching strategies are, the better they are.

Isn't this also a very strong argument for putting your caching into your application and not at the database level? As you say, the more "application or domain specific" it is, the better. I don't see how PostgreSQL is going to magically determine what is perfect for everyone's differing needs and implement it for you. Even rudimentary controls such as "always keep this table/index/whatever in RAM" aren't fine-grained or specific enough to get the full benefit.

My suggestion is to use something like memcached to store your data in, based on the particular needs of your application. This puts all of the control in the hands of the programmer where, in my opinion, it belongs.

Just to clarify, I'm not entirely against the idea, but I certainly think there are other areas of PostgreSQL on which we should be focusing our efforts.

---------------------------------
Frank Wiles <frank@wiles.org>
http://www.wiles.org
---------------------------------
From: Kevin Grittner <Kevin.Grittner@wicourts.gov>
Sent: Oct 5, 2005 2:16 AM
Subject: Re: [PERFORM] Is There Any Way ....

> First off, Mr. Trainor's response proves nothing about anyone or
> anything except Mr. Trainor.

Fair enough. I apologize for the inappropriately general statement.

> I'm going to offer an opinion on the caching topic. I don't have
> any benchmarks; I'm offering a general sense of the issue based on
> decades of experience, so I'll give a short summary of that.
>
> I've been earning my living by working with computers since 1972,

~1978 for me. So to many on this list, I also would be an "old fart".

<description of qualifications snipped>

I've pretty much spent my entire career thinking about and making advances in real-world distributed computing and parallel processing, first as a programmer and then as a systems architect.

> Now on to the meat of it.

<excellent and fair-handed overall analysis snipped>

I agree with your comments just about across the board. I also agree with the poster(s) who noted that the "TLC factor" and the 2x-every-18-months pace of increasing HW performance and RAM capacity make this stuff a moving target. OTOH, there are some fundamentals that don't seem to change no matter how far or fast the computing field evolves. As usual, the proper answers involve finding a sometimes nontrivial balance between building on known precedent and not being trapped by doctrine.

Ron
A blast from the past is forwarded below.

douglas

Begin forwarded message:

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: August 23, 2005 3:23:43 PM EDT
> To: Donald Courtney <Donald.Courtney@sun.com>
> Cc: pgsql-performance@postgresql.org, Frank Wiles <frank@wiles.org>,
> gokulnathbabu manoharan <gokulnathbabu@yahoo.com>
> Subject: Re: [PERFORM] Caching by Postgres
>
> Donald Courtney <Donald.Courtney@Sun.COM> writes:
>> I am not alone in having the *expectation* that a database should have
>> some cache size parameter and the option to skip the file system. If
>> I use oracle, sybase, mysql and maxdb they all have the ability to
>> size a data cache and move to 64 bits.
>
> And you're not alone in holding that opinion despite having no shred
> of evidence that it's worthwhile expanding the cache that far.
>
> However, since we've gotten tired of hearing this FUD over and over,
> 8.1 will have the ability to set shared_buffers as high as you want.
> I expect next we'll be hearing from people complaining that they
> set shared_buffers to use all of RAM and performance went into the
> tank ...
>
> regards, tom lane

On Oct 4, 2005, at 11:06 PM, Ron Peacetree wrote:

> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already made their mind up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.
>
> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement its own DB-
> specific memory manager, and then compare the performance
> between the two versions on the same HW, OS, and schema.
>
> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a setup that _does_ use said memory managers
> on the same HW, OS, and schema.
>
> I don't currently have the resources for either experiment.
>
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.
>
> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context-dependent (ie application-
> or domain-specific) caching strategies are, the better they are.
>
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
>
> One problem set at a time.
> Ron
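For context on Tom's 8.1 remark: shared_buffers is set in postgresql.conf, and in the 8.x releases it is counted in 8 kB buffer pages rather than memory units. A sketch with deliberately illustrative values (not recommendations):

```
# postgresql.conf -- illustrative values only
shared_buffers = 50000          # 8 kB pages each, so roughly 390 MB here
effective_cache_size = 125000   # planner hint for OS-level caching, also in pages
```

The point of the 8.1 change is that the first setting is no longer capped at a small value; whether a very large setting helps or hurts is exactly what this thread is arguing about.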
Jim C. Nasby" <jnasby ( at ) pervasive ( dot ) com> wrote:
>
> Stefan Weiss wrote:
> > ... IMO it would be useful to have a way to tell
> > PG that some tables were needed frequently, and should be cached if
> > possible. This would allow application developers to consider joins with
> > these tables as "cheap", even when querying on columns that are
> > not indexed.
>
> Why do you think you'll know better than the database how frequently
> something is used? At best, your guess will be correct and PostgreSQL
> (or the kernel) will keep the table in memory. Or, your guess is wrong
> and you end up wasting memory that could have been used for something
> else.
>
> It would probably be better if you describe why you want to force this
> table (or tables) into memory, so we can point you at more appropriate
> solutions.

Or perhaps we could explain why we NEED to force these tables into memory, so we can point you at a more appropriate implementation. ;-)

Ok, wittiness aside, here's a concrete example. I have an application with one critical index that MUST remain in memory at all times. The index's tablespace is about 2 GB. As long as it's in memory, performance is excellent -- a user's query takes a fraction of a second. But if it gets swapped out, the user's query might take up to five minutes as the index is re-read from disk.

Now here's the rub. The only performance I care about is response to queries from the web application. Everything else is low priority. But there is other activity going on. Suppose, for example, that I'm updating tables, performing queries, doing administration, etc., etc., for a period of an hour, during which no customer visits the site. Then another customer comes along and performs a query.

At this point, no heuristic in the world could have guessed that I DON'T CARE ABOUT PERFORMANCE for anything except my web application. The performance of all the other stuff -- the administration, the updates, etc. -- is utterly irrelevant compared to the performance of the customer's query.

What actually happens is that the other activities have swapped out the critical index, and my customer waits, and waits, and waits... and goes away after a minute or two. To solve this, we've been forced to purchase two computers and mirror the database on both. All administration and modification happens on the "offline" database, and the web application only uses the "online" database. At some point, we swap the two servers, sync the two databases, and carry on. It's a very unsatisfactory solution.

There is ONLY one way to convey this sort of information to Postgres, which is to provide the application developer a mechanism to explicitly name the tables that should be locked in memory. Look at tsearchd that Oleg is working on. It's a direct response to this problem.

It's been recognized for decades that, as kernel developers (whether a Linux kernel or a database kernel), our ability to predict the behavior of an application is woefully inadequate compared with the application developer's knowledge of the application. Computer Science simply isn't a match for the human brain yet, not even close.

To give you perspective: since I posted a question about this problem (regarding tsearch2/GIST indexes), half of the responses I received told me that they encountered this problem, and their solution was to use an external full-text engine. They all confirmed that Postgres can't deal with this problem yet, primarily for the reasons outlined above.

Craig
This is possible with Oracle utilizing the keep pool:

alter table t_name storage ( buffer_pool keep );

If Postgres were to implement its own caching system, this seems like it would be easy to implement (beyond the initial caching effort).

Alex

On 10/24/05, Craig A. James <cjames@modgraph-usa.com> wrote:
> Or perhaps we could explain why we NEED to force these tables into memory, so we can point you at a more appropriate implementation. ;-)
> [rest of message quoted in full above; snipped]
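Expanding Alex's one-liner into a fuller Oracle sketch: the KEEP pool must be sized before objects are bound to it. Object names and the pool size below are hypothetical; db_keep_cache_size exists in Oracle 9i and later:

```sql
-- Reserve a KEEP buffer pool (size is illustrative)
ALTER SYSTEM SET db_keep_cache_size = 200M SCOPE = BOTH;

-- Bind the hot table and its index to the KEEP pool
ALTER TABLE t_name STORAGE (BUFFER_POOL KEEP);
ALTER INDEX t_name_idx STORAGE (BUFFER_POOL KEEP);
```

Note that even the KEEP pool is still an LRU cache: it only approximates pinning, by giving the bound objects a pool that nothing else competes for.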
Alex Turner wrote:
> This is possible with Oracle utilizing the keep pool
>
> alter table t_name storage ( buffer_pool keep);
>
> [rest of the quoted exchange snipped]

We have a similar problem, with vacuum being the equivalent of "continuously flush all system caches for a long time". Our database is about 200 GB in size and vacuums take hours and hours. The performance is still acceptable, but only because we've hidden the latency in our application.

I've occasionally thought it would be good to have the backend doing a vacuum or analyze also call priocntl() prior to doing any real work, to lower its priority.

We'll be switching to the 8.1 release ASAP, just because the direct IO capabilities are appearing to be a win on our development system.

--
Alan
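As an alternative to OS-level priocntl() tricks, PostgreSQL 8.0 added cost-based vacuum delay, which throttles vacuum's I/O directly rather than its CPU priority. A postgresql.conf sketch with illustrative values:

```
# postgresql.conf -- illustrative values
vacuum_cost_delay = 10      # ms to sleep each time the cost limit is hit (0 disables)
vacuum_cost_limit = 200     # accumulated page-I/O cost that triggers the sleep
```

This makes vacuum slower but far gentler on the buffer and OS caches, which addresses the cache-flushing complaint above at the price of even longer vacuum times.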