Thread: Multiple buffer cache?

Multiple buffer cache?

From
Alexei Vladishev
Date:
Greetings,

Is there a way of configuring PostgreSQL so that one specific table would
use, say, 4GB of buffer cache while other tables would use the rest?

I would like to keep the table and its indexes always in "hot" state, so
that
other queries won't pollute this part of the buffer cache. It would ensure
reliable performance and much less disk IOPS working with the table.

Is it possible?

Thanks for any hints!

Alexei

Re: Multiple buffer cache?

From
"Bret S. Lambert"
Date:
On Fri, Feb 05, 2010 at 11:41:13PM +0200, Alexei Vladishev wrote:
> Greetings,
>
> Is there a way of configuring PostgreSQL so that one specific table would
> use, say, 4GB of buffer cache while other tables would use the rest?
>
> I would like to keep the table and its indexes always in "hot"
> state, so that
> other queries won't pollute this part of the buffer cache. It would ensure
> reliable performance and much less disk IOPS working with the table.

Fiddling with the buffer cache like that would require some sort of
OS support, if I'm not mistaken in what you're asking for.

And then, even if the support is there, you'd need to outline exactly
how you're planning on pushing this button.

Specifically, what's your usage pattern that would make this a
win for you?

If the table and its indexes can already fit into the buffer cache,
and it's as commonly accessed as you think it is, the OS should
probably have it cached anyway.

>
> Is it possible?
>
> Thanks for any hints!
>
> Alexei
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Multiple buffer cache?

From
Cédric Villemain
Date:
2010/2/6 Bret S. Lambert <bret.lambert@gmail.com>:
> On Fri, Feb 05, 2010 at 11:41:13PM +0200, Alexei Vladishev wrote:
>> Greetings,
>>
>> Is there a way of configuring PostgreSQL so that one specific table would
>> use, say, 4GB of buffer cache while other tables would use the rest?
>>
>> I would like to keep the table and its indexes always in "hot"
>> state, so that
>> other queries won't pollute this part of the buffer cache. It would ensure
>> reliable performance and much less disk IOPS working with the table.
>
> Fiddling with the buffer cache like that would require some sort of
> OS support, if I'm not mistaken in what you're asking for.
>
> And then, even if the support is there, you'd need to outline exactly
> how you're planning on pushing this button.
>
> Specifically, what's your usage pattern that would make this a
> win for you?
>
> If the table and its indexes can already fit into the buffer cache,
> and it's as commonly accessed as you think it is, the OS should
> probably have it cached anyway.

that's all true.

I am working on pgfincore which allow in some way to prepare buffer cache.
You need pg > 8.4 and linux (probably working with bsd too)

I don't consider it ready fo rproduction, but fine for debugging
things, if you reall care the buffer cache preload, tell me, I'll
stabilize the code in a shorter time ;)
http://villemain.org/projects/pgfincore

>
>>
>> Is it possible?
>>
>> Thanks for any hints!
>>
>> Alexei
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Cédric Villemain

Re: Multiple buffer cache?

From
"Bret S. Lambert"
Date:
On Sat, Feb 06, 2010 at 03:46:58PM +0200, Alexei Vladishev wrote:
> Bret,
>
> Thank you for your response!
>
> >>Greetings,
> >>
> >>Is there a way of configuring PostgreSQL so that one specific table would
> >>use, say, 4GB of buffer cache while other tables would use the rest?
> >>
> >>I would like to keep the table and its indexes always in "hot"
> >>state, so that
> >>other queries won't pollute this part of the buffer cache. It would ensure
> >>reliable performance and much less disk IOPS working with the table.
> >
> >Fiddling with the buffer cache like that would require some sort of
> >OS support, if I'm not mistaken in what you're asking for.
> I am talking about PostgreSQL buffer cache not OS level. I believe
> it has nothing to do with
> OS support.

Well, kinda; but I'd been spending too much time doing admin, so I'd
completely spaced on Postgres terms when you used "buffer cache", so
sorry for the mixup.

>
> It would be great to have support of multiple cache buffers assigned
> to different set of tables.
> Having this implemented, I would assign frequently accessed
> configuration tables (selects
> and updates) to one buffer and historical tables (lots of insert
> operations) to another buffer, so
> the sets would use independent buffers and won't affect each other.

Fair enough.

>
> >And then, even if the support is there, you'd need to outline exactly
> >how you're planning on pushing this button.
> >
> >Specifically, what's your usage pattern that would make this a
> >win for you?
> Let me explain. I have a very busy application generating thousands
> of SQLs per second.
> There is an application level cache built into the application already.
>
> The important part is that once per hour the application writes
> collected data to huge historical
> tables (100M up-to billions of records, partitioned). Since it
> happens every hour database buffer
> cache is already overwritten by data and indexes of other tables, so
> the write operation is very
> slow and requires huge amount of disk seeks causing 50-100x drop of
> performance.

The disk seeks will happen regardless of what Postgres does, as the
OS pulls in new disk blocks to perform the write. If your OS' buffer
cache is large enough to hold all the data you need, then your
best bet is likely partitioning data across multiple disks, so that
queuing the archive reads doesn't get in the way of production reads.

As I'm a unix admin mostly, I'm not qualified to give advice on whether
or not that's possible, or how to do it if it is ;)

>
> So, my idea is to assign a separate buffer cache for the historical
> tables. It would guarantee that
> index data is always cached, so the write operation will be very fast.
>
> Is it possible? Is there any other techniques available?

If it were at all possible, I'd actually set up a secondary archiving
server (unless you need the historical data on tap for the production
system as well), either on another port on the same machine, or on
another machine which won't impact your production system if it has to
suddenly do a bunch of disk I/O, and log the history to that.

>
> >If the table and its indexes can already fit into the buffer cache,
> >and it's as commonly accessed as you think it is, the OS should
> >probably have it cached anyway.
> I see what you are saying but the problem is that it is normally
> accessed once per hour only.
>
> Any thoughts?
>
> Alexei

Re: Multiple buffer cache?

From
"Bret S. Lambert"
Date:
On Sat, Feb 06, 2010 at 02:44:32PM +0100, C?dric Villemain wrote:
> 2010/2/6 Bret S. Lambert <bret.lambert@gmail.com>:
> > On Fri, Feb 05, 2010 at 11:41:13PM +0200, Alexei Vladishev wrote:
> >> Greetings,
> >>
> >> Is there a way of configuring PostgreSQL so that one specific table would
> >> use, say, 4GB of buffer cache while other tables would use the rest?
> >>
> >> I would like to keep the table and its indexes always in "hot"
> >> state, so that
> >> other queries won't pollute this part of the buffer cache. It would ensure
> >> reliable performance and much less disk IOPS working with the table.
> >
> > Fiddling with the buffer cache like that would require some sort of
> > OS support, if I'm not mistaken in what you're asking for.
> >
> > And then, even if the support is there, you'd need to outline exactly
> > how you're planning on pushing this button.
> >
> > Specifically, what's your usage pattern that would make this a
> > win for you?
> >
> > If the table and its indexes can already fit into the buffer cache,
> > and it's as commonly accessed as you think it is, the OS should
> > probably have it cached anyway.
>
> that's all true.
>
> I am working on pgfincore which allow in some way to prepare buffer cache.
> You need pg > 8.4 and linux (probably working with bsd too)

Why do something with a non-portable interface? Most OSes support
coherently mmap(2)'ing disk blocks into memory; in fact, I'm somewhat
taken aback that the postgres buffer cache isn't implemented in that
manner, but I'm willing to give the devs credit for having not done
so for good reasons.

>
> I don't consider it ready fo rproduction, but fine for debugging
> things, if you reall care the buffer cache preload, tell me, I'll
> stabilize the code in a shorter time ;)
> http://villemain.org/projects/pgfincore
>
> >
> >>
> >> Is it possible?
> >>
> >> Thanks for any hints!
> >>
> >> Alexei
> >>
> >> --
> >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-general
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
>
>
>
> --
> C?dric Villemain

Re: Multiple buffer cache?

From
"BillR"
Date:
I don't know how to do this with PostgreSQL, but I am pretty sure what
Alexei is looking for is what Oracle and SQL Server people refer to as
'pinning' a table or other DB object (into memory). I would be interested to
know if PostgreSQL does this too. I think it is a very useful feature.

How to pin a table in cache with Oracle (for an example):
http://www.jlcomp.demon.co.uk/faq/pin_table.html

Couple more examples.

http://blogs.oracle.com/stevenChan/2007/05/pinning_objects_to_improve_app.ht
ml

http://www.mssqltips.com/tip.asp?tip=1317

In some large enterprise systems I have worked on (e.g. tier one telecom
companies), besides the standard Oracle installation the billing systems
used one database product where everything was in memory. This was used
*mostly* for static lookup data to help speed up the performance of the
system. When you have say, 300 million customers, every little bit helps. :)

Hopefully someone knows how with Postgres.

Cheers

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bret S. Lambert
Sent: February-06-10 4:50 AM
To: Alexei Vladishev
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Multiple buffer cache?

On Fri, Feb 05, 2010 at 11:41:13PM +0200, Alexei Vladishev wrote:
> Greetings,
>
> Is there a way of configuring PostgreSQL so that one specific table would
> use, say, 4GB of buffer cache while other tables would use the rest?
>
> I would like to keep the table and its indexes always in "hot"
> state, so that
> other queries won't pollute this part of the buffer cache. It would ensure
> reliable performance and much less disk IOPS working with the table.

Fiddling with the buffer cache like that would require some sort of
OS support, if I'm not mistaken in what you're asking for.

And then, even if the support is there, you'd need to outline exactly
how you're planning on pushing this button.

Specifically, what's your usage pattern that would make this a
win for you?

If the table and its indexes can already fit into the buffer cache,
and it's as commonly accessed as you think it is, the OS should
probably have it cached anyway.

>
> Is it possible?
>
> Thanks for any hints!
>
> Alexei
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--------------------------------
Spam/Virus scanning by CanIt Pro

For more information see
http://www.kgbinternet.com/SpamFilter.htm

To control your spam filter, log in at
http://filter.kgbinternet.com


--
BEGIN-ANTISPAM-VOTING-LINKS
------------------------------------------------------

Teach CanIt if this mail (ID 80190050) is spam:
Spam:
http://filter.kgbinternet.com/canit/b.php?i=80190050&m=fbb7172fd677&t=201002
06&c=s
Not spam:
http://filter.kgbinternet.com/canit/b.php?i=80190050&m=fbb7172fd677&t=201002
06&c=n
Forget vote:
http://filter.kgbinternet.com/canit/b.php?i=80190050&m=fbb7172fd677&t=201002
06&c=f
------------------------------------------------------
END-ANTISPAM-VOTING-LINKS

__________ Information from ESET Smart Security, version of virus signature
database 4841 (20100206) __________

The message was checked by ESET Smart Security.

http://www.eset.com




Re: Multiple buffer cache?

From
Cédric Villemain
Date:
2010/2/6 Bret S. Lambert <bret.lambert@gmail.com>:
> On Sat, Feb 06, 2010 at 02:44:32PM +0100, C?dric Villemain wrote:
>> 2010/2/6 Bret S. Lambert <bret.lambert@gmail.com>:
>> > On Fri, Feb 05, 2010 at 11:41:13PM +0200, Alexei Vladishev wrote:
>> >> Greetings,
>> >>
>> >> Is there a way of configuring PostgreSQL so that one specific table would
>> >> use, say, 4GB of buffer cache while other tables would use the rest?
>> >>
>> >> I would like to keep the table and its indexes always in "hot"
>> >> state, so that
>> >> other queries won't pollute this part of the buffer cache. It would ensure
>> >> reliable performance and much less disk IOPS working with the table.
>> >
>> > Fiddling with the buffer cache like that would require some sort of
>> > OS support, if I'm not mistaken in what you're asking for.
>> >
>> > And then, even if the support is there, you'd need to outline exactly
>> > how you're planning on pushing this button.
>> >
>> > Specifically, what's your usage pattern that would make this a
>> > win for you?
>> >
>> > If the table and its indexes can already fit into the buffer cache,
>> > and it's as commonly accessed as you think it is, the OS should
>> > probably have it cached anyway.
>>
>> that's all true.
>>
>> I am working on pgfincore which allow in some way to prepare buffer cache.
>> You need pg > 8.4 and linux (probably working with bsd too)
>
> Why do something with a non-portable interface? Most OSes support
> coherently mmap(2)'ing disk blocks into memory; in fact, I'm somewhat
> taken aback that the postgres buffer cache isn't implemented in that
> manner, but I'm willing to give the devs credit for having not done
> so for good reasons.

currrently, pgfincore is implement via mmap'ing and mincore. This is
the mincore implementation which might differ.

In next release there will probably a system call 'fincore()' which is
probably going to be integrated in linux kernel.

Doing all of those in a portable way is not fully possible (thinks
windows), but it may be possible using #ifdef things to catch the good
system call for each OS (if possible/implemented).


>
>>
>> I don't consider it ready fo rproduction, but fine for debugging
>> things, if you reall care the buffer cache preload, tell me, I'll
>> stabilize the code in a shorter time ;)
>> http://villemain.org/projects/pgfincore
>>
>> >
>> >>
>> >> Is it possible?
>> >>
>> >> Thanks for any hints!
>> >>
>> >> Alexei
>> >>
>> >> --
>> >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-general
>> >
>> > --
>> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-general
>> >
>>
>>
>>
>> --
>> C?dric Villemain
>



--
Cédric Villemain

Re: Multiple buffer cache?

From
Cédric Villemain
Date:
2010/2/6 BillR <iambill@williamrosmus.com>:
> I don't know how to do this with PostgreSQL, but I am pretty sure what
> Alexei is looking for is what Oracle and SQL Server people refer to as
> 'pinning' a table or other DB object (into memory). I would be interested to
> know if PostgreSQL does this too. I think it is a very useful feature.
>
> How to pin a table in cache with Oracle (for an example):
> http://www.jlcomp.demon.co.uk/faq/pin_table.html
>
> Couple more examples.
>
> http://blogs.oracle.com/stevenChan/2007/05/pinning_objects_to_improve_app.ht
> ml
>
> http://www.mssqltips.com/tip.asp?tip=1317
>
> In some large enterprise systems I have worked on (e.g. tier one telecom
> companies), besides the standard Oracle installation the billing systems
> used one database product where everything was in memory. This was used
> *mostly* for static lookup data to help speed up the performance of the
> system. When you have say, 300 million customers, every little bit helps. :)
>
> Hopefully someone knows how with Postgres.

In some way pgfincore let you do that with function
pgfadv_willneed(table/index). It will try to load blocks of the
relations without killing your IO.

But, I am not sure it is the correct answer for the problem here.

>
> Cheers
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bret S. Lambert
> Sent: February-06-10 4:50 AM
> To: Alexei Vladishev
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Multiple buffer cache?
>
> On Fri, Feb 05, 2010 at 11:41:13PM +0200, Alexei Vladishev wrote:
>> Greetings,
>>
>> Is there a way of configuring PostgreSQL so that one specific table would
>> use, say, 4GB of buffer cache while other tables would use the rest?
>>
>> I would like to keep the table and its indexes always in "hot"
>> state, so that
>> other queries won't pollute this part of the buffer cache. It would ensure
>> reliable performance and much less disk IOPS working with the table.
>
> Fiddling with the buffer cache like that would require some sort of
> OS support, if I'm not mistaken in what you're asking for.
>
> And then, even if the support is there, you'd need to outline exactly
> how you're planning on pushing this button.
>
> Specifically, what's your usage pattern that would make this a
> win for you?
>
> If the table and its indexes can already fit into the buffer cache,
> and it's as commonly accessed as you think it is, the OS should
> probably have it cached anyway.
>
>>
>> Is it possible?
>>
>> Thanks for any hints!
>>
>> Alexei
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> --------------------------------
> Spam/Virus scanning by CanIt Pro
>
> For more information see
> http://www.kgbinternet.com/SpamFilter.htm
>
> To control your spam filter, log in at
> http://filter.kgbinternet.com
>
>
> --
> BEGIN-ANTISPAM-VOTING-LINKS
> ------------------------------------------------------
>
> Teach CanIt if this mail (ID 80190050) is spam:
> Spam:
> http://filter.kgbinternet.com/canit/b.php?i=80190050&m=fbb7172fd677&t=201002
> 06&c=s
> Not spam:
> http://filter.kgbinternet.com/canit/b.php?i=80190050&m=fbb7172fd677&t=201002
> 06&c=n
> Forget vote:
> http://filter.kgbinternet.com/canit/b.php?i=80190050&m=fbb7172fd677&t=201002
> 06&c=f
> ------------------------------------------------------
> END-ANTISPAM-VOTING-LINKS
>
> __________ Information from ESET Smart Security, version of virus signature
> database 4841 (20100206) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Cédric Villemain

Re: Multiple buffer cache?

From
Alexei Vladishev
Date:
Bret,

Thank you for your response!

>> Greetings,
>>
>> Is there a way of configuring PostgreSQL so that one specific table would
>> use, say, 4GB of buffer cache while other tables would use the rest?
>>
>> I would like to keep the table and its indexes always in "hot"
>> state, so that
>> other queries won't pollute this part of the buffer cache. It would ensure
>> reliable performance and much less disk IOPS working with the table.
>>
>
> Fiddling with the buffer cache like that would require some sort of
> OS support, if I'm not mistaken in what you're asking for.
>
I am talking about PostgreSQL buffer cache not OS level. I believe it
has nothing to do with
OS support.

It would be great to have support of multiple cache buffers assigned to
different set of tables.
Having this implemented, I would assign frequently accessed
configuration tables (selects
and updates) to one buffer and historical tables (lots of insert
operations) to another buffer, so
the sets would use independent buffers and won't affect each other.

> And then, even if the support is there, you'd need to outline exactly
> how you're planning on pushing this button.
>
> Specifically, what's your usage pattern that would make this a
> win for you?
>
Let me explain. I have a very busy application generating thousands of
SQLs per second.
There is an application level cache built into the application already.

The important part is that once per hour the application writes
collected data to huge historical
tables (100M up-to billions of records, partitioned). Since it happens
every hour database buffer
cache is already overwritten by data and indexes of other tables, so the
write operation is very
slow and requires huge amount of disk seeks causing 50-100x drop of
performance.

So, my idea is to assign a separate buffer cache for the historical
tables. It would guarantee that
index data is always cached, so the write operation will be very fast.

Is it possible? Is there any other techniques available?

> If the table and its indexes can already fit into the buffer cache,
> and it's as commonly accessed as you think it is, the OS should
> probably have it cached anyway.
>
I see what you are saying but the problem is that it is normally
accessed once per hour only.

Any thoughts?

Alexei

Re: Multiple buffer cache?

From
Greg Smith
Date:
Alexei Vladishev wrote:
> Is there a way of configuring PostgreSQL so that one specific table would
> use, say, 4GB of buffer cache while other tables would use the rest?

It sounds like you're looking for what other databases call "pinning".
It's not supported in PostgreSQL right now, and as far as I know it's
not on anybody's hotlist of features they're working on.  It would be
straightforward to add actually; I know exactly where the code that
evicts pages from the buffer cache would need to be tweaked to support
this.  See the "Inside the PostgreSQL Buffer Cache" presentation at
http://www.westnet.com/~gsmith/content/postgresql/ for more details
about how the current implementation works.  Be happy to talk about what
what it would take to sponsor the bit of development required if this is
something you really need for your app--it's not a giant feature to
build, just not one that anyone has needed badly enough so far to bother
writing.

> I would like to keep the table and its indexes always in "hot" state,
> so that
> other queries won't pollute this part of the buffer cache. It would
> ensure
> reliable performance and much less disk IOPS working with the table.

The PostgreSQL design presumes that the OS cache is significantly larger
than the RAM dedicated to the database, so a database cache miss doesn't
necessarily turn into physical I/O.  Part of the reason this feature
hasn't been been perceived as more valuable is because just letting the
two cache levels involved here sort out what's really valuable or not
can often outperform what an application developer thinks the optimal
configuration will be.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Multiple buffer cache?

From
Greg Smith
Date:
Bret S. Lambert wrote:
> On Sat, Feb 06, 2010 at 02:44:32PM +0100, C?dric Villemain wrote:
>
>> I am working on pgfincore which allow in some way to prepare buffer cache.
>> You need pg > 8.4 and linux (probably working with bsd too)
>>
>
> Why do something with a non-portable interface? Most OSes support
> coherently mmap(2)'ing disk blocks into memory; in fact, I'm somewhat
> taken aback that the postgres buffer cache isn't implemented in that
> manner, but I'm willing to give the devs credit for having not done
> so for good reasons.
>

First off, Cedric's work is using a fairly portable interface:  mincore
http://www.kernel.org/doc/man-pages/online/pages/man2/mincore.2.html

combined with clever bits of mmap.  Two useful introductions to this are
are http://www.usenix.org/events/lisa07/tech/plonka.html and
http://www.kennygorman.com/wordpress/?p=246

As for why not use mmap in the first place, this is a long running
debate.  See
http://wiki.postgresql.org/wiki/Todo#Miscellaneous_Performance for a
quick intro to the issues involved.  There are portability, database
consistency, and code complexity issues around such a change, and the
practical benefit of the change is not so obvious as to make hurdling
those seem a priority.  Plenty of easier changes around with completely
unambiguous benefits to work on instead.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Multiple buffer cache?

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> ...  Part of the reason this feature
> hasn't been been perceived as more valuable is because just letting the
> two cache levels involved here sort out what's really valuable or not
> can often outperform what an application developer thinks the optimal
> configuration will be.

Or to put it even more clearly: the only way that pinning particular
tables into RAM could beat letting the caching algorithm take care of
it is if the caching algorithm is seriously suboptimal.  Therefore,
it would make more sense to put development effort into improving
the cache algorithm than to put it into giving the DBA a rather blunt
instrument for overriding the cache algorithm.  We've already made
several rounds of improvements of that kind, and are quite happy to
consider more.

            regards, tom lane

Re: Multiple buffer cache?

From
Greg Stark
Date:

I doubt pinning buffers ever improve system on any halfway modern system. It will often *look* like it has improved performance because it improves the performance of the queries you're looking at -- but at the expense of slowing down everything else.

There is a use case it would be useful for though. When you have some queries that are latency critical. Then you might want to pin the buffers those queries use to avoid having larger less urgent queries purge those buffers.

If we had a way to mark latency critical queries that might be a more flexible interface but ewe would need some way to control just how critical they are. we wouldn't want to keep those buffets pinned forever.

greg

On 7 Feb 2010 07:36, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Greg Smith <greg@2ndquadrant.com> writes:
> ...  Part of the reason this feature

> hasn't been been perceived as more valuable is because just letting the
> two cache levels involv...
Or to put it even more clearly: the only way that pinning particular
tables into RAM could beat letting the caching algorithm take care of
it is if the caching algorithm is seriously suboptimal.  Therefore,
it would make more sense to put development effort into improving
the cache algorithm than to put it into giving the DBA a rather blunt
instrument for overriding the cache algorithm.  We've already made
several rounds of improvements of that kind, and are quite happy to
consider more.

                       regards, tom lane


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subs...

Re: Multiple buffer cache?

From
Alvaro Herrera
Date:
Greg Stark wrote:
> I doubt pinning buffers ever improve system on any halfway modern system. It
> will often *look* like it has improved performance because it improves the
> performance of the queries you're looking at -- but at the expense of
> slowing down everything else.
>
> There is a use case it would be useful for though. When you have some
> queries that are latency critical. Then you might want to pin the buffers
> those queries use to avoid having larger less urgent queries purge those
> buffers.
>
> If we had a way to mark latency critical queries that might be a more
> flexible interface but ewe would need some way to control just how critical
> they are. we wouldn't want to keep those buffets pinned forever.

This should be easy to test, no?  Just set some variable while running
latency-critical queries that makes PinBuffer increment usage_count by
more than one when pinning a buffer.  Such a buffer would have its usage
count typically higher than a buffer only used for regular queries.

To make this work we'd probably need a slightly larger value of
BM_MAX_USAGE_COUNT, I think.


--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Multiple buffer cache?

From
Greg Smith
Date:
Alvaro Herrera wrote:
> This should be easy to test, no?  Just set some variable while running
> latency-critical queries that makes PinBuffer increment usage_count by
> more than one when pinning a buffer.  Such a buffer would have its usage
> count typically higher than a buffer only used for regular queries.
>

Yeah, the only problem is that if the latency-critical ones don't happen
often enough, just the constant passes of the clock sweep hand over them
to allocate for the big activity might kick them out regardless. I was
thinking more along the lines of just adding another buffer flag that
protected them instead--once you get in the cache, if you came from a
relation that's pinned, you never leave again if that flag is on. That's
more along the lines of what they think they want--whether or not that's
really best would take some profiling to prove.

> To make this work we'd probably need a slightly larger value of
> BM_MAX_USAGE_COUNT, I think.
>

That alone might help their case significantly, if in fact the
latency-critical bits do happen often. Given a repeatable test case, the
first thing I'd consider doing for this problem is instrumenting the
distribution of data in the buffer cache with usage counts, double
BM_MAX_USAGE_COUNT, then run it again and see what's different. Would
learn a lot with that test.

As far as the sort of issues that Tom and Greg both mentioned, there are
plenty of situations where people are willing to trade-off a significant
amount of average or best performance in return for lowering worst-case
latency for some queries. I'm not completely aligned with presuming the
database will always have enough information to make that call on its
own, and "pinning hints" are common enough in other systems that maybe
they're worth an investigation here too. Some of the things people ask
for optimizer hints for might even be addressed as a side-effect
here--knowing the relation was likely nailed down in the cache would
certainly adjust the plan costs in a way the optimizer could use. That's
a slightly different case than the direct requests for optimizer hints,
which are usually a bad idea because they encourage the optimizer to
make decisions based on data that's likely out of date.

We (myself, Greg Stark, Robert Haas, Stephen Frost) had a little meeting
last year on the topic of more advanced buffer cache methods and what
the optimizer might do with them, and I still owe everyone a written
report on that I haven't forgotten about. While I agree that wandering
in that direction is the ideal approach here, there may be a role for
relation pinning in all this somewhere too.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Multiple buffer cache?

From
Alexei Vladishev
Date:
Greg,

Let's look at this from an application developer perspective. Suppose my
application has a few hundreds of tables. I know _very_well_ how the
tables are used. I'd like to tune PostgreSQL so that it would respect
how the application works in order to get best possible performance.

My database is 500GB. One set of tables, configuration and operational
data, takes around 20GB. Historical data takes 480GB.

Configuration: 20GB, lots of selects and updates. Latency is important.
History: 480GB, frequent inserts, selects. Latency is important as well.

Since PostgreSQL has only one buffer cache, operations with one set of
tables affects other set of tables. Bunch of selects from the historical
tables could easily cause cache pollution.

Pinning buffers is a nice feature, but imho it would be hard to tune and
control. However I think that pinning relations (tables) would help a
lot. In this case all configuration tables could be pinned, therefore
selects and updates will be very fast.

Alternatively if PostgreSQL supported multiple buffer cache, I would
assign relatively small configuration tables to one buffer and
historical tables to another, so they won't affect each other. It may
also work on per database level, so a database can be assigned to one of
available buffers.

Does it make sense?

Kind regards,
Alexei

Greg Stark wrote:
>
> I doubt pinning buffers ever improve system on any halfway modern
> system. It will often *look* like it has improved performance because
> it improves the performance of the queries you're looking at -- but at
> the expense of slowing down everything else.
>
> There is a use case it would be useful for though. When you have some
> queries that are latency critical. Then you might want to pin the
> buffers those queries use to avoid having larger less urgent queries
> purge those buffers.
>
> If we had a way to mark latency critical queries that might be a more
> flexible interface but ewe would need some way to control just how
> critical they are. we wouldn't want to keep those buffets pinned forever.
>
> greg
>
>> On 7 Feb 2010 07:36, "Tom Lane" <tgl@sss.pgh.pa.us
>> <mailto:tgl@sss.pgh.pa.us>> wrote:
>>
>> Greg Smith <greg@2ndquadrant.com <mailto:greg@2ndquadrant.com>> writes:
>> > ...  Part of the reason this feature
>>
>> > hasn't been been perceived as more valuable is because just letting
>> the
>> > two cache levels involv...
>>
>> Or to put it even more clearly: the only way that pinning particular
>> tables into RAM could beat letting the caching algorithm take care of
>> it is if the caching algorithm is seriously suboptimal.  Therefore,
>> it would make more sense to put development effort into improving
>> the cache algorithm than to put it into giving the DBA a rather blunt
>> instrument for overriding the cache algorithm.  We've already made
>> several rounds of improvements of that kind, and are quite happy to
>> consider more.
>>
>>                        regards, tom lane
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org
>> <mailto:pgsql-general@postgresql.org>)
>> To make changes to your subs...
>>


Re: Multiple buffer cache?

From
Alexei Vladishev
Date:
Greg,

> Alexei Vladishev wrote:
>> Is there a way of configuring PostgreSQL so that one specific table
>> would
>> use, say, 4GB of buffer cache while other tables would use the rest?
>
> It sounds like you're looking for what other databases call
> "pinning".  It's not supported in PostgreSQL right now, and as far as
> I know it's not on anybody's hotlist of features they're working on.
> It would be straightforward to add actually; I know exactly where the
> code that evicts pages from the buffer cache would need to be tweaked
> to support this.  See the "Inside the PostgreSQL Buffer Cache"
> presentation at http://www.westnet.com/~gsmith/content/postgresql/ for
> more details about how the current implementation works.  Be happy to
> talk about what what it would take to sponsor the bit of development
> required if this is something you really need for your app--it's not a
> giant feature to build, just not one that anyone has needed badly
> enough so far to bother writing.
Thank you for the link. Lots of useful information there.

>> I would like to keep the table and its indexes always in "hot" state,
>> so that
>> other queries won't pollute this part of the buffer cache. It would
>> ensure
>> reliable performance and much less disk IOPS working with the table.
>
> The PostgreSQL design presumes that the OS cache is significantly
> larger than the RAM dedicated to the database, so a database cache
> miss doesn't necessarily turn into physical I/O.  Part of the reason
> this feature hasn't been been perceived as more valuable is because
> just letting the two cache levels involved here sort out what's really
> valuable or not can often outperform what an application developer
> thinks the optimal configuration will be.
Interesting! I assumed that it is a common practice to dedicate most of
the RAM to a database engine leaving only small percentage to OS.

Kind regards,
Alexei

Re: Multiple buffer cache?

From
Alexei Vladishev
Date:
Bret,

>>> And then, even if the support is there, you'd need to outline exactly
>>> how you're planning on pushing this button.
>>>
>>> Specifically, what's your usage pattern that would make this a
>>> win for you?
>>>
>> Let me explain. I have a very busy application generating thousands
>> of SQLs per second.
>> There is an application level cache built into the application already.
>>
>> The important part is that once per hour the application writes
>> collected data to huge historical
>> tables (100M up-to billions of records, partitioned). Since it
>> happens every hour database buffer
>> cache is already overwritten by data and indexes of other tables, so
>> the write operation is very
>> slow and requires huge amount of disk seeks causing 50-100x drop of
>> performance.
>>
>
> The disk seeks will happen regardless of what Postgres does, as the
> OS pulls in new disk blocks to perform the write. If your OS' buffer
> cache is large enough to hold all the data you need, then your
> best bet is likely partitioning data across multiple disks, so that
> queuing the archive reads doesn't get in the way of production reads.
>
> As I'm a unix admin mostly, I'm not qualified to give advice on whether
> or not that's possible, or how to do it if it is ;)
>
I was talking about read seeks obviously caused by index-related
searches. Write operations
do not cause latency issues as they are handled quite well by OS,
controller, HDD, whatever
write cache.

>> So, my idea is to assign a separate buffer cache for the historical
>> tables. It would guarantee that
>> index data is always cached, so the write operation will be very fast.
>>
>> Is it possible? Is there any other techniques available?
>>
>
> If it were at all possible, I'd actually set up a secondary archiving
> server (unless you need the historical data on tap for the production
> system as well), either on another port on the same machine, or on
> another machine which won't impact your production system if it has to
> suddenly do a bunch of disk I/O, and log the history to that.
>
I agree. Two separate servers would be a nice solution as usage patterns
are absolutely different,
so the servers can be tuned differently. Lack of transactional integrity
is an obvious drawback of
such approach.

Kind regards,
Alexei