Thread: Larger volumes of chronologically ordered data and the planner
Hello,

What is PostgreSQL's likely behaviour when it encounters a large volume of data that is chronologically ordered (there's a btree index on a date column)? Is PostgreSQL intelligent enough to discern that, since the most frequently accessed data is invariably recent data, it should store only that in memory, and efficiently store less relevant, older data on disk? (The volume of data in production at the moment is still small enough to fit entirely in memory.) The application I maintain is not really a data warehousing app, but this is likely to be where I first encounter performance issues, if I ever do. Where can I learn more about this subject in general?

Regards,
John Moran
John Moran <johnfrederickmoran@gmail.com> writes:
> What is PostgreSQL's likely behaviour when it encounters a large
> volume of data that is chronologically ordered (there's a btree index
> on a date column)? Is postgreSQL intelligent enough to discern that
> since the most frequently accessed data is invariably recent data,
> that it should store only that in memory, and efficiently store less
> relevant, older data on disk (the volume of data in production at the
> moment is still small enough to fit entirely in memory)?

There's no dedicated intelligence about such a case, but I don't see why the ordinary cache management algorithms won't handle it perfectly well.

			regards, tom lane
John Moran wrote:
> Is postgreSQL intelligent enough to discern that
> since the most frequently accessed data is invariably recent data,
> that it should store only that in memory, and efficiently store less
> relevant, older data on disk

When a database block is read from disk into memory, its usage count is incremented; the count is incremented again each time the block is requested and turns out to already be in memory. Requests to allocate new blocks are constantly decreasing those usage counts as they "clock sweep" over the cache looking for space that hasn't been used recently. This automatically keeps blocks you've used recently in RAM, while evicting ones that aren't. The database doesn't have any intelligence about what data to keep in memory beyond that. Its sole notion of "relevant" is whether someone has accessed a block recently or not.

The operating system cache sits as a second layer on top of this, typically with its own LRU-style scheme for determining what gets cached.

I've written a long paper covering the internals here named "Inside the PostgreSQL Buffer Cache" at http://www.westnet.com/~gsmith/content/postgresql/ if you want to know exactly how this is all implemented.

--
Greg Smith    2ndQuadrant US    Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
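For anyone who wants to see the idea concretely, the usage-count/clock-sweep behaviour Greg describes can be sketched in a few lines of Python. This is a simplified toy model, not the actual shared_buffers code: the cap of 5 on usage counts matches PostgreSQL's limit, but buffer pinning, the background writer, and dirty-page handling are all omitted.

```python
class ClockSweepCache:
    """Toy model of clock-sweep buffer eviction (simplified)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.buffers = [None] * nbuffers   # block id held in each slot
        self.usage = [0] * nbuffers        # per-slot usage count
        self.index = {}                    # block id -> slot
        self.hand = 0                      # clock hand position

    def access(self, block):
        """Request a block, bumping its usage count (capped at 5)."""
        slot = self.index.get(block)
        if slot is not None:
            self.usage[slot] = min(self.usage[slot] + 1, 5)
            return "hit"
        slot = self._evict()
        old = self.buffers[slot]
        if old is not None:
            del self.index[old]
        self.buffers[slot] = block
        self.index[block] = slot
        self.usage[slot] = 1
        return "miss"

    def _evict(self):
        # Sweep the hand over the slots, decrementing each nonzero
        # usage count, until a slot with a zero count is found;
        # that slot gets reused for the incoming block.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % self.nbuffers
            if self.buffers[slot] is None or self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1
```

The point for the chronological-data question: a block that keeps getting hit (recent data) accumulates a high usage count and survives many sweeps, while blocks that go untouched (old data) decay to zero and are evicted first, with no date-awareness anywhere in the algorithm.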
> I've written a long paper covering the internals here named "Inside the
> PostgreSQL Buffer Cache" at
> http://www.westnet.com/~gsmith/content/postgresql/ if you want to know
> exactly how this is all implemented.

Greg,

That's exactly what I was looking for.

Regards,
John