Thread: Some ideas for comment

Some ideas for comment

From

Chris Hoover

Date:

24 August 2005, 14:36:17

Ok, there is always a lot of talk about tuning PostgreSQL on linux and
how PostgreSQL uses the linux kernel cache to cache the tables and
indexes.

My question is, is there anyway to see what files linux is caching at
this moment?

My reasoning behind this question is:

I have several database systems each with 1 PostgreSQL cluster.
However, each cluster has  a large number of identical databases on
it. Since there can be a great amount of size disparity between the
databases, I am wondering if some of slowness we see might be caused
by the kernel cache having to reload a lot of data that keeps getting
swapped out. (most systems have at least 100GB of data/indexes on them
with 8 or 12GB ram).

If this is the case, what sort of symptoms would you expect?

To help mitigate this potential, I have been researching the
following, and am thinking of proposing it to management.  Any
comments would be appreciated.

1.  Implement a partition type layout using views and rules - This
will allow me to have one table in each view with the "active" data,
and the inactive data stored by year in other tables.

So I would have the following (for each major table):

Table View as
select * from active_table
union all
select * from table_2005
union all
select * from table_2004
etc.

Each table would have identical indexes, however only the
"active_table" would have data that is actively being worked.  The
rules and a nightly job can keep the data correctly stored.

I am thinking that with this setup, the active table indexes should
almost always be in memory.  And, if they do happen to be pushed out,
they are much smaller than the indexes I have today (where all data is
in one table), so they should load faster with less i/o pressure.

From the testing I have done so far, I believe I can implement this
system with out having to ask for developer time.  This is a "Good
Thing".

Also, the database is not normalized and is very ugly, by using the
view to partition and abstract the actual data, I will be in a better
position to start normalizing some of the tables w/o developer time
(once again, a "Good Thing")


2.  I am also thinking of recommending we collapse all databases in a
cluster into one "mega" database.  I can then use schema's and views
to control database access and ensure that no customer can see another
customers data.

This would mean that there are only one set of indexes being loaded
into the cache.  While they would be larger, I think in combination
with the partition from idea 1, we would be ahead of the ball game.
Since there would only be one set of indexes, everyone would be
sharing them so they should always be in memory.

I don't have real numbers to give you, but we know that our systems
are hurting i/o wise and we are growing by about 2GB+ per week (net).
We actually grow by about 5GB/week/server.  However, when I run my
weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
end up getting about 3GB back.  Unfortunately, I do not have the i/o
bandwidth to vacuum during the day as it causes major slowdowns on our
system.  Each night, I do run a vacuum analyze across all db's to try
and help.  I also have my fsm parameters set high (8000000 fsm pages,
and 5000 fsm relations) to try and compensate.

I believe this is only hurting us as any queries that choose to
tablescan are only getting slower and slower.  Also, obviously, our
indexes are continually growing.  The partitioning should help as the
actual number of records being worked on each table is a very small
percentage ( a site may have 1 million records, but only load and work
a few thousand each day).  The archive tables would be doing the most
growing while the active tables should stay small.  Most of the
queries that are tablescanning can not be fixed as the database
clusters have been initialized with a non-C locale and won't use
indexes on our queries that are using like with a wild card.


Right now, we are still on 7.3.4.  However, these ideas would be
implemented as part of an upgrade to 8.x (plus, we'll initialize the
new clusters with a C locale).

Anyway, I hope this makes since, and any comments, ideas, and/or
suggestions would be appreciated.

Thanks,

Chris

Re: Some ideas for comment

From

"Merlin Moncure"

Date:

24 August 2005, 16:17:02

> Ok, there is always a lot of talk about tuning PostgreSQL on linux and
> how PostgreSQL uses the linux kernel cache to cache the tables and
> indexes.
[...]
>
> 1.  Implement a partition type layout using views and rules - This
> will allow me to have one table in each view with the "active" data,
> and the inactive data stored by year in other tables.
>
> So I would have the following (for each major table):
>
> Table View as
> select * from active_table
> union all
> select * from table_2005
> union all
> select * from table_2004
> etc.

Linux does a pretty good job of deciding what to cache.  I don't think
this will help much.  You can always look at partial indexes too.

> 2.  I am also thinking of recommending we collapse all databases in a
> cluster into one "mega" database.  I can then use schema's and views
> to control database access and ensure that no customer can see another
> customers data.

hm. keep in mind views are tightly bound to the tables they are created
with (views can't 'float' over tables in different schemas).  pl/pgsql
functions can, though.  This is a more efficient use of server
resources, IMO, but not a windfall.

> This would mean that there are only one set of indexes being loaded
> into the cache.  While they would be larger, I think in combination
> with the partition from idea 1, we would be ahead of the ball game.
> Since there would only be one set of indexes, everyone would be
> sharing them so they should always be in memory.

I would strongly consider adding more memory :).

> I don't have real numbers to give you, but we know that our systems
> are hurting i/o wise and we are growing by about 2GB+ per week (net).
> We actually grow by about 5GB/week/server.  However, when I run my
> weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> end up getting about 3GB back.  Unfortunately, I do not have the i/o
> bandwidth to vacuum during the day as it causes major slowdowns on our
> system.  Each night, I do run a vacuum analyze across all db's to try
> and help.  I also have my fsm parameters set high (8000000 fsm pages,
> and 5000 fsm relations) to try and compensate.

Generally, you can reduce data turnover for the same workload by
normalizing your database.  IOW, try and make your database more
efficient in the way it stores data.

> Right now, we are still on 7.3.4.  However, these ideas would be
> implemented as part of an upgrade to 8.x (plus, we'll initialize the
> new clusters with a C locale).

yes, do this!

Merlin

Re: Some ideas for comment

From

Alvaro Herrera

Date:

24 August 2005, 16:51:46

On Wed, Aug 24, 2005 at 12:56:54PM -0400, Chris Hoover wrote:

> I don't have real numbers to give you, but we know that our systems
> are hurting i/o wise and we are growing by about 2GB+ per week (net).
> We actually grow by about 5GB/week/server.  However, when I run my
> weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> end up getting about 3GB back.  Unfortunately, I do not have the i/o
> bandwidth to vacuum during the day as it causes major slowdowns on our
> system.  Each night, I do run a vacuum analyze across all db's to try
> and help.  I also have my fsm parameters set high (8000000 fsm pages,
> and 5000 fsm relations) to try and compensate.

[...]

> Right now, we are still on 7.3.4.  However, these ideas would be
> implemented as part of an upgrade to 8.x (plus, we'll initialize the
> new clusters with a C locale).

If you were on a newer version, I'd suggest that you use the cost-based
vacuum delay, and vacuum at least some of the tables more often.  This
way you can reduce the continual growth of the data files without
affecting day-to-day performance, because you allow the VACUUM-inflicted
I/O to be interleaved by normal query execution.

Sadly (for you), I think the cost-based vacuum delay feature was only
introduced in 8.0.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

Re: Some ideas for comment

From

Tom Lane

Date:

24 August 2005, 17:03:49

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> Right now, we are still on 7.3.4.  However, these ideas would be
>> implemented as part of an upgrade to 8.x (plus, we'll initialize the
>> new clusters with a C locale).

> yes, do this!

Moving from 7.3 to 8.0 is alone likely to give you a noticeable
performance boost.

            regards, tom lane

Re: Some ideas for comment

From

Chris Hoover

Date:

24 August 2005, 18:14:29

On 8/24/05, Merlin Moncure <merlin.moncure@rcsonline.com> wrote:
> > Ok, there is always a lot of talk about tuning PostgreSQL on linux and
> > how PostgreSQL uses the linux kernel cache to cache the tables and
> > indexes.
> [...]
> >
> > 1.  Implement a partition type layout using views and rules - This
> > will allow me to have one table in each view with the "active" data,
> > and the inactive data stored by year in other tables.
> >
> > So I would have the following (for each major table):
> >
> > Table View as
> > select * from active_table
> > union all
> > select * from table_2005
> > union all
> > select * from table_2004
> > etc.
>
> Linux does a pretty good job of deciding what to cache.  I don't think
> this will help much.  You can always look at partial indexes too.
>
Yes, but won't this help create the need to store less?  If I have
1,000.000 rows in a table, but only 4,000 are active, if I move those
4 to another table and link the tables via a view, should that not
help keep the 9,996,000 rows out of the kernel cache (the majority of
the time at least)?

This would mean I have more room for other objects and hopefully less
turn over in the cache, and less disk i/o.

Yes?
[...]
> I would strongly consider adding more memory :).
Unfortunately, it looks like 12GB is all our Dell servers can handle. :(

>
> > I don't have real numbers to give you, but we know that our systems
> > are hurting i/o wise and we are growing by about 2GB+ per week (net).
> > We actually grow by about 5GB/week/server.  However, when I run my
> > weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> > end up getting about 3GB back.  Unfortunately, I do not have the i/o
> > bandwidth to vacuum during the day as it causes major slowdowns on our
> > system.  Each night, I do run a vacuum analyze across all db's to try
> > and help.  I also have my fsm parameters set high (8000000 fsm pages,
> > and 5000 fsm relations) to try and compensate.
>
> Generally, you can reduce data turnover for the same workload by
> normalizing your database.  IOW, try and make your database more
> efficient in the way it stores data.
>
That's the ultimate goal, but this database structure was developed
and released into production before I started work here.  I'm trying
to slowly change it into a better db, but it is  a slow process.
Normalization does not make it at the top of the priority list,
unfortunately.

> > Right now, we are still on 7.3.4.  However, these ideas would be
> > implemented as part of an upgrade to 8.x (plus, we'll initialize the
> > new clusters with a C locale).
> > > 2.  I am also thinking of recommending we collapse all databases in a
> > cluster into one "mega" database.  I can then use schema's and views
> > to control database access and ensure that no customer can see another
> > customers data.
>
> hm. keep in mind views are tightly bound to the tables they are created
> with (views can't 'float' over tables in different schemas).  pl/pgsql
> functions can, though.  This is a more efficient use of server
> resources, IMO, but not a windfall.

This I know.  Each schema would have to have a "custom" set of views
replacing the tables with the view programmed to only return that
customers data.

I was thinking all of the tables in schema my_tables and the views all
querying the tables stored in the my_tables schema.  I would add an
identifying column to each table so that I can differentiate the data.

Chris

Re: Some ideas for comment

From

Jens-Wolfhard Schicke

Date:

25 August 2005, 04:18:32

--On Mittwoch, August 24, 2005 16:26:40 -0400 Chris Hoover
<revoohc@gmail.com> wrote:

> On 8/24/05, Merlin Moncure <merlin.moncure@rcsonline.com> wrote:
>> Linux does a pretty good job of deciding what to cache.  I don't think
>> this will help much.  You can always look at partial indexes too.
>>
> Yes, but won't this help create the need to store less?  If I have
> 1,000.000 rows in a table, but only 4,000 are active, if I move those
> 4 to another table and link the tables via a view, should that not
> help keep the 9,996,000 rows out of the kernel cache (the majority of
> the time at least)?
The kernel caches per page, not per file. It is likely linux only caches
those pages which contain active rows, as long as no statement does a
seq-scan on that table.

To optimize the thing, you could consider to cluster by some index which
sorts by the "activity" of the rows first. That way pages with active rows
are likely to contain more than only 1 active row and so the cache is
utilized better.

Cluster is rather slow however and tables need to be reclustered from time
to time.


Mit freundlichem Gruß
Jens Schicke
--
Jens Schicke              j.schicke@asco.de
asco GmbH              http://www.asco.de
Mittelweg 7              Tel 0531/3906-127
38106 Braunschweig          Fax 0531/3906-400