Thread: PostgreSQL Arrays and Performance

PostgreSQL Arrays and Performance

From
Marc Philipp
Date:
A few performance issues using PostgreSQL's arrays led us to the
question of how Postgres actually stores variable-length arrays. First,
let me explain our situation.

We have a rather large table containing a simple integer primary key
and a couple more columns of fixed size. However, there is a dates
column of type "timestamp without time zone[]" that is apparently
causing some severe performance problems.

During a daily update process, new timestamps are collected and
existing rows are updated (new rows are also added). These changes
affect a large percentage of the existing rows.

What we have observed over the last few weeks is that the overall
database size is increasing rapidly because of this table, and that
vacuum processes seem to deadlock with other processes querying data
from this table.

As a result, the database keeps growing and becomes less and less
usable. The only thing that helps is dumping and restoring it, which is
nothing you are eager to do on a large live system on a daily basis.

This problem led us to the question of how these arrays are stored
internally. Are they stored "in-place" with the other columns or
merely as a pointer to another file?

Would it be more efficient not to use an array for this purpose but to
split the table into two parts?

Any help is appreciated!


Marc Philipp

Re: PostgreSQL Arrays and Performance

From
Tom Lane
Date:
Marc Philipp <mail@marcphilipp.de> writes:
> A few performance issues using PostgreSQL's arrays led us to the
> question of how Postgres actually stores variable-length arrays. First,
> let me explain our situation.

> We have a rather large table containing a simple integer primary key
> and a couple more columns of fixed size. However, there is a dates
> column of type "timestamp without time zone[]" that is apparently
> causing some severe performance problems.

How large are the arrays?  PG is definitely not designed to do well with
very large arrays (say more than a couple hundred elements).  You should
reconsider your data design if you find yourself trying to do that.

            regards, tom lane

Re: PostgreSQL Arrays and Performance

From
Joe Conway
Date:
Marc Philipp wrote:
> During a daily update process, new timestamps are collected and
> existing rows are updated (new rows are also added). These changes
> affect a large percentage of the existing rows.
>
> What we have observed over the last few weeks is that the overall
> database size is increasing rapidly because of this table, and that
> vacuum processes seem to deadlock with other processes querying data
> from this table.

This sounds like it has more to do with inadequate freespace map
settings than use of arrays. Every update creates a dead tuple, and if
it is large (because the array is large) and leaked (because you have no
room in your freespace map), that would explain a rapidly increasing
database size.
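As a rough illustration (the table name "item" is a placeholder, and the
settings shown assume a PostgreSQL release of that era, where the
free-space map was sized by hand in postgresql.conf), you can see whether
the map is large enough by vacuuming and reading the summary at the end:

```sql
-- Reclaim dead tuples in one table and report what was found.
-- (Replace "item" with your actual table name.)
VACUUM VERBOSE item;

-- A database-wide VACUUM VERBOSE ends with a summary comparing the
-- number of pages with free space against the configured map size.
-- If the pages found exceed the map capacity, that free space is
-- "leaked" and the table keeps growing.

-- postgresql.conf settings that size the map in 7.x/8.x releases
-- (values here are only examples; later releases manage this
-- automatically and removed these settings):
--   max_fsm_pages = 200000      -- total pages with free space to track
--   max_fsm_relations = 1000    -- total relations to track
```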

> As a result, the database keeps growing and becomes less and less
> usable. The only thing that helps is dumping and restoring it, which is
> nothing you are eager to do on a large live system on a daily basis.

Arrays are stored as compressed varlenas, pretty much exactly like
varchar or text fields. However, if your use of arrays causes you to
need to perform updates to a large percentage of your rows on a daily
basis, instead of just inserting new rows, you should probably rethink
your data model.
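One way such a redesign might look (all table and column names here are
invented for illustration) is to replace the timestamp array with a
detail table holding one row per timestamp, so the daily run mostly
issues small INSERTs instead of rewriting a large array value:

```sql
-- Hypothetical current design: the whole array is one value, so adding
-- a timestamp rewrites the entire (large) row, leaving a large dead tuple.
-- CREATE TABLE item (
--     id    integer PRIMARY KEY,
--     dates timestamp without time zone[]
-- );

-- Normalized alternative: one small row per timestamp.
CREATE TABLE item (
    id integer PRIMARY KEY
);

CREATE TABLE item_date (
    item_id integer NOT NULL REFERENCES item(id),
    ts      timestamp without time zone NOT NULL,
    PRIMARY KEY (item_id, ts)
);

-- The daily update becomes an insert of new timestamps only:
INSERT INTO item_date (item_id, ts) VALUES (42, '2006-01-08 10:22:22');

-- Fetching all timestamps for one item:
SELECT ts FROM item_date WHERE item_id = 42 ORDER BY ts;
```

The rows in item_date are fixed-size and never updated, so vacuum has
far less dead space to reclaim even when a large fraction of items
receive new timestamps each day.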

> Would it be more efficient not to use an array for this purpose but to
> split the table into two parts?
>

I think so.

Joe



Re: PostgreSQL Arrays and Performance

From
Marc Philipp
Date:
> How large are the arrays?  PG is definitely not designed to do well with
> very large arrays (say more than a couple hundred elements).  You should
> reconsider your data design if you find yourself trying to do that.

At the moment, the arrays contain no more than 200 entries, but there
is no upper bound on their size.

Regards, Marc Philipp



Re: PostgreSQL Arrays and Performance

From
Marc Philipp
Date:
> This sounds like it has more to do with inadequate freespace map
> settings than use of arrays. Every update creates a dead tuple, and if
> it is large (because the array is large) and leaked (because you have
> no room in your freespace map), that would explain a rapidly increasing
> database size.

Can you tell me more about free-space map settings? What exactly is the
free-space map? The information in the documentation is not very
helpful. How can dead tuples leak?

Regards, Marc Philipp



Re: PostgreSQL Arrays and Performance

From
"Jim C. Nasby"
Date:
On Sun, Jan 08, 2006 at 10:22:22AM +0100, Marc Philipp wrote:
> > This sounds like it has more to do with inadequate freespace map
> > settings than use of arrays. Every update creates a dead tuple, and if
> > it is large (because the array is large) and leaked (because you have
> > no room in your freespace map), that would explain a rapidly increasing
> > database size.
>
> Can you tell me more about free-space map settings? What exactly is the
> free-space map? The information in the documentation is not very
> helpful. How can dead tuples leak?

http://www.pervasive-postgres.com/lp/newsletters/2005/Insights_opensource_Nov.asp#3
is an article I wrote that might clear things up.
http://www.pervasive-postgres.com/lp/newsletters/2005/Insights_opensource_Dec.asp#2
might also be an interesting read, though it's just about MVCC in
general.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461