Re: How Big is Too Big for Tables? - Mailing list pgsql-general

From: Peter Bex
Subject: Re: How Big is Too Big for Tables?
Date:
Msg-id: 20100728194853.GB579@frohike.homeunix.org
In response to: Re: How Big is Too Big for Tables?  (P Kishor <punk.kish@gmail.com>)
List: pgsql-general
On Wed, Jul 28, 2010 at 02:05:47PM -0500, P Kishor wrote:
> each row is half a dozen single byte values, so, it is actually 6
> bytes per row (six columns). Even if I combine them somehow, still the
> per row overhead (which, I believe, is about 23 bytes) is more than
> the data. But, that is not the issue.

I had a design like that for an application too. I thought it wasn't
an issue either, but the row overhead made memory and disk usage
skyrocket and queries slow to a crawl.  The solution for me was to
group my values logically and store them together in the same row.
In my case, this worked by storing all the values for
one measuring point (timestamp) in an array field, with the array indices
being stored in a bookkeeping table (each measuring moment produced the
same number of values for me, so I was able to do this).
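
Roughly what I mean (an untested sketch; the table and column names are
just placeholders):

    CREATE TABLE variable_slot (
        variable_name text PRIMARY KEY,
        slot          int  NOT NULL   -- position in the vals array below
    );

    CREATE TABLE measurement (
        measured_at timestamptz PRIMARY KEY,
        vals        smallint[] NOT NULL  -- one element per variable
    );

    -- Pull one variable's value for one moment:
    SELECT m.vals[s.slot]
      FROM measurement m, variable_slot s
     WHERE m.measured_at = '2010-07-28 12:00'
       AND s.variable_name = 'temperature';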

Extracting one value from a long array (some of my datasets have
thousands of values per measuring moment) is extremely fast. You can
also create expression indexes on the array elements you need to search
on, as long as those positions are always the same.
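
For example, if the element at position 5 is the one you always filter
on, an expression index like this does the trick (again just a sketch,
using the tables above):

    CREATE INDEX measurement_slot5_idx ON measurement ((vals[5]));

    -- which the planner can use for queries like:
    SELECT measured_at FROM measurement WHERE vals[5] = 42;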

> First, I can't really merge
> several days into one row. While it might make for fewer rows, it will
> make my data extraction and analysis life very complicated.

Perhaps you could put all days of a month in an array, indexed by day
of the month?  That wouldn't be too hard for your logic to deal with,
I think.
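
Something along these lines (untested, and series_id is just a made-up
key for whatever identifies one of your data series):

    CREATE TABLE daily_values (
        series_id int,
        month     date,          -- first day of the month
        vals      smallint[],    -- vals[1] .. vals[31] = day of month
        PRIMARY KEY (series_id, month)
    );

    -- The value for July 15th:
    SELECT vals[15]
      FROM daily_values
     WHERE series_id = 42 AND month = '2010-07-01';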

> The real issue is that once I put a 100 million rows in the table,
> basically the queries became way too slow.

I had the same issue.  Partitioning falls flat on its face once you're
dealing with such insane amounts of data.  In my experience, if your set
of partitions isn't fixed but keeps growing, you will face problems
sooner or later.  If you do partitioning the traditional way, by
inheriting the table, you'll also run into additional trouble: for some
operations Postgres needs to take a lock on every partition, and that
can easily make you run out of shared memory.  You can increase
max_locks_per_transaction, but that's unworkable if the number of
partitions keeps growing; you'd have to keep raising that value
forever...
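
If you do go down that road anyway: it's a postgresql.conf setting and
needs a restart, and the lock table holds roughly
max_locks_per_transaction * (max_connections + max_prepared_transactions)
locks in total:

    # postgresql.conf
    max_locks_per_transaction = 256    # default is 64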

> Of course, I could (and should) upgrade my hardware -- I am using a
> dual Xeon 3 GHz server with 12 GB RAM, but there are limits to that route.

Always try to solve it by changing your data design first, unless what
you're trying to do is fundamentally limited by hardware.  You're not
likely to request all those records at once, nor will you need to search
through all of them; try to come up with a sane way of slicing your data
into a smaller set that can be retrieved quickly.

> Keep in mind, the circa 100 million rows was for only part of the db.
> If I were to build the entire db, I would have about 4 billion rows
> for a year, if I were to partition the db by years. And, partitioning
> by days resulted in too many tables.

Yeah, sounds similar to the troubles I ran into in my project.

> I wish there were a way around all this so I could use Pg, with my
> available resources, but it looks bleak right now.

Try using the array approach.

Possibly you could create a column for each week or month of the year
and store the individual days in an array in that column.  Extracting
those shouldn't be too hard.

You could keep the different types of data in separate rows, one row
per kind of information you want to store for a day.

Alternatively, store your data points all in one row, and store a row
for each day.  You could easily start partitioning historical data per
year or per decade.
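
A sketch of that last variant, using the traditional inheritance-based
partitioning (names are made up; with constraint_exclusion enabled,
queries that filter on the day column only touch the matching child
table):

    CREATE TABLE datapoint (
        day  date NOT NULL,
        vals smallint[]          -- all of the day's values in one row
    );

    CREATE TABLE datapoint_2009 (
        CHECK (day >= '2009-01-01' AND day < '2010-01-01')
    ) INHERITS (datapoint);

    CREATE TABLE datapoint_2010 (
        CHECK (day >= '2010-01-01' AND day < '2011-01-01')
    ) INHERITS (datapoint);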

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth
