Re: Best way to handle multi-billion row read-only table? - Mailing list pgsql-general

From: Alex Thurlow
Subject: Re: Best way to handle multi-billion row read-only table?
Date:
Msg-id: 4B71C316.5060708@blastro.com
In response to: Best way to handle multi-billion row read-only table?  (Asher <asher@piceur.co.uk>)
List: pgsql-general
I've only gone up to about a billion rows, but table partitioning seems
to be the way to go here.  I did per-day partitioning and had the job
that inserts the daily data create each day's partition automatically.
With the partitioning, a query only has to touch the tables for the
specific days it covers, so the partition scheme effectively acts as a
date index already.
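
Roughly, the setup looks something like this (just a sketch using
inheritance-based partitioning with CHECK constraints; the table and
column names mirror the schema you posted below, and the specific dates
and channel value are only illustrative):

    -- Parent table: holds no rows itself, just defines the shape.
    -- (Assumes your existing "channel" metadata table.)
    CREATE TABLE sample (
        value       REAL NOT NULL,
        sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
        channel     INTEGER NOT NULL REFERENCES channel(id)
    );

    -- One child table per day; the daily load job creates it and
    -- inserts straight into it.
    CREATE TABLE sample_20100209 (
        CHECK (sample_time >= '2010-02-09 00:00:00+00'
           AND sample_time <  '2010-02-10 00:00:00+00')
    ) INHERITS (sample);

    CREATE INDEX sample_20100209_channel_time_idx
        ON sample_20100209 (channel, sample_time);

    -- With constraint_exclusion on, a range query against the parent
    -- only scans the children whose CHECK constraints overlap it.
    SET constraint_exclusion = on;

    SELECT value, sample_time
      FROM sample
     WHERE sample_time >= '2010-02-09' AND sample_time < '2010-02-10'
       AND channel = 42;

Each daily table and its index stay relatively small, so a query
bounded to a narrow date range only has to touch a handful of them.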

Alex Thurlow
Blastro Networks

http://www.blastro.com
http://www.roxwel.com
http://www.yallwire.com


On 2/9/2010 11:47 AM, Asher wrote:
> Hello.
>
> I'm putting together a database to store the readings from various
> measurement devices for later processing. Since these things (water
> pressure monitors attached to very large water pipes) take readings at
> 200Hz and are typically deployed over multiple sites for several
> months at a time, I've got many billions of rows of data, each (at the
> moment) with the following simple format:
>
>     value REAL NOT NULL,
>     sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
>     channel INTEGER REFERENCES channel(id) NOT NULL
>
> (Where the "channel" table contains metadata to identify the
> particular sensor, data logger, etc. used to obtain the data and the
> combination of channel and sample_time is unique.)
>
> Once loaded into the database the data will never be deleted or
> modified and will typically be accessed over a particular date range
> for a particular channel (e.g. "sample_time >= X AND sample_time <= Y
> AND channel=Z"). A typical query won't return more than a few million
> rows and speed is not desperately important (as long as the time is
> measured in minutes rather than hours).
>
> Are there any recommended ways to organise this? Should I partition my
> big table into multiple smaller ones which will always fit in memory
> (this would result in hundreds or thousands of sub-tables)?
> Are there any ways to keep the index size to a minimum? At the moment
> I have a few weeks of data, about 180GB, loaded into a single table
> and indexed on sample_time and channel, and the index takes up 180GB too.
>
> Since this is all for a typically budget-restricted PhD project the
> hardware is just a high-end desktop workstation with (at the moment)
> 2*2TB drives organised into a single 4TB partition using FreeBSD's
> vinum system.
>
>
> Many thanks for any help,
> Asher.
>
