Best way to handle multi-billion row read-only table? - Mailing list pgsql-general

From Asher
Subject Best way to handle multi-billion row read-only table?
Date
Msg-id 4B719F9E.9080106@piceur.co.uk
Responses Re: Best way to handle multi-billion row read-only table?  (Justin Graf <justin@magwerks.com>)
Re: Best way to handle multi-billion row read-only table?  (Alex Thurlow <alex@blastro.com>)
Re: Best way to handle multi-billion row read-only table?  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: Best way to handle multi-billion row read-only table?  ("BillR" <iambill@williamrosmus.com>)
Re: Best way to handle multi-billion row read-only table?  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-general
Hello.

I'm putting together a database to store the readings from various
measurement devices for later processing. Since these things (water
pressure monitors attached to very large water pipes) take readings at
200Hz and are typically deployed over multiple sites for several months
at a time, I've got many billions of rows of data, each (at the moment)
with the following simple format:

    value REAL NOT NULL,
    sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
    channel INTEGER REFERENCES channel(id) NOT NULL

(Where the "channel" table contains metadata identifying the particular
sensor, data logger, etc. used to obtain the data, and the combination of
channel and sample_time is unique.)
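For concreteness, the full declarations look roughly like the sketch
below; the metadata columns on "channel" are only illustrative, and
"sample" just stands in for whatever I end up calling the data table:

    CREATE TABLE channel (
        id          INTEGER PRIMARY KEY,
        sensor      TEXT,   -- illustrative metadata column
        data_logger TEXT    -- illustrative metadata column
    );

    CREATE TABLE sample (
        value       REAL NOT NULL,
        sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
        channel     INTEGER NOT NULL REFERENCES channel(id),
        UNIQUE (channel, sample_time)
    );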

Once loaded into the database the data will never be deleted or modified,
and will typically be accessed over a particular date range for a
particular channel (e.g. "sample_time >= X AND sample_time <= Y AND
channel=Z"). A typical query won't return more than a few million rows,
and speed is not desperately important (as long as queries complete in
minutes rather than hours).
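In other words, a typical retrieval looks something like this (the table
name follows the sketch above, and the channel id and dates are just
placeholder values):

    -- one channel over a date range, a few million rows at most
    SELECT value, sample_time
    FROM   sample
    WHERE  channel = 42
      AND  sample_time >= '2009-11-01 00:00:00+00'
      AND  sample_time <  '2009-12-01 00:00:00+00'
    ORDER  BY sample_time;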

Are there any recommended ways to organise this? Should I partition my
big table into multiple smaller ones that will always fit in memory
(this would result in several hundred or thousand sub-tables, along the
lines of the sketch below)? Are there any ways to keep the index size to
a minimum? At the moment I have a few weeks of data, about 180GB, loaded
into a single table and indexed on sample_time and channel, and the
index takes up 180GB too.
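For the partitioning I was imagining the usual inheritance-plus-CHECK
approach, something like one child table per channel per month (the
names and values here are purely illustrative):

    CREATE TABLE sample_ch42_2009_11 (
        CHECK ( channel = 42
                AND sample_time >= '2009-11-01 00:00:00+00'
                AND sample_time <  '2009-12-01 00:00:00+00' )
    ) INHERITS (sample);

    CREATE INDEX sample_ch42_2009_11_time_idx
        ON sample_ch42_2009_11 (sample_time);

with constraint_exclusion turned on so the planner only touches the
children whose CHECK constraints match the query. Is that the right way
to go about it at this scale?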

Since this is all for a typically budget-restricted PhD project, the
hardware is just a high-end desktop workstation with (at the moment)
2*2TB drives organised into a single 4TB partition using FreeBSD's vinum
volume manager.


Many thanks for any help,
Asher.
