Re: Best way to handle multi-billion row read-only table? - Mailing list pgsql-general

From John R Pierce
Subject Re: Best way to handle multi-billion row read-only table?
Msg-id 4B71DCC9.8000402@hogranch.com
In response to Re: Best way to handle multi-billion row read-only table?  (Asher Hoskins <asher@piceur.co.uk>)
Responses Re: Best way to handle multi-billion row read-only table?  (Asher <asher@piceur.co.uk>)
List pgsql-general
Asher Hoskins wrote:
> If I partition so that each partition holds data for a single channel
> (and set a CHECK constraint for this) then I can presumably remove the
> channel from the index since constraint exclusion will mean that only
> partitions holding the channel I'm interested in will be searched in a
> query. Given that within a partition all of my sample_times will be
> different, do you know if there's a more efficient way to index these?

How do you plan on accessing this monster data?  Do you expect to be
looking up single values, or small sets of values at a specific time?

Seems to me like this is the sort of data that's more often processed in
the aggregate, like running a Fourier analysis over sliding windows, and
that sort of processing may well be more efficiently done with
fixed-block binary files rather than a relational database, as there are
no real relationships in this data.

For instance: a directory for each sensor, with a directory for each
week, and a file for each hour, containing the hour's worth of samples
in fixed-size binary blocks after a file header identifying it.  You can
then randomly access the sample for a specific time by seeking to the
offset (sampletime - starttimeofblock) * blocksize + headersize, or
whatever.
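
To make that arithmetic concrete, here is a minimal C sketch of the read
side, assuming exactly one sample per time unit and a fixed-size header.
Every name in it (HEADER_SIZE, struct sample, read_sample) is invented
for the example, not taken from anything above:

#include <stdio.h>
#include <stdint.h>

#define HEADER_SIZE 64              /* assumed fixed-size file header */

struct sample {                     /* assumed fixed-size record */
    int64_t sample_time;            /* e.g. seconds since the epoch */
    double  value;
};

/* Read the sample taken at sample_time from an hour file whose first
   sample was taken at file_start_time.  Returns 0 on success. */
int read_sample(const char *path, int64_t file_start_time,
                int64_t sample_time, struct sample *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* the arithmetic from above:
       (sampletime - starttimeofblock) * blocksize + headersize;
       real code would want fseeko()/off_t for files over 2 GB */
    long offset = (long)((sample_time - file_start_time)
                         * (int64_t)sizeof(struct sample)) + HEADER_SIZE;

    int ok = fseek(f, offset, SEEK_SET) == 0
          && fread(out, sizeof *out, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

The write side is symmetric: append fixed-size records after the header
in time order, and the same offset arithmetic locates any sample.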

