Trading off large objects (arrays, large strings, large tables) for timeseries - Mailing list pgsql-general

My questions briefly:

(1) I made experiments with large (millions of rows/elements) arrays
    of text (text[], each element is 20-30 characters). On 7.4 (Debian
    Sarge prepackaged), inserting such an array takes forever (10
    thousand elements per minute), but accessing, or writing an
    element, or appending an element, is done instantly. On 8.0.1
    (compiled by me), inserting is very fast (two million per minute,
    on the same machine of course), but selecting any element takes
    more than a second, and updating any row, or appending, takes 25
    seconds.

    Why 25 seconds for appending an element? Does it rewrite the
    entire array?

(2) I also tried using a large (80M) text instead (i.e. instead of
    storing an array of lines, I store a huge plain text file). What
    surprised me is that I can get the 'tail' of the file (using
    substring) in only around one second, although it is transparently
    compressed (to 17M). It doesn't decompress the entire string, does
    it? Does it store it somehow chunked?

    It also takes 25 seconds to append something (update ... set
    string=string||'...').  Is there any way to do clever, faster
    appending, like I can in a text file?

What I'm trying to do is find a good way to store timeseries. A
timeseries is essentially a series of (date, value) pairs, and more
specifically it is an array of records, each record consisting of
three items: date TIMESTAMP, value DOUBLE PRECISION, flags TEXT. The
flags is null in more than 99% of the records, but occasionally it
contains flags or short comments. My above experiments are with
comma-separated values in plain ascii; I haven't experimented with an
array of composite type yet.

The most important operations are:
    (1) Retrieving or inserting/replacing the entire timeseries
    (2) Selecting the last record
    (3) Appending a record
I'm not interested in selecting part of a timeseries, or updating a
record in the middle. Such operations do happen, but you can do them
by selecting/replacing the entire timeseries instead. But (2) and (3)
will be very frequent, and I don't want to do them through (1).

I'm also considering a table, of course, where each timeseries record
will be one row. I have experimented only a little with that (id
integer, date timestamp, value double precision, flags text, primary
key(id, date)). It appears to be fast alright, a problem being that it
appears to consume much disk space (about 4 times more than an array
of text, which is about 4 times more than a single huge text).

pgsql-general by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Lost rows/data corruption?
Next
From: Peter Eisentraut
Date:
Subject: Re: database encoding "WIN" -- Western or Cyrillic?