Re: Using Postgres to store high volume streams of sensor readings - Mailing list pgsql-general

From: Tom Lane
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date:
Msg-id: 16193.1227394959@sss.pgh.pa.us
In response to: Re: Using Postgres to store high volume streams of sensor readings (Alvaro Herrera <alvherre@commandprompt.com>)
Responses: Re: Using Postgres to store high volume streams of sensor readings
List: pgsql-general
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The problem is, most likely, on updating the indexes.  Heap inserts
> should always take more or less the same time, but index insertion
> requires walking down the index struct for each insert, and the path to
> walk gets larger the more data you have.

It's worse than that: his test case inserts randomly ordered keys, which
means that there's no locality of access during the index updates.  Once
the indexes get bigger than RAM, update speed goes into the toilet,
because the working set of index pages that need to be touched is also
bigger than RAM.  That effect is going to be present in *any*
standard-design database, not just Postgres.
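
For concreteness, the sort of test case under discussion might look
like this (table and column names here are made up for illustration):

    -- Hypothetical schema: one row per sensor reading
    CREATE TABLE sensor_readings (
        sensor_id integer,
        ts        timestamptz,
        value     double precision
    );
    CREATE INDEX sensor_readings_sensor_idx
        ON sensor_readings (sensor_id);

    -- Randomly ordered keys: each insert lands on an effectively
    -- random index leaf page, so once the index outgrows RAM most
    -- inserts have to read a cold page from disk.
    INSERT INTO sensor_readings
    SELECT (random() * 1000000)::int, now(), random()
    FROM generate_series(1, 1000000);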

It's possible that performance in a real-world situation would be
better if the incoming data stream isn't so random; but it's
hard to tell with the facts given.

One possibly useful trick is to partition the data by timestamp with
partition sizes chosen so that the indexes don't get out of hand.
But the partition management might be enough of a PITA to negate
any win.
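
For what it's worth, a rough sketch of that scheme as it could be done
with table inheritance (names are illustrative, one child per month):

    -- Parent table holds no rows itself
    CREATE TABLE readings (
        sensor_id integer,
        ts        timestamptz,
        value     double precision
    );

    -- One child per time slice, sized so its indexes stay
    -- cache-resident; loading goes directly into the current child
    CREATE TABLE readings_2008_11 (
        CHECK (ts >= '2008-11-01' AND ts < '2008-12-01')
    ) INHERITS (readings);
    CREATE INDEX readings_2008_11_sensor_idx
        ON readings_2008_11 (sensor_id);

    -- Lets the planner skip partitions the WHERE clause excludes
    SET constraint_exclusion = on;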

            regards, tom lane
