On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> The problem is, most likely, on updating the indexes. Heap inserts
>> should always take more or less the same time, but index insertion
>> requires walking down the index struct for each insert, and the path to
>> walk gets larger the more data you have.
>
> It's worse than that: his test case inserts randomly ordered keys, which
> means that there's no locality of access during the index updates. Once
> the indexes get bigger than RAM, update speed goes into the toilet,
> because the working set of index pages that need to be touched also
> is bigger than RAM. That effect is going to be present in *any*
> standard-design database, not just Postgres.
>
> It's possible that performance in a real-world situation would be
> better, if the incoming data stream isn't so random; but it's
> hard to tell about that with the given facts.
>
> One possibly useful trick is to partition the data by timestamp with
> partition sizes chosen so that the indexes don't get out of hand.
> But the partition management might be enough of a PITA to negate
> any win.
>
> regards, tom lane
Thanks for your feedback! This is just as I supposed, but i didn't
had the Postgres experience to be certain.
I'll include your conclusion to my report.
Ciprian Craciun.