Re: very very slow inserts into very large table - Mailing list pgsql-performance

From Jeff Janes
Subject Re: very very slow inserts into very large table
Date
Msg-id CAMkU=1z9UFFmcELBkOD+nUQOymBZKdX_atqAxT5U=p8HUrVsXw@mail.gmail.com
Whole thread Raw
In response to Re: very very slow inserts into very large table  (Samuel Gendler <sgendler@ideasculptor.com>)
List pgsql-performance
On Mon, Jul 16, 2012 at 10:35 AM, Samuel Gendler
<sgendler@ideasculptor.com> wrote:
> On Mon, Jul 16, 2012 at 7:06 AM, Mark Thornton <mthornton@optrak.com> wrote:
>>>
>>>
>> Every insert updates four indexes, so at least 3 of those will be in
>> random order. The indexes don't fit in memory, so all those updates will
>> involve reading most of the relevant b-tree pages from disk (or at least the
>> leaf level). A total of 10ms of random read from disk (per inserted row)
>> wouldn't surprise me ... which adds up to more than 10 days for your 93
>> million rows.
>
>
> Which is the long way of saying that you will likely benefit from
> partitioning that table into a number of smaller tables, especially if
> queries on that table tend to access only a subset of the data that can be
> defined to always fit into a smaller number of partitions than the total.
> At the very least, inserts will be faster because individual indexes will be
> smaller.

If the select locality and the insert locality are not the same, and
the table is partitioned according to the select locality, then the
total index size needed to be accessed during the inserts will be
slightly larger, not smaller, under the partitioning and the inserts
will not perform well.

On the other hand, if the select locality and the insert locality are
the same, it should be possible to change the index definitions in a
way to get all the gains of your described partitioning, without
actually doing the partitioning.

> But unless all queries can't be constrained to fit within a subset
> of partitions, you'll also see improved performance on selects.

When you can't constrain the queries to fit within a subset of the
partitions is where I see a possible win from partitioning that can't
be obtained other ways.  By using partitioning, you can greatly
increase the insert performance by imposing a small cost on each
query.  The total cost is at least as great, but you have re-arranged
how the cost is amortized into a more acceptable shape.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jon Nelson
Date:
Subject: Re: very very slow inserts into very large table
Next
From: Mark Thornton
Date:
Subject: Re: very very slow inserts into very large table