Re: [HACKERS] Declarative partitioning vs. BulkInsertState - Mailing list pgsql-hackers

From Robert Haas
Msg-id CA+TgmoaiZpDVUUN8LZ4jv1qFE_QyR+H9ec+79f5vNczYarg5Zg@mail.gmail.com
List pgsql-hackers
On Wed, Jan 11, 2017 at 10:53 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/01/06 20:23, Amit Langote wrote:
>> On 2017/01/05 3:26, Robert Haas wrote:
>>> It's unclear to me why we need to do 0002.  It doesn't seem like it
>>> should be necessary, it doesn't seem like a good idea, and the commit
>>> message you proposed is uninformative.
>>
>> If a single BulkInsertState object is passed to
>> heap_insert()/heap_multi_insert() for different heaps corresponding to
>> different partitions (from one input tuple to next), tuples might end up
>> going into wrong heaps (like demonstrated in one of the reports [1]).  A
>> simple solution is to disable bulk-insert in case of partitioned tables.
>>
>> But my patch (or its motivations) was slightly wrongheaded, wherein I
>> conflated multi-insert stuff and bulk-insert considerations.  I revised
>> 0002 to not do that.
>
> Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct.
> Attaching updated 0002 along with rebased 0001 and 0003.

The BulkInsertState is not there only to improve performance.  It's
also there to make sure we use a BufferAccessStrategy, so that we
don't trash the whole buffer arena.  See commit
85e2cedf985bfecaf43a18ca17433070f439fb0e.  If a partitioned table uses
a separate BulkInsertState for each partition, I believe it will also
end up using a separate ring of buffers for every partition.  That may
well be faster than copying into an unpartitioned table in some cases,
because dirtying everything in the buffer arena without actually
writing any of those buffers is a lot faster than actually doing the
writes.  But it is also anti-social behavior; we have
BufferAccessStrategy objects for a reason.

One idea would be to have each partition use a separate
BulkInsertState, all pointing to the same underlying
BufferAccessStrategy.  Even that's problematic, though, because it
could result in us holding a gigantic number of pins (one per
partition).  I think a better idea would be to add a function,
ReleaseBulkInsertStatePin(), which gets called whenever we switch
relations, and then just use the same BulkInsertState throughout.
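
To make the second idea concrete, here is a minimal standalone sketch,
not actual PostgreSQL code.  The message only names
ReleaseBulkInsertStatePin(); the function body, the simplified
BulkInsertStateData layout, the mock pin counters, and the RouteTuples()
driver are all assumptions made for illustration.  It models one shared
BulkInsertState whose pin is dropped each time the target partition
changes, so at most one buffer stays pinned no matter how many
partitions there are.

```c
#include <assert.h>

/* Stand-ins for PostgreSQL types; NOT the real definitions. */
typedef int Buffer;
#define InvalidBuffer 0
#define NBUFFERS 8

/* Mock of per-buffer pin counts (index 0 is unused). */
static int pin_count[NBUFFERS + 1];

void MockPinBuffer(Buffer buf)
{
    assert(buf > 0 && buf <= NBUFFERS);
    pin_count[buf]++;
}

void MockReleaseBuffer(Buffer buf)
{
    assert(buf > 0 && buf <= NBUFFERS && pin_count[buf] > 0);
    pin_count[buf]--;
}

/* Total number of pins currently held across all mock buffers. */
int TotalPins(void)
{
    int total = 0;
    for (int i = 1; i <= NBUFFERS; i++)
        total += pin_count[i];
    return total;
}

/* Simplified bulk-insert state: only the pinned target buffer is modeled. */
typedef struct BulkInsertStateData
{
    Buffer current_buf;     /* buffer kept pinned between inserts, if any */
} BulkInsertStateData;
typedef BulkInsertStateData *BulkInsertState;

/* Assumed body: drop the pin held by the state, if there is one. */
void ReleaseBulkInsertStatePin(BulkInsertState bistate)
{
    if (bistate->current_buf != InvalidBuffer)
        MockReleaseBuffer(bistate->current_buf);
    bistate->current_buf = InvalidBuffer;
}

/*
 * Hypothetical driver: route ntuples tuples, where partition_of[i] is the
 * partition (0 .. NBUFFERS-1) chosen for tuple i.  Each partition is
 * pretended to have a single target page, buffer number partition + 1.
 * Returns the largest number of pins held at any one time.
 */
int RouteTuples(BulkInsertState bistate, const int *partition_of, int ntuples)
{
    int last_partition = -1;
    int max_pins = 0;

    for (int i = 0; i < ntuples; i++)
    {
        int part = partition_of[i];
        Buffer target = part + 1;

        /* Switching relations: release the pin on the old partition's page. */
        if (part != last_partition)
            ReleaseBulkInsertStatePin(bistate);

        /* Pin the target page unless the state already holds it. */
        if (bistate->current_buf != target)
        {
            ReleaseBulkInsertStatePin(bistate);
            MockPinBuffer(target);
            bistate->current_buf = target;
        }
        last_partition = part;

        int pins = TotalPins();
        if (pins > max_pins)
            max_pins = pins;
    }
    return max_pins;
}
```

In this model, a caller that routes tuples to partitions 0, 0, 1, 2, 1, 0
never holds more than one pin, and a final ReleaseBulkInsertStatePin()
call leaves none.  In the real code the same BufferAccessStrategy would
also stay attached to the single BulkInsertState, which is the point of
reusing it throughout.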

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


