Re: Batch update of indexes - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: Batch update of indexes
Date
Msg-id 56A102AE.7030106@postgrespro.ru
Whole thread Raw
In response to Re: Batch update of indexes  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Batch update of indexes  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
20.01.2016 17:55, Konstantin Knizhnik:
> Hi,
>
>> Hi, I glad to see that you interested in that too.
>> I think this is a good feature and I think it will be very useful to 
>> have.
>> I have already mentioned some related problems and possible 
>> improvements in my presentation.
>> http://www.slideshare.net/AnastasiaLubennikova/indexes-dont-mean-slow-inserts 
>>
>> Last two slides concern to this thread. Briefly, I've suggested to 
>> think about insertion buffer. Something very similar to it is already 
>> implemented in BRIN. It does not index last data from heap, while the 
>> number of last pages is less than pages_per_block.
>
> Do you mean GIN-like usage of insertion buffer (here it is called 
> "pending list")?
> So that we have to combine search in the main tree and in the insert 
> buffer?
> Actually this is what I want to avoided (because at least in case of 
> GIN pending list cause significant degrade of performance,
> while up-to-date state of full text index is rarely required).
>

What I meant is more like a BRIN-like combination of an index scan and 
heap scan.
Maybe it could be called "deferred inserts" or "temporary read-only index"
Maybe it's similar with mysql insert buffer 
http://dev.mysql.com/doc/refman/5.7/en/innodb-insert-buffering.html
I think it'll be more clear with example. Please don't care about syntax.

CREATE TABLE tbl (c1 int);
CREATE INDEX idx on tbl(c1);

SET enable_deferred_insert(idx) = on;
At this moment, we save the last_indexed_item (its TID) somewhere in 
index metapage.

Since that moment, the data inserted into the table doesn't touch the index.
We perform some heavy insert and then go back to the normal index behavior.

SET enable_deferred_insert(idx) = off;
This command takes all the data between the last_indexed_item and the 
end of the table, and inserts it into the index at a time.

Of course there are new problems to deal with, but it's really useful 
for the use case to balance irregular heavy write load, isn't it?

BTW, could you explain, what is the reason to copy data into the pending 
list and then copy it again while flushing pending list into the index? 
Why not read this data directly from the table? I feel that I've missed 
something important here.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Expanded Objects and Order By
Next
From: Fujii Masao
Date:
Subject: Re: removal of unused argument in ginInsertCleanup()