Re: insert performance - Mailing list pgsql-performance

From Jim Nasby
Subject Re: insert performance
Msg-id 56A55D19.4030905@BlueTreble.com
In response to Re: insert performance  (Jinhua Luo <luajit.io@gmail.com>)
List pgsql-performance
On 1/21/16 11:54 PM, Jinhua Luo wrote:
> There is another problem.
> When the autovacuum recycles the old pages, the ReadBuffer_common()
> would do mdread() instead of mdextend().
> The read is synchronous, while the write could be mostly asynchronous,
> so frequent reads are much worse than writes.

Please don't top post.

AFAICT your analysis is correct, though I don't see what autovacuum has
to do with anything. When we need a new block to put data on we'll
either find one with free space in the free space map and use it, or
we'll extend the relation.
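That choice can be sketched in miniature (a toy model, not PostgreSQL's actual code; a real free space map is a tree stored in the relation's FSM fork, here it's just a dict of block -> free bytes):

```python
# Toy model of "reuse a block from the free space map, else extend".
# Not PostgreSQL source; names and data structures are illustrative.

def choose_block(fsm, needed, nblocks):
    """Return (target_block, new_nblocks).

    Reuse the first block whose recorded free space fits the tuple;
    otherwise extend the relation by one block at the end.
    """
    for blk, free in fsm.items():
        if free >= needed:
            return blk, nblocks        # reuse: no file growth needed
    return nblocks, nblocks + 1        # extend: the mdextend() path

# Block 3 has room, so it is reused and the relation stays at 10 blocks.
print(choose_block({3: 120, 7: 40}, needed=100, nblocks=10))   # (3, 10)
# Nothing fits, so the relation grows to 11 blocks.
print(choose_block({3: 50}, needed=100, nblocks=10))           # (10, 11)
```

Note that the reuse path may still need a synchronous read (mdread()) to pull the chosen page into shared buffers if it isn't cached, which is the read-vs-extend difference being discussed.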

> Any help? Please.

There's been some discussion on ways to improve the performance of
relation extension (iirc one of those was to not zero the new page), but
ultimately you're at the mercy of the underlying OS and hardware.

If you have ideas for improving this you should speak up on -hackers,
but before doing so you should read the archives about what's been
proposed in the past.
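For reference, the "don't zero the new page" idea looks roughly like this at the file-API level (generic POSIX behavior demonstrated from Python, not PostgreSQL code; the block size matches PostgreSQL's default BLCKSZ, and posix_fallocate is only available on Linux/POSIX systems):

```python
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default page size

def extend_with_zeros(fd, nblocks):
    """Grow the file by writing zero-filled pages (what the backend does today)."""
    os.lseek(fd, 0, os.SEEK_END)
    for _ in range(nblocks):
        os.write(fd, b"\x00" * BLCKSZ)

def extend_with_fallocate(fd, nblocks):
    """Reserve space without writing page images (one proposed alternative)."""
    size = os.fstat(fd).st_size
    os.posix_fallocate(fd, size, nblocks * BLCKSZ)

with tempfile.TemporaryFile() as f:
    extend_with_zeros(f.fileno(), 2)
    if hasattr(os, "posix_fallocate"):   # not available on all platforms
        extend_with_fallocate(f.fileno(), 1)
        assert os.fstat(f.fileno()).st_size == 3 * BLCKSZ
```

Either way the file ends up the right size; the difference is how much data actually has to travel to the storage layer while the extension lock is held.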

> 2016-01-19 12:50 GMT+08:00 Jinhua Luo <luajit.io@gmail.com>:
>> Hi,
>>
>> I thought with async commit enabled, the backend process would rarely
>> do file io. But in fact, it still involves a lot of file io.
>>
>> After inspecting the vfs probes using systemtap, and reading the
>> source code of postgresql, I found that a tight loop of inserts or
>> updates causes heavy file io on the data files (table, indexes)
>> directly from the backend process! And that io has nothing to do
>> with shared buffer dirty writes.
>>
>> The heap_insert() or heap_update() would invoke
>> RelationGetBufferForTuple(), which in turn invokes
>> ReadBuffer_common():
>>
>> 1) look up or allocate the buffer in shared buffers, which may cause
>> a dirty write (but in my case, it's rare. Maybe the shared buffers
>> are big enough and the checkpointer or bgwriter always clean them in
>> time). If the target buffer is found, the following steps are
>> skipped.
>>
>> 2) if it needs to extend the relation (inserts or updates on a new
>> table normally fall in this case), it writes a zero-filled page to
>> disk (used to occupy the file space? But most file systems support
>> file holes or space reservation, so maybe this part could be
>> optimized?). This procedure holds an exclusive lock on the relation,
>> so if the write is slow, it slows down all pending queries of the
>> lock waiters.
>>
>> 3) Otherwise, it would read from disk.
>>
>> The target buffer is locked exclusively until the insert or update
>> finishes. Note that the insert or update also involves an xlog
>> insert; with async commit enabled the backend process does not flush
>> the xlog, but chances are that xlog buffer dirty writes happen
>> (although that's also rare in my case).
>>
>> So I guess the real reason is the file io done while holding the
>> lock. If an io spike happens, it causes long query durations.
>>
>> Am I correct? Look forward to any advice.
>>
>> Thanks.
>>
>> Regards,
>> Jinhua Luo
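The effect described above can be sketched with a minimal lock-contention model: when synchronous I/O happens while an exclusive lock is held, every waiter's latency includes that I/O time. The sleep stands in for a synchronous mdread()/mdextend(); the names are illustrative, not PostgreSQL's.

```python
import threading
import time

buffer_lock = threading.Lock()
waits = []

def extend_relation():
    with buffer_lock:
        time.sleep(0.3)   # stand-in for synchronous file io under the lock

def pending_query():
    t0 = time.monotonic()
    with buffer_lock:
        pass              # would normally touch the buffer here
    waits.append(time.monotonic() - t0)

ext = threading.Thread(target=extend_relation)
ext.start()
time.sleep(0.05)          # let the extender grab the lock first
queries = [threading.Thread(target=pending_query) for _ in range(3)]
for q in queries:
    q.start()
for t in [ext] + queries:
    t.join()
# every pending query paid most of the io time while waiting for the lock
```

This is why an io spike translates directly into long query durations for everything queued behind the lock, even queries that need no io themselves.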


--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

