Re: insert performance - Mailing list pgsql-performance

From Jinhua Luo
Subject Re: insert performance
Date
Msg-id CAAc9rOxrMDsAjaaKJXWHY=aYARRhqZJsYUACadng+4-O0qeLmg@mail.gmail.com
Whole thread Raw
In response to Re: insert performance  (Jinhua Luo <luajit.io@gmail.com>)
Responses Re: insert performance
List pgsql-performance
Hi,

I thought with async commit enabled, the backend process would rarely
do file io. But in fact, it still involves a lot of file io.

After inspecting the vfs probes using systemtap, and reading the
source codes of postgresql, I found the tight loop of insert or update
will cause heavy file io upon the data files (table, indexes) directly
by the backend process! And those io has nothing to do with shared
buffer dirty writes.

The heap_insert() or heap_update() would invoke
RelationGetBufferForTuple(), which in turn finally invoke
ReadBuffer_common():

1) lookup or allocate the buffer from shared buffer, which may cause
dirty write (but in my case, it's rare. Maybe the shared buffer is big
enough and the checkpointer or bgwriter always clean it in time). If
the target buffer is found, skip following steps.

2)  if it needs to extend the relation (insert or update on new table
would normally fall in this case), then it would write zero-filled
page into the disk (used to occupy the file space? But most file
systems support file hole or space reservation, so maybe this part
could be optimized?) This procedure would hold the exclusive lock on
the relation. So if the write is slow, it would slow down all pending
queries of the lock waiters.

3) Otherwise, it would read from disk.

The target buffer would be locked exclusively until the insert or
update finish. Note that the insert or update also involve xlog
insert, although with async commit enabled, the backend process would
not flush the xlog, but chances are that the xlog buffer dirty writes
happens (although it's also rare in my case).

So I guess the real reason is the file io with lock holding. If io
spike happens, it would cause long query duration.

Am I correct? Look forward to any advice.

Thanks.

Regards,
Jinhua Luo


pgsql-performance by date:

Previous
From: David Rowley
Date:
Subject: Re: Query order of magnitude slower with slightly different where clause
Next
From: Phil S
Date:
Subject: High Planning Time