Re: Relation extension scalability - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Relation extension scalability
Date
Msg-id CAA4eK1+RCbb9E8UWPjK6ge1tgL8P4jbLe2bg4D8pA-L3p7Cxsw@mail.gmail.com
Whole thread Raw
In response to Re: Relation extension scalability  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Relation extension scalability  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Relation extension scalability  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Tue, Jan 12, 2016 at 2:41 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Jan 7, 2016 at 4:53 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:

I think it's a worthwhile approach to pursue. But until it actually
fixes the problem of leaving around uninitialized pages I don't think
it's very meaningful to do performance comparisons.

Attached patch solves this issue, I am allocating the buffer for each page and initializing the page, only after that adding to FSM.

Few comments about patch:

1.
Patch is not getting compiled.

1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared identifier

2.
! page = BufferGetPage(buffer);
! PageInit(page, BufferGetPageSize
(buf), 0);
! freespace = PageGetHeapFreeSpace(page);
!
MarkBufferDirty(buffer);
! UnlockReleaseBuffer(buffer);
!
RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);

What is the need to mark page dirty here, won't it automatically
be markerd dirty once the page is used?  I think it is required
if you wish to WAL-log this action.

3. I think you don't need to multi-extend a relation if
HEAP_INSERT_SKIP_FSM is used, as for that case it anyways try to
get a new page by extending a relation.

4. Again why do you need this multi-extend optimization for local
relations (those only accessible to current backend)?

5. Do we need this for nbtree as well?  One way to check that
is by Copying large data in table having index.

 
Note: Test with both data and WAL on Magnetic Disk : No significant improvement visible
-- I think wall write is becoming bottleneck in this case.


In that case, can you try the same test with un-logged tables?

Also, it is good to check the performance of patch with read-write work
load to ensure that extending relation in multiple-chunks should not
regress such cases.

 
Currently i have kept extend_num_page as session level parameter but i think later we can make this as table property.
Any suggestion on this ?


I think we should have a new storage_parameter at table level
extend_by_blocks or something like that instead of GUC. The
default value of this parameter should be 1 which means retain
current behaviour by default.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Jinhua Luo
Date:
Subject: insert/update performance
Next
From: Amit Kapila
Date:
Subject: Re: insert/update performance