Re: Relation extension scalability - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Relation extension scalability
Date
Msg-id CAFiTN-u2dZiXLP9jMEtA+OanwE8VPUPFmcJkk+y4c64Sht_n=A@mail.gmail.com
Whole thread Raw
In response to Re: Relation extension scalability  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Relation extension scalability  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers

On Sat, Jan 23, 2016 at 12:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote

Few comments about patch:
Thanks for reviewing..
 
1.
Patch is not getting compiled.

1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared identifier
Oh, My mistake, my preprocessor is ignoring this error and relacing it with BLKSIZE

I will fix in next version of patch.
2.
! page = BufferGetPage(buffer);
! PageInit(page, BufferGetPageSize
(buf), 0);
! freespace = PageGetHeapFreeSpace(page);
!
MarkBufferDirty(buffer);
! UnlockReleaseBuffer(buffer);
!
RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);

What is the need to mark page dirty here, won't it automatically
be markerd dirty once the page is used?  I think it is required
if you wish to WAL-log this action.

These pages  are not going to be used immediately and we have done PageInit so i think it should be marked dirty before adding to FSM, so that if buffer get replaced out then it flushes the init data.
 
3. I think you don't need to multi-extend a relation if
HEAP_INSERT_SKIP_FSM is used, as for that case it anyways try to
get a new page by extending a relation.

Yes, if HEAP_INSERT_SKIP_FSM is used and we use multi-extend atleast in current transaction it will not take pages from FSM and everytime it will do multi-extend, however pages will be used if there are parallel backend, but still not a good idea to extend every time in multiple chunk in current backend.

So i will change this..

4. Again why do you need this multi-extend optimization for local
relations (those only accessible to current backend)?

I think we can change this while adding the  table level "extend_by_blocks" for local table we will not allow this property, so no need to change at this place.

What do you think ?

5. Do we need this for nbtree as well?  One way to check that
is by Copying large data in table having index.

Ok, i will try this test and update.

 
Note: Test with both data and WAL on Magnetic Disk : No significant improvement visible
-- I think wall write is becoming bottleneck in this case.


In that case, can you try the same test with un-logged tables?

OK

Also, it is good to check the performance of patch with read-write work
load to ensure that extending relation in multiple-chunks should not
regress such cases.

Ok
 

Currently i have kept extend_num_page as session level parameter but i think later we can make this as table property.
Any suggestion on this ?


I think we should have a new storage_parameter at table level
extend_by_blocks or something like that instead of GUC. The
default value of this parameter should be 1 which means retain
current behaviour by default.

+1


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Next
From: Dilip Kumar
Date:
Subject: Re: Relation extension scalability