Re: adding new pages bulky way - Mailing list pgsql-hackers
From | Qingqing Zhou |
---|---|
Subject | Re: adding new pages bulky way |
Date | |
Msg-id | d868sp$1efk$1@news.hub.org |
In response to | adding new pages bulky way ("Victor Y. Yegorov" <viy@mits.lv>) |
Responses | Re: adding new pages bulky way |
List | pgsql-hackers |
"Tom Lane" <tgl@sss.pgh.pa.us> writes
>
> I very seriously doubt that there would be *any* win
>

I did a quick proof-of-concept implementation to test non-concurrent batch insertion. Here is the result:

Environment:
- Pg8.0.1
- NTFS / IDE

-- batch 16 pages extension --
test=# insert into t select * from t;
INSERT 0 131072
Time: 4167.000 ms
test=# insert into t select * from t;
INSERT 0 262144
Time: 8111.000 ms
test=# insert into t select * from t;
INSERT 0 524288
Time: 16444.000 ms
test=# insert into t select * from t;
INSERT 0 1048576
Time: 41980.000 ms

-- batch 32 pages extension --
test=# insert into t select * from t;
INSERT 0 131072
Time: 4086.000 ms
test=# insert into t select * from t;
INSERT 0 262144
Time: 7861.000 ms
test=# insert into t select * from t;
INSERT 0 524288
Time: 16403.000 ms
test=# insert into t select * from t;
INSERT 0 1048576
Time: 41290.000 ms

-- batch 64 pages extension --
test=# insert into t select * from t;
INSERT 0 131072
Time: 4236.000 ms
test=# insert into t select * from t;
INSERT 0 262144
Time: 8202.000 ms
test=# insert into t select * from t;
INSERT 0 524288
Time: 17265.000 ms
test=# insert into t select * from t;
INSERT 0 1048576
Time: 44063.000 ms

-- batch 128 pages extension --
test=# insert into t select * from t;
INSERT 0 131072
Time: 4256.000 ms
test=# insert into t select * from t;
INSERT 0 262144
Time: 8242.000 ms
test=# insert into t select * from t;
INSERT 0 524288
Time: 17375.000 ms
test=# insert into t select * from t;
INSERT 0 1048576
Time: 43854.000 ms

-- one page extension --
test=# insert into t select * from t;
INSERT 0 131072
Time: 4496.000 ms
test=# insert into t select * from t;
INSERT 0 262144
Time: 9013.000 ms
test=# insert into t select * from t;
INSERT 0 524288
Time: 19508.000 ms
test=# insert into t select * from t;
INSERT 0 1048576
Time: 49962.000 ms

The benefit is there: it is approximately a 10% improvement if we select a good batch size.
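As a quick check on the "approximately 10%" figure, the relative improvement over one-page extension can be computed directly from the timings quoted above. This is only a sketch: the helper names (improvement_pct, print_improvements) are made up for illustration, and the numbers are copied from the runs in this message. For batch 32, the smallest run comes out at about 9% and the largest at about 17%.

```c
#include <assert.h>
#include <stdio.h>

/* Timings in ms copied from the runs above.
 * Rows: batch size in pages; columns: the four successive INSERTs. */
static const int batch_pages[] = {1, 16, 32, 64, 128};
static const double ms[5][4] = {
    {4496, 9013, 19508, 49962},   /* one page extension (baseline) */
    {4167, 8111, 16444, 41980},   /* batch 16 */
    {4086, 7861, 16403, 41290},   /* batch 32 */
    {4236, 8202, 17265, 44063},   /* batch 64 */
    {4256, 8242, 17375, 43854},   /* batch 128 */
};

/* Hypothetical helper: percentage of time saved relative to the baseline. */
double improvement_pct(double baseline_ms, double batched_ms)
{
    assert(baseline_ms > 0.0);
    return 100.0 * (baseline_ms - batched_ms) / baseline_ms;
}

/* Print one line per batch size, one column per INSERT run. */
void print_improvements(void)
{
    for (int b = 1; b < 5; b++) {
        printf("batch %3d:", batch_pages[b]);
        for (int r = 0; r < 4; r++)
            printf(" %5.1f%%", improvement_pct(ms[0][r], ms[b][r]));
        printf("\n");
    }
}
```

Calling print_improvements() from a main() prints the saving for every batch size and run, which makes it easier to see how the gain changes as the table grows.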
The explanation is: if a batch insertion needs 6400 new pages, originally it does write() plus file system logging 6400 times; now it does so 6400/64 times (though each call costs more). Also, since writes of different sizes have different costs, 32 seems to be the optimal choice on my machine.

What I did includes:

(1) md.c
Modify function mdextend():
- extend 64 pages each time;
- after the extension, let the FSM be aware of it (change the FSM a little bit so it can also report free space for an empty page)

(2) bufmgr.c
Make ReadPage(+empty_page) treat an empty page differently from a non-empty one, to avoid an unnecessary read for new pages, that is:

    if (!empty_page)
        smgrread(reln->rd_smgr, blockNum, (char *) MAKE_PTR(bufHdr->data));
    else
        PageInit((char *) MAKE_PTR(bufHdr->data), BLCKSZ, 0);
    /* Only for heap pages and race could be here ... */

(3) hio.c
RelationGetBufferForTuple():
- pass the correct "empty_page" parameter to ReadPage() according to the query result from the FSM.

Regards,
Qingqing
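The write-count argument above (6400 calls versus 6400/64) can be illustrated with a small standalone sketch. It is not the mdextend() patch itself: extension_calls() and extend_file() are hypothetical names, the code uses plain POSIX write() rather than PostgreSQL's file layer, and BLCKSZ is assumed to be the default 8192 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define BLCKSZ 8192   /* assumed: PostgreSQL's default page size */

/* Hypothetical helper: number of write() calls needed to add npages
 * when each call adds at most `batch` pages (ceiling division). */
long extension_calls(long npages, long batch)
{
    assert(npages >= 0 && batch > 0);
    return (npages + batch - 1) / batch;
}

/* Hypothetical helper: append npages zero-filled pages to fd, writing at
 * most `batch` pages per write(). Returns the number of write() calls
 * issued, or -1 on error. A short write is treated as an error here. */
long extend_file(int fd, long npages, long batch)
{
    assert(npages >= 0 && batch > 0);

    char *zeros = calloc((size_t) batch, BLCKSZ);
    if (zeros == NULL)
        return -1;

    if (lseek(fd, 0, SEEK_END) == (off_t) -1) {
        free(zeros);
        return -1;
    }

    long calls = 0;
    while (npages > 0) {
        long chunk = npages < batch ? npages : batch;
        size_t want = (size_t) chunk * BLCKSZ;

        if (write(fd, zeros, want) != (ssize_t) want) {
            free(zeros);
            return -1;
        }
        npages -= chunk;
        calls++;
    }

    free(zeros);
    return calls;
}
```

With these definitions, extension_calls(6400, 1) is 6400 and extension_calls(6400, 64) is 100, which matches the 6400/64 figure above. Each batched call moves more data, so the per-call cost is higher, and the best batch size has to be measured on the target machine.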