Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers
From | John Lumby |
---|---|
Subject | Extended Prefetching using Asynchronous IO - proposal and patch |
Date | |
Msg-id | BAY175-W45086073075CA064EFE9A0A33A0@phx.gbl |
In response to | Re: Race condition within _bt_findinsertloc()? (new page split code) (Peter Geoghegan <pg@heroku.com>) |
Responses | Re: Extended Prefetching using Asynchronous IO - proposal and patch |
List | pgsql-hackers |
<div dir="ltr">Claudio Freire and I are proposing new functionality for Postgresql <br />to extend the scope of prefetching and also exploit posix asynchronous IO<br />when doing prefetching, and have a patch based on 9.4dev<br />ready for consideration.<br /><br />This topic has cropped up at irregular intervals over the years,<br />e.g. this thread back in 2012<br /> <a href="www.postgresql.org/message-id/CAGTBQpbu2M=-M7NUr6DWr0K8gUVmXVhwKohB-Cnj7kYS1AhH4A@mail.gmail.com" target="_blank">www.postgresql.org/message-id/CAGTBQpbu2M=-M7NUr6DWr0K8gUVmXVhwKohB-Cnj7kYS1AhH4A@mail.gmail.com</a><br />and this thread more recently<br /> http://www.postgresql.org/message-id/CAGTBQpaFC_z=zdWVAXD8wWss3v6jxZ5pNmrrYPsD23LbrqGvgQ@mail.gmail.com<br /><br />We now have an implementation which gives useful performance improvement<br />as well as other advantages compared to what is currently available,<br />at least for certain environments.<br /><br />Below I am pasting the README we have written for this new functionality<br />which mentions some of the measurements, advantages (and disadvantages)<br />and we welcome all and any comments on this.<br /><br />I will send the patch to commitfest later, once this email is posted to hackers,<br />so that anyone who wishes can try it, or apply directly to me if you wish.<br />The patch is currently based on 9.4dev but a version based on 9.3.4<br />will be available soon if anyone wants that. The patch is large (43 files)<br />so non-trivial to review, but any comments on it (when posted) will be<br />appreciated and acted on. 
Note that at present the only environment<br />in which it has been applied and tested is linux.<br /><br />John Lumby <br />__________________________________________________________________________________________________<br /><br /><br />Postgresql -- Extended Prefetching using Asynchronous IO<br />============================================================<br /><br />Postgresql currently (9.3.4) provides a limited prefetching capability<br />using posix_fadvise to give hints to the Operating System kernel<br />about which pages it expects to read in the near future.<br />This capability is used only during the heap-scan phase of bitmap-index scans.<br />It is controlled via the effective_io_concurrency configuration parameter.<br /><br />This capability is now extended in two ways :<br /> . use asynchronous IO into Postgresql shared buffers as an<br /> alternative to posix_fadvise<br /> . Implement prefetching in other types of scan :<br /> . non-bitmap (i.e. simple) index scans - index pages<br /> currently only for B-tree indexes.<br /> (developed by Claudio Freire <klaussfreire(at)gmail(dot)com>)<br /> . non-bitmap (i.e. simple) index scans - heap pages<br /> currently only for B-tree indexes.<br /> . simple heap scans<br /><br />Posix asynchronous IO is chosen as the function library for asynchronous IO,<br />since this is well supported and also fits very well with the model of<br />the prefetching process, particularly as regards checking for completion<br />of an asynchronous read. On linux, Posix asynchronous IO is provided<br />in the librt library. librt uses independently-schedulable threads to<br />achieve the asynchronicity, rather than kernel functionality.<br /><br />In this implementation, use of asynchronous IO is limited to prefetching<br />while performing one of the three types of scan<br /> . B-tree bitmap index scan - heap pages (as already exists)<br /> . B-tree non-bitmap (i.e. simple) index scans - index and heap pages<br /> . 
simple heap scans<br />on permanent relations. It is not used on temporary tables nor for writes.<br /><br />The advantages of Posix asynchronous IO into shared buffers<br />compared to posix_fadvise are :<br /> . Beneficial for non-sequential access patterns as well as sequential<br /> . No restriction on the kinds of IO which can be used<br /> (other kinds of asynchronous IO impose restrictions such as<br /> buffer alignment, use of non-buffered IO).<br /> . Does not interfere with standard linux kernel read-ahead functionality.<br /> (It has been stated in <br /> www.postgresql.org/message-id/CAGTBQpbu2M=-M7NUr6DWr0K8gUVmXVhwKohB-Cnj7kYS1AhH4A@mail.gmail.com<br /> that :<br /> "the kernel stops doing read-ahead when a call to posix_fadvise comes.<br /> I noticed the performance hit, and checked the kernel's code.<br /> It effectively changes the prediction mode from sequential to fadvise,<br /> negating the (assumed) kernel's prefetch logic")<br /> . When the read request is issued after a prefetch has completed,<br /> no delay associated with a kernel call to copy the page from<br /> kernel page buffers into the Postgresql shared buffer,<br /> since it is already there.<br /> Also, in a memory-constrained environment, there is a greater<br /> probability that the prefetched page will "stick" in memory<br /> since the linux kernel victimizes the filesystem page cache in preference<br /> to swapping out user process pages.<br /> . Statistics on prefetch success can be gathered (see "Statistics" below)<br /> which helps the administrator to tune the prefetching settings.<br /><br />These benefits are most likely to be obtained in a system whose usage profile<br />(e.g. from iostat) shows:<br /> . high IO wait from mostly-read activity<br /> . disk access pattern is not entirely sequential<br /> (so kernel readahead can't predict it but postgresql can)<br /> . 
sufficient spare idle CPU to run the librt pthreads<br /> or, stated another way, the CPU subsystem is relatively powerful<br /> compared to the disk subsystem.<br />In such ideal conditions, and with a workload with plenty of index scans,<br />around 10% - 20% improvement in throughput has been achieved.<br />In an admittedly extreme environment measured by this author, with a workload<br />consisting of 8 client applications each running similar complex queries<br />(same query structure but different predicates and constants),<br />including 2 Bitmap Index Scans and 17 non-bitmap index scans,<br />on a dual-core Intel laptop (4 hyperthreads) with the database on a single<br />USB3-attached 500GB disk drive, and no part of the database in filesystem buffers<br />initially, (filesystem freshly mounted), comparing unpatched build<br />using posix_fadvise with effective_io_concurrency 4 against same build patched<br />with async IO and effective_io_concurrency 4 and max_async_io_prefetchers 32,<br />elapsed time repeatably improved from around 640-670 seconds to around 530-550 seconds,<br />a 17% - 18% improvement. <br /><br />The disadvantages of Posix asynchronous IO compared to posix_fadvise are:<br /> . probably higher CPU utilization:<br /> Firstly, the extra work performed by the librt threads adds CPU<br /> overhead, and secondly, if the asynchronous prefetching is effective,<br /> then it will deliver better (greater) overlap of CPU with IO, which<br /> will reduce elapsed times and hence increase CPU utilization percentage<br /> still more (during that shorter elapsed time).<br /> . 
more context switching, because of the additional threads.<br /><br /><br />Statistics:<br />___________<br /><br />A number of additional statistics relating to effectiveness of asynchronous IO<br />are provided as an extension of the existing pg_stat_statements loadable module.<br />Refer to the appendix "Additional Supplied Modules" in the current<br />PostgreSQL Documentation for details of this module.<br /><br />The following additional statistics are provided for asynchronous IO prefetching:<br /><br /> . aio_read_noneed : number of prefetches for which no need for prefetch as block already in buffer pool<br /> . aio_read_discrd : number of prefetches for which buffer not subsequently read and therefore discarded<br /> . aio_read_forgot : number of prefetches for which buffer not subsequently read and then forgotten about<br /> . aio_read_noblok : number of prefetches for which no available BufferAiocb control block<br /> . aio_read_failed : number of aio reads for which aio itself failed or the read failed with an errno<br /> . aio_read_wasted : number of aio reads for which in-progress aio cancelled and disk block not used<br /> . aio_read_waited : number of aio reads for which disk block used but had to wait for it<br /> . aio_read_ontime : number of aio reads for which disk block used and ready on time when requested<br /><br />Some of these are (hopefully) self-explanatory. Some additional notes:<br /><br /> . aio_read_discrd and aio_read_forgot :<br /> prefetch was wasted work since the buffer was not subsequently read<br /> The discrd case indicates that the scanner realized this and discarded the buffer,<br /> whereas the forgot case indicates that the scanner did not realize it,<br /> which should not normally occur.<br /> A high number in either suggests lowering effective_io_concurrency.<br /><br /> . aio_read_noblok : <br /> Any significant number in relation to all the other numbers indicates that<br /> max_async_io_prefetchers should be increased.<br /><br /> . 
aio_read_waited :<br /> The page was prefetched but the asynchronous read had not completed by the time the<br /> scanner requested to read it. This causes extra overhead in waiting and indicates<br /> prefetching is not providing much if any benefit.<br /> The disk subsystem may be underpowered/overloaded in relation to the available CPU power.<br /><br /> . aio_read_ontime :<br /> The page was prefetched and the asynchronous read had completed by the time the<br /> scanner requested to read it. Optimal behaviour. If this number is large<br /> in relation to all the other numbers except (possibly) aio_read_noneed,<br /> then prefetching is working well.<br /><br />To create the extension with support for these additional statistics, use the following syntax:<br /> CREATE EXTENSION pg_stat_statements VERSION '1.3'<br />or, if you run the new code against an existing database which already has the extension<br />( see installation and migration below ), you can <br /> ALTER EXTENSION pg_stat_statements UPDATE TO '1.3'<br /><br />A suggested set of commands for displaying these statistics might be :<br /><br /> /* OPTIONALLY */ DROP extension pg_stat_statements;<br /> CREATE extension pg_stat_statements VERSION '1.3';<br /> /* run your workload */<br /> select userid , dbid , substring(query from 1 for 24) , calls , total_time , rows , shared_blks_read , blk_read_time , blk_write_time \<br /> , aio_read_noneed , aio_read_noblok , aio_read_failed , aio_read_wasted , aio_read_waited , aio_read_ontime , aio_read_forgot \<br /> from pg_stat_statements where shared_blks_read > 0;<br /><br /><br />Installation and Build Configuration:<br />_____________________________________<br /><br />1. 
First - a prerequisite:<br /># as well as requiring all the usual package build tools such as gcc , make etc,<br /># as described in the instructions for building postgresql,<br /># the following is required :<br /> gnu autoconf at version 2.69 :<br /># run the following command<br />autoconf -V<br /># it *must* return<br />autoconf (GNU Autoconf) 2.69<br /><br />2. If you don't have it or it is a different version,<br />then you must obtain version 2.69 (which is the current version)<br />from your distribution provider or from the gnu software download site.<br /><br />3. Also you must have the source tree for postgresql version 9.4 (development version).<br /># all the following commands assume your current working directory is the top of the source tree.<br /><br />4. cd to top of source tree :<br /># check it appears to be a postgresql source tree<br />ls -ld configure.in src<br /># should show both the file and the directory<br />grep PostgreSQL COPYRIGHT<br /># should show PostgreSQL Database Management System<br /><br />5. Apply the patch :<br />patch -b -p0 -i <patch_file_path><br /># should report no errors, 43 files patched (see list at bottom of this README)<br /># and all hunks applied<br /># check the patch was applied to configure.in<br />ls -ld configure.in.orig configure.in<br /># should show both files<br /><br />6. Rebuild the configure script with the patched configure.in :<br />mv configure configure.orig;<br />autoconf configure.in >configure;echo "rc= $? from autoconf"; chmod +x configure;<br />ls -lrt configure.orig configure;<br /><br />7. 
run the new configure script :<br /># if you have run configure before,<br /># then you may first want to save existing config.status and config.log if they exist,<br /># and then specify same configure flags and options as you specified before.<br /># the patch does not alter or extend the set of configure options<br /># if unsure, run ./configure --help<br /># if still unsure, run ./configure<br />./configure <other configure options as desired><br /><br /><br /><br />8. now check that configure decided that this environment supports asynchronous IO :<br />grep USE_AIO_ATOMIC_BUILTIN_COMP_SWAP src/include/pg_config.h<br /># it should show<br />#define USE_AIO_ATOMIC_BUILTIN_COMP_SWAP 1<br /># if not, apparently your environment does not support asynch IO -<br /># the config.log will show how it came to that conclusion,<br /># also check for :<br /># . a librt.so somewhere in the loader's library path (probably under /lib , /lib64 , or /usr)<br /># . your gcc must support the atomic compare_and_swap __sync_bool_compare_and_swap built-in function<br /># do not proceed without this define being set.<br /><br />9. 
do you want to use the new code on an existing cluster<br /> that was created using the same code base but without the patch?<br /> If so then run this nasty-looking command :<br /> (cut-and-paste it into a terminal window or a shell-script file)<br /> Otherwise continue to step 10.<br /> see Migration note below for explanation.<br />###############################################################################################<br /> fl=src/Makefile.global; typeset -i bkx=0; while [[ $bkx -lt 200 ]]; do {<br /> bkfl="${fl}.bak${bkx}"; if [[ -a ${bkfl} ]]; then ((bkx=bkx+1)); else break; fi;<br /> }; done;<br /> if [[ -a ${bkfl} ]]; then echo "sorry cannot find a backup name for $fl";<br /> elif [[ -a $fl ]]; then {<br /> mv $fl $bkfl && {<br /> sed -e "/^CFLAGS =/ s/\$/ -DAVOID_CATALOG_MIGRATION_FOR_ASYNCIO/" $bkfl > $fl;<br /> str="diff -w $bkfl $fl"; echo "$str"; eval "$str";<br /> };<br /> };<br /> else echo "ooopppss $fl is missing";<br /> fi;<br />###############################################################################################<br /># it should report something like<br />diff -w Makefile.global.bak0 Makefile.global<br />222c222<br />< CFLAGS = XXXX<br />---<br />> CFLAGS = XXXX -DAVOID_CATALOG_MIGRATION_FOR_ASYNCIO<br /># where XXXX is some set of flags<br /><br /><br />10. now run the rest of the build process as usual -<br /> follow instructions in file INSTALL if that file exists,<br /> else e.g. 
run<br />make && make install<br /><br />If the build fails with the following error:<br />undefined reference to `aio_init'<br />Then edit the following file<br />src/include/pg_config_manual.h<br />and add the following line at the bottom:<br /><br />#define DONT_HAVE_AIO_INIT<br /><br />and then run<br />make clean && make && make install<br />See notes to section Runtime Configuration below for more information on this.<br /><br /><br /><br />Migration , Runtime Configuration, and Use:<br />___________________________________________<br /><br /><br />Database Migration:<br />___________________<br /><br />The new prefetching code for non-bitmap index scans introduces a new btree-index<br />function named btpeeknexttuple. The correct way to add such a function involves<br />also adding it to the catalog as an internal function in pg_proc.<br />However, this results in the newly built code considering an existing database to be<br />incompatible, i.e. requiring backup on the old code and restore on the new.<br />This is normal behaviour for migration to a new version of postgresql, and is<br />also a valid way of migrating a database for use with this asynchronous IO feature,<br />but in this case it may be inconvenient.<br /><br />As an alternative, the new code may be compiled with the macro define<br />AVOID_CATALOG_MIGRATION_FOR_ASYNCIO<br />which does what it says by not altering the catalog. The patched build can then<br />be run against an existing database cluster initdb'd using the unpatched build.<br /><br />There are no known ill-effects of so doing, but :<br /> . in any case, it is strongly suggested to make a backup of any precious database<br /> before accessing it with a patched build<br /> . 
be aware that if this asynchronous IO feature is eventually released as part of postgresql,<br /> migration will probably be required anyway.<br /><br />This option to avoid catalog migration is intended as a convenience for a quick test,<br />and also makes it easier to obtain performance comparisons on the same database.<br /><br /><br /><br />Runtime Configuration:<br />______________________<br /><br />One new configuration parameter is settable in postgresql.conf and<br />in any other way as described in the postgresql documentation :<br /><br />max_async_io_prefetchers<br /> Maximum number of background processes concurrently using asynchronous<br /> librt threads to prefetch pages into shared memory buffers<br /><br />This number can be thought of as the maximum number<br />of librt threads concurrently active, each working on a list of<br />from 1 to target_prefetch_pages pages ( see notes 1 and 2 ).<br /><br />In practice, this number simply controls how many prefetch requests in total<br />may be active concurrently :<br /> max_async_io_prefetchers * target_prefetch_pages ( see note 1)<br /><br />default is max_connections/6<br />and recall that the default for max_connections is 100<br /><br /><br />note 1 target_prefetch_pages is a number based on effective_io_concurrency and approximately n * ln(n)<br /> where n is effective_io_concurrency<br /><br />note 2 Provided that the gnu extension to Posix AIO which provides the<br />aio_init() function is present, then aio_init() is called<br />to set the librt maximum number of threads to max_async_io_prefetchers,<br />and to set the maximum number of concurrent aio read requests to the product of<br /> max_async_io_prefetchers * target_prefetch_pages<br /><br /><br />As well as this regular configuration parameter,<br />there are several other parameters that can be set via environment variable.<br />The reason why they are environment vars rather than regular configuration parameters<br />is that it is not expected that they should need to be set, but 
they may be useful :<br /> variable name values default meaning<br /> PG_TRY_PREFETCHING_FOR_BITMAP [Y|N] Y whether to prefetch bitmap heap scans<br /> PG_TRY_PREFETCHING_FOR_ISCAN [Y|N|integer[,[N|Y]]] 256,N whether to prefetch non-bitmap index scans<br /> also numeric size of list of prefetched blocks<br /> also whether to prefetch forward-sequential-pattern index pages<br /> PG_TRY_PREFETCHING_FOR_BTREE [Y|N] Y whether to prefetch heap pages in non-bitmap index scans<br /> PG_TRY_PREFETCHING_FOR_HEAP [Y|N] N whether to prefetch relation (un-indexed) heap scans<br /><br /><br />The setting for PG_TRY_PREFETCHING_FOR_ISCAN is a little complicated.<br />It can be set to Y or N to control prefetching of non-bitmap index scans;<br />but in addition it can be set to an integer, which both implies Y<br />and also sets the size of a list used to remember prefetched but unread heap pages.<br />This list is an optimization used to avoid re-prefetching and maximise the potential<br />set of prefetchable blocks indexed by one index page.<br />And if set to an integer, this integer may be followed by either ,Y or ,N<br />to specify whether to prefetch index pages which are being accessed forward-sequentially.<br />It has been found that prefetching is not of great benefit for this access pattern,<br />and so it is not the default, but also does no harm (provided sufficient CPU capacity).<br /><br /><br /><br />Usage :<br />______<br /><br /><br />There are no changes in usage other than as noted under Configuration and Statistics.<br />However, in order to assess benefit from this feature, it will be useful to<br />understand the query access plans of your workload using EXPLAIN. Before doing that,<br />make sure that statistics are up to date using ANALYZE.<br /><br /><br /><br />Internals:<br />__________<br /><br /><br />Internal changes span two areas and the interface between them :<br /><br /> . buffer manager layer<br /> . programming interface for scanner to call buffer manager<br /> . 
scanner layer<br /><br /> . buffer manager layer<br /> ____________________<br /><br /> changes comprise :<br /> . allocating, pinning , unpinning buffers<br /> this is complex and discussed briefly below in "Buffer Management"<br /> . acquiring and releasing a BufferAiocb, the control block<br /> associated with a single aio_read, and checking for its completion<br /> a new file, backend/storage/buffer/buf_async.c, provides three new functions,<br /> BufStartAsync BufReleaseAsync BufCheckAsync<br /> which handle this.<br /> . calling librt asynch io functions<br /> this follows the example of all other filesystem interfaces<br /> and is straightforward. <br /> two new functions are provided in fd.c:<br /> FileStartaio FileCompleteaio<br /> and corresponding interfaces in smgr.c<br /><br /> . programming interface for scanner to call buffer manager<br /> ________________________________________________________<br /> . calling interface for existing function PrefetchBuffer is modified :<br /> . one new argument, BufferAccessStrategy strategy<br /> . now returns an int return code which indicates :<br /> whether pin count on buffer has been increased by 1<br /> whether block was already present in a buffer<br /> . new function DiscardBuffer<br /> . discard buffer used for a previously prefetched page<br /> which scanner decides it does not want to read.<br /> . same arguments as for PrefetchBuffer except for omission of BufferAccessStrategy<br /> . note - this is different from the existing function ReleaseBuffer<br /> in that ReleaseBuffer takes a buffer_descriptor as argument<br /> for a buffer which has been read, but has similar purpose.<br /><br /> . scanner layer<br /> _____________<br /> common to all scanners is that the scanner which wishes to prefetch must do two things:<br /> . decide which pages to prefetch and call PrefetchBuffer to prefetch them<br /> nodeBitmapHeapscan already does this (but note one extra argument on PrefetchBuffer)<br /> . 
remember which pages it has prefetched in some list (actual or conceptual, e.g. a page range),<br /> removing each page from this list if and when it subsequently reads the page.<br /> . at end of scan, call DiscardBuffer for every remembered (i.e. prefetched but unread) page<br /> how this list of prefetched pages is implemented varies for each of the three scanners and four scan types:<br /> . bitmap index scan - heap pages<br /> . non-bitmap (i.e. simple) index scans - index pages<br /> . non-bitmap (i.e. simple) index scans - heap pages<br /> . simple heap scans<br /> The consequences of forgetting to call DiscardBuffer on a prefetched but unread page are:<br /> . counted in aio_read_forgot (see "Statistics" above)<br /> . may incur an annoying but harmless warning in the pg_log "Buffer Leak ... "<br /> (the buffer is released at commit)<br /> This does sometimes happen ...<br /> <br /><br /><br />Buffer Management<br />_________________<br /><br />With async io, PrefetchBuffer must allocate and pin a buffer, which is relatively straightforward,<br />but also every other part of buffer manager must know about the possibility that a buffer may be in<br />a state of async_io_in_progress and be prepared to determine the possible completion.<br />That is, one backend BK1 may start the io but another BK2 may try to read it before BK1 does.<br />Posix Asynchronous IO provides a means for waiting on this or another task's read if in progress,<br />namely aio_suspend(), which this extension uses. Therefore, although StartBufferIO and TerminateBufferIO<br />are called as part of asynchronous prefetching, their role is limited to maintaining the buffer descriptor flags,<br />and they do not track the asynchronous IO itself. 
Instead, asynchronous IOs are tracked in<br />a separate set of shared control blocks, the BufferAiocb list -<br />refer to include/storage/buf_internals.h<br />Checking asynchronous io status is handled in backend/storage/buffer/buf_async.c BufCheckAsync function.<br />Read the commentary for this function for more details.<br /><br />Pinning and unpinning of buffers is the most complex aspect of asynch io prefetching,<br />and the logic is spread throughout BufStartAsync , BufCheckAsync , and many functions in bufmgr.c.<br />When a backend BK2 requests ReadBuffer of a page for which asynch read is in progress,<br />buffer manager has to determine which backend BK1 pinned this buffer during previous PrefetchBuffer,<br />and for example must not re-pin it a second time if BK2 is BK1.<br />Information concerning which backend initiated the prefetch is held in the BufferAiocb.<br /><br />The trickiest case concerns the scenario in which :<br /> . BK1 initiates prefetch and acquires a pin<br /> . BK2 possibly waits for completion and then reads the buffer, and perhaps later on<br /> releases it by ReleaseBuffer.<br /> . Since the asynchronous IO is no longer in progress, there is no longer any<br /> BufferAiocb associated with it. Yet buffer manager must remember that BK1 holds a<br /> "prefetch" pin, i.e. a pin which must not be repeated if and when BK1 finally issues ReadBuffer.<br /> . 
The solution to this problem is to invent the concept of a "banked" pin,<br /> which is a pin obtained when prefetch was issued, identified as in "banked" status only if and when<br /> the associated asynchronous IO terminates, and redeemable by the next use by same task,<br /> either by ReadBuffer or DiscardBuffer.<br /> The pid of the backend which holds a banked pin on a buffer (there can be at most one such backend)<br /> is stored in the buffer descriptor.<br /> This is done without increasing size of the buffer descriptor, which is important since<br /> there may be a very large number of these. This does overload the relevant field in the descriptor.<br /> Refer to include/storage/buf_internals.h for more details<br /> and search for BM_AIO_PREFETCH_PIN_BANKED in storage/buffer/bufmgr.c and backend/storage/buffer/buf_async.c<br /><br />______________________________________________________________________________<br />The following 43 files are changed in this feature (output of the patch command) :<br /><br />patching file configure.in<br />patching file contrib/pg_stat_statements/pg_stat_statements--1.3.sql<br />patching file contrib/pg_stat_statements/Makefile<br />patching file contrib/pg_stat_statements/pg_stat_statements.c<br />patching file contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql<br />patching file config/c-library.m4<br />patching file src/backend/postmaster/postmaster.c<br />patching file src/backend/executor/nodeBitmapHeapscan.c<br />patching file src/backend/executor/nodeIndexscan.c<br />patching file src/backend/executor/instrument.c<br />patching file src/backend/storage/buffer/Makefile<br />patching file src/backend/storage/buffer/bufmgr.c<br />patching file src/backend/storage/buffer/buf_async.c<br />patching file src/backend/storage/buffer/buf_init.c<br />patching file src/backend/storage/smgr/md.c<br />patching file src/backend/storage/smgr/smgr.c<br />patching file src/backend/storage/file/fd.c<br />patching file 
src/backend/storage/lmgr/proc.c<br />patching file src/backend/access/heap/heapam.c<br />patching file src/backend/access/heap/syncscan.c<br />patching file src/backend/access/index/indexam.c<br />patching file src/backend/access/index/genam.c<br />patching file src/backend/access/nbtree/nbtsearch.c<br />patching file src/backend/access/nbtree/nbtinsert.c<br />patching file src/backend/access/nbtree/nbtpage.c<br />patching file src/backend/access/nbtree/nbtree.c<br />patching file src/backend/nodes/tidbitmap.c<br />patching file src/backend/utils/misc/guc.c<br />patching file src/backend/utils/mmgr/aset.c<br />patching file src/include/executor/instrument.h<br />patching file src/include/storage/bufmgr.h<br />patching file src/include/storage/smgr.h<br />patching file src/include/storage/fd.h<br />patching file src/include/storage/buf_internals.h<br />patching file src/include/catalog/pg_am.h<br />patching file src/include/catalog/pg_proc.h<br />patching file src/include/pg_config_manual.h<br />patching file src/include/access/nbtree.h<br />patching file src/include/access/heapam.h<br />patching file src/include/access/relscan.h<br />patching file src/include/nodes/tidbitmap.h<br />patching file src/include/utils/rel.h<br />patching file src/include/pg_config.h.in<br /><br /><br />Future Possibilities:<br />____________________<br /><br />There are several possible extensions of this feature :<br /> . Extend prefetching of index scans to types of index<br /> other than B-tree.<br /> This should be fairly straightforward, but requires some<br /> good base of benchmarkable workloads to prove the value.<br /> . Investigate why asynchronous IO prefetching does not greatly<br /> improve sequential relation heap scans and possibly find how to<br /> achieve a benefit.<br /> . Build knowledge of asynchronous IO prefetching into the<br /> Query Planner costing.<br /> This is far from straightforward. 
The Postgresql Query Planner's<br /> costing model is based on resource consumption rather than elapsed time.<br /> Use of asynchronous IO prefetching is intended to improve elapsed time<br /> at the expense of (probably) higher resource consumption.<br /> Although Costing understands about the reduced cost of reading buffered<br /> blocks, it does not take asynchronicity or overlap of CPU with disk<br /> into account. A naive approach might be to try to tweak the Query<br /> Planner's Cost Constant configuration parameters<br /> such as seq_page_cost , random_page_cost<br /> but this is hazardous as explained in the Documentation.<br /><br /><br /><br />John Lumby, johnlumby(at)hotmail(dot)com<br /><br /></div>
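
The README above describes the Posix asynchronous IO call sequence only in words: start a read, check whether it has completed, wait with aio_suspend() if it is still in progress. The following is a minimal sketch of that sequence, not code from the patch. The names DEMO_BLCKSZ, demo_start_prefetch, demo_complete_prefetch and demo_run are hypothetical and exist only for this illustration; the patch's own BufStartAsync / BufCheckAsync logic (shared BufferAiocb control blocks, pin handling) is not reproduced here. On older glibc versions it probably needs to be linked with -lrt.

```c
/* Illustrative sketch only; not code from the patch. */
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192        /* hypothetical block size for this demo */

/* Start an asynchronous read of block blockno into buf (the "prefetch").
 * Returns 0 if the request was queued, -1 (errno set) otherwise. */
static int
demo_start_prefetch(struct aiocb *cb, int fd, long blockno, char *buf)
{
    assert(blockno >= 0);
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = DEMO_BLCKSZ;
    cb->aio_offset = (off_t) blockno * DEMO_BLCKSZ;
    return aio_read(cb);
}

/* Complete a prefetch at the moment the block is actually wanted.
 * Sets *waited to 1 if the read was still in progress (compare
 * aio_read_waited) and to 0 if it had already finished (compare
 * aio_read_ontime).  Returns bytes read, or -1 with errno set. */
static ssize_t
demo_complete_prefetch(struct aiocb *cb, int *waited)
{
    const struct aiocb *const list[1] = { cb };
    int         err;
    ssize_t     nread;

    *waited = 0;
    while ((err = aio_error(cb)) == EINPROGRESS)
    {
        *waited = 1;
        /* Block until the request completes; retry if interrupted. */
        if (aio_suspend(list, 1, NULL) != 0 && errno != EINTR)
            return -1;
    }
    nread = aio_return(cb);     /* collect the final status of the request */
    if (err != 0)
    {
        errno = err;
        return -1;
    }
    return nread;
}

/* Write two blocks to a temporary file, prefetch the second one and
 * check its contents.  Returns 0 on success, -1 on any failure. */
static int
demo_run(void)
{
    static char block[DEMO_BLCKSZ];
    char        path[] = "/tmp/aio_demo_XXXXXX";
    struct aiocb cb;
    int         waited;
    ssize_t     nread;
    int         fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file goes away once fd is closed */

    memset(block, 'A', sizeof(block));
    if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        goto fail;
    memset(block, 'B', sizeof(block));
    if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        goto fail;
    memset(block, 0, sizeof(block));

    if (demo_start_prefetch(&cb, fd, 1, block) != 0)
        goto fail;
    /* ... a real scanner would do other useful work here ... */
    nread = demo_complete_prefetch(&cb, &waited);
    close(fd);
    return (nread == DEMO_BLCKSZ && block[0] == 'B' &&
            block[DEMO_BLCKSZ - 1] == 'B') ? 0 : -1;

fail:
    close(fd);
    return -1;
}
```

Calling demo_run() from a main() function exercises the whole sequence; it returns 0 when the prefetched block holds the expected bytes. The waited flag is only a loose analogue of the distinction the README draws between aio_read_waited and aio_read_ontime.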