Re: Synchronized Scan update - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Synchronized Scan update |
Date | |
Msg-id | 1173813886.3641.966.camel@silverbirch.site |
In response to | Re: Synchronized Scan update (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Synchronized Scan update
Re: Synchronized Scan update |
List | pgsql-hackers |
On Tue, 2007-03-13 at 11:28 -0700, Jeff Davis wrote:
> On Tue, 2007-03-13 at 17:11 +0000, Simon Riggs wrote:
> > On Mon, 2007-03-12 at 17:46 -0700, Jeff Davis wrote:
> > > On Mon, 2007-03-12 at 13:21 +0000, Simon Riggs wrote:
> > > > So based on those thoughts, sync_scan_offset should be fixed at 16,
> > > > rather than being variable. In addition, ss_report_loc() should only
> > > > report its position every 16 blocks, rather than do this every time,
> > > > which will reduce overhead of this call.
> > >
> > > If we fix sync_scan_offset at 16, we might as well just get rid of it.
> > > Sync scans are only useful on large tables, and getting a free 16 pages
> > > over a scan isn't worth the trouble. However, even without
> > > sync_scan_offset,
> >
> > Not sure what you mean by "a free 16 pages". Please explain?
> >
>
> By "free" I mean already in cache, and therefore don't have to do I/O to
> get it. I used the term loosely above, so let me re-explain:
>
> My only point was that 16 is essentially 0 when it comes to
> sync_scan_offset, because it's a small number of blocks over the course
> of the scan of a large table.
>
> If sync_scan_offset is 0, my patch will cause scans on a big table to
> start where other scans are, and those scans should tend to stay
> together and use newly-cached pages efficiently (and achieve the primary
> goal of the patch).

OK

> The advantage of sync_scan_offset is that, in some situations, a second
> scan can actually finish faster than if it were the only query
> executing, because a previous scan has already caused some blocks to be
> cached. However, 16 is a small number because that benefit would only be
> once per scan, and sync scans are only helpful on large tables.

Alright, understood.

That last part is actually something I now want to avoid because it's
using the current cache-spoiling behaviour of seqscans to advantage.
I'd like to remove that behaviour, but it sounds like we can have both
- SeqScans that don't spoil cache
- Synch scans
by setting "sync_scan_offset" to zero.

> > > I like the idea of reducing tuning parameters, but we should, at a
> > > minimum, still allow an on/off button for sync scans. My tests revealed
> > > that the wrong combination of OS/FS/IO-Scheduler/Controller could result
> > > in bad I/O behavior.
> >
> > Agreed
> >
>
> Do you have an opinion about sync_scan_threshold versus a simple
> sync_scan_enable?

enable_sync_scan?

> > I'd still like to be able to trace each scan to see how far ahead/behind
> > it is from the other scans on the same table, however we do that.
> >
> > Any backend can read the position of other backends' scans, so it should
>
> Where is that information stored? Right now my patch will overwrite the
> hints of other backends, because I'm using a static data structure
> (rather than one that grows). I do this to avoid the need for locking.

OK, well, we can still read it before we overwrite it to calc the
difference. That will at least allow us to get a difference between
points as we go along. That seems like it's worth having, even if it
isn't accurate for 3+ concurrent scans.

> > be easy enough to put in a regular LOG entry that shows how far
> > ahead/behind they are from other scans. We can trace just one backend
> > and have it report on where it is with respect to other backends, or you
> > could have them all calculate their position and have just the lead scan
> > report the position of all other scans.
> >
>
> I already have each backend log its progression through the tablescan
> every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
> currently use this information to see whether scans are staying together
> or not. I think this gives us the information we need without backends
> needing to communicate the information during execution.

Well, that is good, thank you for adding that after initial discussions.
Does it have the time at which a particular numbered block is reached?
(i.e. Block #117 is not the same thing as the 117th block scanned). We
can use that to compare the time difference of each scan.

> I think I will increase the resolution of the scan progress so that we
> can track every 5k or even 1k blocks read per pid per scan. That might
> tell us more about the shared memory usage versus OS cache.
>
> Is there any other information you need reported?

Not sure yet! I just want to look one level deeper, to see if everything
is working like we think it should.

> > I'd like to see the trace option to allow us to tell whether it's working
> > as well as we'd like it to pre-release and in production. Also I want to
> > see whether various settings of scan_recycle_buffers help/hinder the
> > effectiveness of synch scans, as others have worried it might.
> >
>
> Can you tell me what you mean by trace option, if you mean something
> different than tracking the relative positions of the scans?
>
> I will update my patch and send it along so that we can see how they
> work together.

Great

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
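
For readers following the thread, here is a rough Python sketch of two ideas discussed above: reporting a scan's position only every 16 blocks, and reading the previous hint before overwriting it so the difference between two scans can be calculated. It is only an illustration under stated assumptions, not the patch's code. The names ScanHint, report_loc, start_block and circular_lead are hypothetical, the single unlocked slot per table is modelled as a plain object, and the scan is assumed to wrap around the table circularly.

```python
from dataclasses import dataclass
from typing import Optional

# Blocks between position reports (the figure suggested in the thread).
REPORT_INTERVAL = 16


@dataclass
class ScanHint:
    """Hypothetical fixed-size slot for one table: last reported block and reporter."""
    block: int = 0
    pid: Optional[int] = None


def circular_lead(ahead: int, behind: int, nblocks: int) -> int:
    """Blocks by which `ahead` leads `behind`, assuming a circular scan of nblocks."""
    return (ahead - behind) % nblocks


def report_loc(hint: ScanHint, pid: int, block: int, nblocks: int) -> Optional[int]:
    """Hypothetical stand-in for ss_report_loc().

    Publishes the position only every REPORT_INTERVAL blocks. Before
    overwriting the slot it reads the previous hint; if that hint came from
    another backend, it returns how far this scan leads it, in blocks modulo
    the table size. Returns None when nothing was reported or compared.
    """
    if block % REPORT_INTERVAL != 0:
        return None
    lead = None
    if hint.pid is not None and hint.pid != pid:
        lead = circular_lead(block, hint.block, nblocks)
    hint.block, hint.pid = block, pid  # overwrite the other backend's hint
    return lead


def start_block(hint: ScanHint) -> int:
    """A new scan starts where the last report was (sync_scan_offset = 0)."""
    return hint.block if hint.pid is not None else 0


if __name__ == "__main__":
    hint = ScanHint()
    nblocks = 1000
    report_loc(hint, pid=1, block=160, nblocks=nblocks)  # first report, nothing to compare
    print(start_block(hint))                             # second scan joins at block 160
    print(report_loc(hint, pid=2, block=176, nblocks=nblocks))  # lead over pid 1's hint
```

Because the slot holds only the most recent report, the returned difference compares the caller with whichever backend wrote last, which is why it stops being meaningful with three or more concurrent scans. The modulo also cannot tell a small lead from a large lag.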