Re: Synchronized Scan update - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronized Scan update
Date
Msg-id 1173813886.3641.966.camel@silverbirch.site
Whole thread Raw
In response to Re: Synchronized Scan update  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Synchronized Scan update
Re: Synchronized Scan update
List pgsql-hackers
On Tue, 2007-03-13 at 11:28 -0700, Jeff Davis wrote:
> On Tue, 2007-03-13 at 17:11 +0000, Simon Riggs wrote:
> > On Mon, 2007-03-12 at 17:46 -0700, Jeff Davis wrote:
> > > On Mon, 2007-03-12 at 13:21 +0000, Simon Riggs wrote:
> > > > So based on those thoughts, sync_scan_offset should be fixed at 16,
> > > > rather than being variable. In addition, ss_report_loc() should only
> > > > report its position every 16 blocks, rather than do this every time,
> > > > which will reduce overhead of this call.
> > > 
> > > If we fix sync_scan_offset at 16, we might as well just get rid of it.
> > > Sync scans are only useful on large tables, and getting a free 16 pages
> > > over a scan isn't worth the trouble. However, even without
> > > sync_scan_offset, 
> > 
> > Not sure what you mean by "a free 16 pages". Please explain?
> > 
> 
> By "free" I mean already in cache, and therefore don't have to do I/O to
> get it. I used the term loosely above, so let me re-explain:
> 
> My only point was that 16 is essentially 0 when it comes to
> sync_scan_offset, because it's a small number of blocks over the course
> of the scan of a large table.
> 
> If sync_scan_offset is 0, my patch will cause scans on a big table to
> start where other scans are, and those scans should tend to stay
> together and use newly-cached pages efficiently (and achieve the primary
> goal of the patch).

OK

> The advantage of sync_scan_offset is that, in some situations, a second
> scan can actually finish faster than if it were the only query
> executing, because a previous scan has already caused some blocks to be
> cached. However, 16 is a small number because that benefit would only be
> once per scan, and sync scans are only helpful on large tables.

Alright, understood. That last part is actually something I now want to
avoid because it's using the current cache-spoiling behaviour of
seqscans to advantage. I'd like to remove that behaviour, but it sounds
like we can have both
- SeqScans that don't spoil cache
- Synch scans
by setting "sync_scan_offset" to zero.

> > > I like the idea of reducing tuning parameters, but we should, at a
> > > minimum, still allow an on/off button for sync scans. My tests revealed
> > > that the wrong combination of OS/FS/IO-Scheduler/Controller could result
> > > in bad I/O behavior.
> > 
> > Agreed
> > 
> 
> Do you have an opinion about sync_scan_threshold versus a simple
> sync_scan_enable?

enable_sync_scan?

> > I'd still like to be able to trace each scan to see how far ahead/behind
> > it is from the other scans on the same table, however we do that.
> > 
> > Any backend can read the position of other backend's scans, so it should
> 
> Where is that information stored? Right now my patch will overwrite the
> hints of other backends, because I'm using a static data structure
> (rather than one that grows). I do this to avoid the need for locking.

OK, well, we can still read it before we overwrite it to calc the
difference. That will at least allow us to get a difference between
points as we go along. That seems like its worth having, even if it
isn't accurate for 3+ concurrent scans.

> > be easy enough to put in a regular LOG entry that shows how far
> > ahead/behind they are from other scans. We can trace just one backend
> > and have it report on where it is with respect to other backends, or you
> > could have them all calculate their position and have just the lead scan
> > report the position of all other scans.
> > 
> 
> I already have each backend log it's progression through the tablescan
> every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
> currently use this information to see whether scans are staying together
> or not. I think this gives us the information we need without backends
> needing to communicate the information during execution.

Well, that is good, thank you for adding that after initial discussions.

Does it have the time at which a particular numbered block is reached?
(i.e. Block #117 is not the same thing as the 117th block scanned). We
can use that to compare the time difference of each scan.

> I think I will increase the resolution of the scan progress so that we
> can track every 5k or even 1k blocks read per pid per scan. That might
> tell us more about the shared memory usage versus OS cache.
> 
> Is there any other information you need reported?

Not sure yet! I just want to look one level deeper, to see if everything
is working like we think it should.

> > I'd like to see the trace option to allow us to tell whether its working
> > as well as we'd like it to pre-release and in production. Also I want to
> > see whether various settings of scan_recycle_buffers help/hinder the
> > effectiveness of synch scans, as others have worried it might.
> > 
> 
> Can you tell me what you mean by trace option, if you mean something
> different than tracking the relative positions of the scans?
> 
> I will update my patch and send it along so that we can see how they
> work together. 

Great

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: 8.1.x (tested 8.1.8) timezone bugs
Next
From: Tom Lane
Date:
Subject: Re: 8.1.x (tested 8.1.8) timezone bugs