Re: Synchronized Scan update - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Synchronized Scan update
Date
Msg-id 1173818382.23455.183.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: Synchronized Scan update  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses Re: Synchronized Scan update
List pgsql-hackers
On Tue, 2007-03-13 at 19:24 +0000, Simon Riggs wrote:
> > The advantage of sync_scan_offset is that, in some situations, a second
> > scan can actually finish faster than if it were the only query
> > executing, because a previous scan has already caused some blocks to be
> > cached. However, 16 is a small number because that benefit would only be
> > once per scan, and sync scans are only helpful on large tables.
> 
> Alright, understood. That last part is actually something I now want to
> avoid because it's using the current cache-spoiling behaviour of
> seqscans to advantage. I'd like to remove that behaviour, but it sounds
> like we can have both
> - SeqScans that don't spoil cache
> - Synch scans
> by setting "sync_scan_offset" to zero.
> 

Precisely.

If there is a cache-spoiling effect of the OS buffer cache that we want
to take advantage of, we could still set it to a non-zero value. But the
utility of sync_scan_offset does decrease with your patch, so removing
it altogether is a possibility (hopefully the numbers will tell us what
to do).


> > Do you have an opinion about sync_scan_threshold versus a simple
> > sync_scan_enable?
> 
> enable_sync_scan?
> 

After looking at other GUC names, I suggest that it's either
"sync_scan" (for on/off) or "sync_scan_threshold" (if we do want to
allow a numerical threshold). All the GUCs beginning with "enable_" are
planner settings.

If we only allow on/off, we could probably just sync scan every table
because of your recycle_buffers patch.

> > > I'd still like to be able to trace each scan to see how far ahead/behind
> > > it is from the other scans on the same table, however we do that.
> > > 
> > > Any backend can read the position of other backend's scans, so it should
> > 
> > Where is that information stored? Right now my patch will overwrite the
> > hints of other backends, because I'm using a static data structure
> > (rather than one that grows). I do this to avoid the need for locking.
> 
> OK, well, we can still read it before we overwrite it to calc the
> difference. That will at least allow us to get a difference between
> points as we go along. That seems like its worth having, even if it
> isn't accurate for 3+ concurrent scans.

Let me know if the things I list below don't cover what the information
you're looking for here. It would be easy for me to emit a log message
at the time it's overwriting the hint, but that would be a lot of noise:
every time ss_report_loc() is called, which we discussed would be once
per 100 pages read per scan. 

> > > be easy enough to put in a regular LOG entry that shows how far
> > > ahead/behind they are from other scans. We can trace just one backend
> > > and have it report on where it is with respect to other backends, or you
> > > could have them all calculate their position and have just the lead scan
> > > report the position of all other scans.
> > > 
> > 
> > I already have each backend log it's progression through the tablescan
> > every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
> > currently use this information to see whether scans are staying together
> > or not. I think this gives us the information we need without backends
> > needing to communicate the information during execution.
> 
> Well, that is good, thank you for adding that after initial discussions.
> 
> Does it have the time at which a particular numbered block is reached?
> (i.e. Block #117 is not the same thing as the 117th block scanned). We
> can use that to compare the time difference of each scan.

Right now it logs when a scan starts, what block number of the table it
starts on, and also prints out the current block it's scanning every N
blocks (100k or 10k depending on debug level). The time and the pid are,
of course, available from log_prefix.

I'll add the table OID to each log message in case we test, for example,
a single backend scanning multiple tables at once.

I'll also clean it up a bit, so that the information is a little easier
to grep out of the postgres logfiles and easier to analyze/graph.

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: 8.1.x (tested 8.1.8) timezone bugs
Next
From: "Simon Riggs"
Date:
Subject: Re: Synchronized Scan update