Re: Synchronized Scan update - Mailing list pgsql-hackers
From: Jeff Davis
Subject: Re: Synchronized Scan update
Msg-id: 1173818382.23455.183.camel@dogma.v10.wvs
In response to: Re: Synchronized Scan update ("Simon Riggs" <simon@2ndquadrant.com>)
Responses: Re: Synchronized Scan update
List: pgsql-hackers
On Tue, 2007-03-13 at 19:24 +0000, Simon Riggs wrote:
> > The advantage of sync_scan_offset is that, in some situations, a second
> > scan can actually finish faster than if it were the only query
> > executing, because a previous scan has already caused some blocks to be
> > cached. However, 16 is a small number because that benefit would only be
> > once per scan, and sync scans are only helpful on large tables.
>
> Alright, understood. That last part is actually something I now want to
> avoid because it's using the current cache-spoiling behaviour of
> seqscans to advantage. I'd like to remove that behaviour, but it sounds
> like we can have both
> - SeqScans that don't spoil cache
> - Synch scans
> by setting "sync_scan_offset" to zero.

Precisely. If there is a cache-spoiling effect of the OS buffer cache
that we want to take advantage of, we could still set it to a non-zero
value. But the utility of sync_scan_offset does decrease with your
patch, so removing it altogether is a possibility (hopefully the
numbers will tell us what to do).

> > Do you have an opinion about sync_scan_threshold versus a simple
> > sync_scan_enable?
>
> enable_sync_scan?

After looking at other GUC names, I suggest that it's either "sync_scan"
(for on/off) or "sync_scan_threshold" (if we do want to allow a
numerical threshold). All the GUCs beginning with "enable_" are planner
settings. If we only allow on/off, we could probably just sync scan
every table because of your recycle_buffers patch.

> > > I'd still like to be able to trace each scan to see how far ahead/behind
> > > it is from the other scans on the same table, however we do that.
> > >
> > > Any backend can read the position of other backend's scans, so it should
> >
> > Where is that information stored? Right now my patch will overwrite the
> > hints of other backends, because I'm using a static data structure
> > (rather than one that grows). I do this to avoid the need for locking.
>
> OK, well, we can still read it before we overwrite it to calc the
> difference. That will at least allow us to get a difference between
> points as we go along. That seems like it's worth having, even if it
> isn't accurate for 3+ concurrent scans.

Let me know if the things I list below don't cover the information
you're looking for here.

It would be easy for me to emit a log message at the time it's
overwriting the hint, but that would be a lot of noise: every time
ss_report_loc() is called, which we discussed would be once per 100
pages read per scan.

> > > be easy enough to put in a regular LOG entry that shows how far
> > > ahead/behind they are from other scans. We can trace just one backend
> > > and have it report on where it is with respect to other backends, or you
> > > could have them all calculate their position and have just the lead scan
> > > report the position of all other scans.
> > >
> >
> > I already have each backend log its progression through the tablescan
> > every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
> > currently use this information to see whether scans are staying together
> > or not. I think this gives us the information we need without backends
> > needing to communicate the information during execution.
>
> Well, that is good, thank you for adding that after initial discussions.
>
> Does it have the time at which a particular numbered block is reached?
> (i.e. Block #117 is not the same thing as the 117th block scanned). We
> can use that to compare the time difference of each scan.

Right now it logs when a scan starts, what block number of the table it
starts on, and also prints out the current block it's scanning every N
blocks (100k or 10k depending on debug level). The time and the pid are,
of course, available from log_line_prefix. I'll add the table OID to
each log message in case we test, for example, a single backend scanning
multiple tables at once.
I'll also clean it up a bit, so that the information is a little easier
to grep out of the postgres logfiles and easier to analyze/graph.

Regards,
	Jeff Davis