Re: Synchronized Scan update - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Synchronized Scan update |
Date | |
Msg-id | 1173705670.3641.600.camel@silverbirch.site Whole thread Raw |
In response to | Re: Synchronized Scan update (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Synchronized Scan update
Re: Synchronized Scan update |
List | pgsql-hackers |
On Fri, 2007-03-02 at 15:03 -0800, Jeff Davis wrote: > Is there any consensus about whether to include these two parameters as > GUCs or constants if my patch is to be accepted? > > (1) sync_scan_threshold: Use synchronized scanning for tables greater > than this many pages; smaller tables will not be affected. > (2) sync_scan_offset: Start a new scan this many pages before a > currently running scan to take advantage of the pages > that are likely already in cache. > > Right now they are just constants defined in a header, but a GUC might > make sense. I'd like to know which version is more acceptable when I > submit my final patch. I'm looking at ways of helping Synch Scan and scan_recycle_buffers to work well together. I've had a look through the synch scan patch in more detail to help this along. ISTM they can play nicely together. Results from tests show that with a single scan we get better performance with scan_recycle_buffers = ~32. That gives a +25% performance bonus. Synch scan has the capability to reduce I/O by 50% across two concurrent scans, so that would be likely to outweigh the gain from L2 cache reduction, unless we can get both together. It's possible that multiple scans would use the same CPU, or at least use multiple cores on the same chip and hence same L2 cache. That's more likely if the synch scan works well, since the second scan can process the buffer while the lead process performs I/O. So its likely that on dual and quad core CPUs that we'll be able to get both benefits in most cases. So based on those thoughts, sync_scan_offset should be fixed at 16, rather than being variable. In addition, ss_report_loc() should only report its position every 16 blocks, rather than do this every time, which will reduce overhead of this call. To match that, scan_recycle_buffers should be fixed at 32. So GUCs for sync_scan_offset and scan_recycle_buffers would not be required at all. IMHO we can also remove sync_scan_threshold and just use NBuffers instead. That way we get the benefit of both patches or neither, making it easier to understand what's going on. On my patch, I'll make buffer recycling only work for SeqScans and VACUUMs. If need be, the value of scan_recycle_buffers can be varied upwards should the scans drift apart, as a way of bringing them back together. We aren't tracking whether they are together or apart, so I would like to see some debug output from synch scans to allow us to assess how far behind the second scan is as it progresses. e.g. LOG: synch scan currently on block N, trailing pathfinder by M blocks issued every 128 blocks as we go through the scans. Thoughts? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: