Thread: New/Revised TODO? Gathering actual read performance data for use by planner
New/Revised TODO? Gathering actual read performance data for use by planner
From
Michael Nolan
Date:
In the TODO list is this item:<br /><br /><b>Modify the planner to better estimate caching effects <br /></b><br />Tom mentionedthis in his presentation at PGCON, and I also chatted with Tom about it briefly afterwards.<br /><br />Based onlast year's discussion of this TODO item, it seems thoughts have been focused on estimating how much data is<br /> beingsatisfied from PG's shared buffers. However, I think that's only part of the problem. <br /><br />Specifically, readperformance is going to be affected by:<br /><br />1. Reads fulfilled from shared buffers.<br /> 2. Reads fulfilledfrom system cache.<br />3. Reads fulfilled from disk controller cache.<br />4. Reads from physical media.<br /><br/>#4 is further complicated by the type of physical media for that specific block. For example, reads that can<br />be fulfilled from a SSD are going to be much faster than ones that access hard drives (or even slower types of media.)<br/><br />System load is going to impact all of these as well.<br /><br />Therefore, I suggest that an alternativeto the above TODO may be to gather performance data without knowing <br /> (or more importantly without needingto know) which of the above sources fulfilled the read. <br /><br />This data would probably need to be kept separatelyfor each table or index, as some tables or indexes <br />may be mostly or fully in cache or on faster physicalmedia than others, although in the absence of other <br /> data about a specific table or index, data about otherrelations in the same tablespace might be of some use. <br /><br />Tom mentioned that the cost of doing multiple systemtime-of-day calls for each block read might be <br /> prohibitive, it may also be that the data may also be too coarseon some systems to be truly useful <br />(eg, the epoch time in seconds.) <br /><br />If this data were available,that could mean that successive plans for the same query could have <br /> significantly different plans (and thusactual performance), based on what has happened recently, <br />so these statistics would have to be relatively shortterm and updated frequently, but without becoming <br />computational bottlenecks. <br /><br />The problem is one I'minterested in working on.<br />--<br />Mike Nolan<br />
Re: New/Revised TODO? Gathering actual read performance data for use by planner
From
Greg Smith
Date:
Michael Nolan wrote: > Based on last year's discussion of this TODO item, it seems thoughts > have been focused on estimating how much data is > being satisfied from PG's shared buffers. However, I think that's > only part of the problem. Sure, but neither it nor what you're talking about are the real gating factor on making an improvement here. Figuring out how to expose all this information to the optimizer so it can use it when planning is the hard part. Collecting a read time profile is just one of the many ways you can estimate what's in cache, and each of the possible methods has good and bad trade-offs. I've been suggesting that people assume that's a solved problem--I'm pretty sure what you're proposing was done by Greg Stark once and a prototype built even--and instead ask what you're going to do next if you had this data. > This data would probably need to be kept separately for each table or > index, as some tables or indexes > may be mostly or fully in cache or on faster physical media than > others, although in the absence of other > data about a specific table or index, data about other relations in > the same tablespace might be of some use. This is the important part. Model how the data needs to get stored such that the optimizer can make decisions using it, and I consider it easy to figure out how it will get populated later. There are actually multiple ways to do it, and it may end up being something people plug-in an implementation that fits their workload into, rather than just having one available. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: New/Revised TODO? Gathering actual read performance data for use by planner
From
Robert Haas
Date:
On Wed, May 25, 2011 at 10:17 AM, Greg Smith <greg@2ndquadrant.com> wrote: >> This data would probably need to be kept separately for each table or >> index, as some tables or indexes >> may be mostly or fully in cache or on faster physical media than others, >> although in the absence of other >> data about a specific table or index, data about other relations in the >> same tablespace might be of some use. > > This is the important part. Model how the data needs to get stored such > that the optimizer can make decisions using it, and I consider it easy to > figure out how it will get populated later. I basically agree. There have been several recent discussions of this topic on both -hackers and -performance; it is likely that the TODO needs to be updated with some more recent links. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: New/Revised TODO? Gathering actual read performance data for use by planner
From
Michael Nolan
Date:
On Wed, May 25, 2011 at 11:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Anything to help the NKOTB to get up to speed would be appreciated, though I still think it is not just a 'caching' issue.
The question I hesitated to ask in Ottawa was: So, what information would you like and what would you do with it?
--
Mike Nolan
I basically agree. There have been several recent discussions of this
topic on both -hackers and -performance; it is likely that the TODO
needs to be updated with some more recent links.
Anything to help the NKOTB to get up to speed would be appreciated, though I still think it is not just a 'caching' issue.
The question I hesitated to ask in Ottawa was: So, what information would you like and what would you do with it?
--
Mike Nolan
Re: New/Revised TODO? Gathering actual read performance data for use by planner
From
Robert Haas
Date:
On Wed, May 25, 2011 at 8:37 PM, Michael Nolan <htfoot@gmail.com> wrote: > On Wed, May 25, 2011 at 11:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> I basically agree. There have been several recent discussions of this >> topic on both -hackers and -performance; it is likely that the TODO >> needs to be updated with some more recent links. > > Anything to help the NKOTB to get up to speed would be appreciated, though I > still think it is not just a 'caching' issue. > > The question I hesitated to ask in Ottawa was: So, what information would > you like and what would you do with it? *scratches head* I'm not sure I understand your question. Can you elaborate? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company