Thread: Wildly erratic query performance
I've got a particular query that is giving me ridiculously erratic performance. I have the SQL in a pgAdmin query window, and from one execution to the next, with no changes, the time it takes varies from half a second to at least 10 minutes or so, at which point I give up and cancel the query. A typical time is 2-3 seconds, but it's all over the map. I've seen numbers like 112 seconds for one which returns without exceeding my patience. In every half dozen or so executions there will be one time which is an order of magnitude bigger than the others. A typical series of executions might be something like 2 seconds, 5 seconds, 8 seconds, 0.5 seconds, 80 seconds, 5 seconds.

Note that the database is running on my local machine, the same machine I'm running the queries from, and nothing else is using this PostgreSQL installation. The data in the database is also not changing--there are no inserts or updates happening between queries. I ran a VACUUM (FULL, ANALYZE) just before trying these queries. I do monitor my CPU usage and there is definitely not some other process on my machine sucking up all the CPU cycles now and then to explain this.

This is PostgreSQL 8.3, on Windows XP. The query joins about 17 tables (without an explicit JOIN, just using the WHERE criteria) with a few further predicates. One thing which distinguishes it from other similar queries I've been doing, where I haven't seen this odd erratic behavior, is that there are 2 predicates ORed together (and then ANDed with all the other conditions, which are all ANDed) which effectively divide 2 subsets of joined tables which are not joined to each other, but are both joined to another set of tables. (I don't know if that was a comprehensible way of explaining this... but I also don't know if it's relevant enough to be worth explaining in more detail.)

I've tried running EXPLAIN, however the wild erraticness seems to go away when I use EXPLAIN, taking in the ballpark of 1.5 seconds every time. This is faster than my average query time using a plain execute, even if I discount all the unusually long times.

Is there any reasonable explanation for this phenomenon?

I do realize I could help the query planner with explicit JOINs, however I have not yet embarked on that optimization, and might not bother if the query performance is acceptable without it. I don't expect the execution plan to be optimal, however I do expect it to be deterministic.

Thanks,

Eric
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Eric Schwarzenbach
> Sent: Friday, October 31, 2008 12:35 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Wildly erratic query performance
>
> I've got a particular query that is giving me ridiculously erratic
> performance. I have the SQL in a pgAdmin query window, and from one
> execution to the next, with no changes, the time it takes varies from
> half a second to at least 10 minutes or so, at which point I give up
> and cancel the query.
> [snip]

Something is missing from your description. An EXPLAIN ANALYZE on the query and the schema for the relevant tables would be helpful.

Are the queries identical? Just changing the WHERE clause a bit can cause big differences in query speed. Consider:

SELECT age, status, phone FROM work_force WHERE state IN ('NY', 'CA', 'TX');

may run more slowly than:

SELECT age, status, phone FROM work_force WHERE state IN ('ID', 'MT', 'NV');

because the first three states have large populations and the last three have smaller populations.

Does the database machine run solely as a database server, or are there other things going on? E.g. if you are doing a compile and link of 10,000 source files during one query and the machine is otherwise idle during a different one, we would expect different results.

There will be (of course) a logical explanation for the query time differences. I suggest the following:

1. Do an EXPLAIN ANALYZE on a query that is slow (see the sketch below).
2. Do an EXPLAIN ANALYZE on a query that is fast (unless the queries are literally identical in every way).

I guess that (along with the schema) will be enough to get an idea of what is happening.
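For example, using the illustrative work_force table from above (a hypothetical table, not your actual schema):

EXPLAIN ANALYZE
SELECT age, status, phone
  FROM work_force
 WHERE state IN ('NY', 'CA', 'TX');

The output shows the plan the server actually chose, with both the estimated costs and the actual time spent in each plan node, which is what makes a slow run and a fast run directly comparable.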
Dann,

Thanks for your response. I thought I'd covered most of what you are asking in my first message, but these results are weird enough that I can understand you might not give me the benefit of the doubt without very explicit confirmation. To answer your questions:

YES, the query each time is IDENTICAL. I am not changing a single character; I am simply clicking the execute button in pgAdmin for each run.

This is my personal laptop, so of course there are other processes running, like, say, a web browser, an email client, etc. And of course, as on any such machine, there may be minor processes that execute in the background without my awareness. But I am not actively doing anything else (like running a large compile, yeesh!) while the query is executing. This is why I noted that I was monitoring the CPU usage (and processes), so that I can be confident that something major is not suddenly running in the background without my initiating it directly. And to make the difference between 2 seconds and a minute, let alone 10 minutes, would take a pretty major and hard-to-miss process.

As I explained already (no pun intended), running the query using EXPLAIN makes the wild variation go away. So I cannot get explain results for a fast and for a slow execution.

I did not include schema information and such because I am not clear I am allowed to make it public, and because I'm not looking for a highly specific answer, merely whether there are ANY conditions under which the SAME EXACT QUERY should perform so radically differently. If the query planner, for example, used a random number generator to choose the order in which it performed my joins, such that the join order would be different each time, this would explain it--that possibility would seem bizarre to me, but it would certainly answer my question.

Eric

Dann Corbit wrote:
> Something is missing from your description. An EXPLAIN ANALYZE on the
> query and the schema for the relevant tables would be helpful.
> [snip]
On Fri, Oct 31, 2008 at 1:34 PM, Eric Schwarzenbach <subscriber@blackbrook.org> wrote:
> I've got a particular query that is giving me ridiculously erratic
> performance. I have the SQL in a pgAdmin query window, and from one
> execution to the next, with no changes, the time it takes varies from

SNIP

> This is PostgreSQL 8.3, on Windows XP. The query joins about 17 tables
> (without an explicit JOIN, just using the WHERE criteria) with a few

OK, whether you use JOIN syntax or WHERE clause syntax, PostgreSQL can attempt to use the GEQO (genetic query optimizer) method to find a close fit for the query plan. You can change these settings to control the GEQO:

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5    # range 1-10

Just crank the threshold up to 20 or something so GEQO doesn't kick in for now, and see how long your queries take. The planning time will go up, because pgsql will do exhaustive planning to determine the best plan, but it should consistently pick a good plan.

And look at these too:

#from_collapse_limit = 8
#join_collapse_limit = 8
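A quick way to try this without editing postgresql.conf is to set it for just your session; a minimal sketch (the SELECT is only a placeholder for your real 17-table query):

SET geqo_threshold = 20;  -- GEQO now kicks in only at 20 or more relations
SHOW geqo_threshold;      -- confirm the new session-level value
SELECT 1;                 -- run your actual query here and compare timings
RESET geqo_threshold;     -- restore the server default when done

If the timings become consistent once the threshold is above the number of tables in the query, that points squarely at GEQO's randomized search as the source of the variation.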
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Fri, Oct 31, 2008 at 1:34 PM, Eric Schwarzenbach > <subscriber@blackbrook.org> wrote: >> This is postgreslq 8.3, on Windows XP. The query joins about 17 tables >> (without an explicit JOIN, just using the WHERE criteria) with a few > OK, whether you use join syntax or where clause syntax, postgresql can > attempt to use the GEQO method to determine a close fit for the query > plan. You can change these settings: > #geqo = on > #geqo_threshold = 12 > #geqo_effort = 5 # range 1-10 > To control the GEQO. Just crank the threshold to 20 or something so > it doesn't kick in for now and see how long your queries take. The > planning time will go up, because pgsql will do exhaustive logic to > determine the best plan, but it should consistently pick a good plan. The exhaustive search's time can be exponential in the number of tables to be joined, so the above advice might or might not be workable. If you find that planning takes too long when you disable geqo or bump up the threshold, an alternative possibility is to kick up the geqo_effort parameter to make it more likely that the randomized search will find a decent plan. > and look at these too: > #from_collapse_limit = 8 > #join_collapse_limit = 8 If the query is given in the form of a "flat" FROM-list of 17 tables, neither of those knobs will affect anything. regards, tom lane
On Fri, Oct 31, 2008 at 04:36:02PM -0400, Eric Schwarzenbach wrote:
> As I explained already (no pun intended), running the query using EXPLAIN
> makes the wild variation go away. So I cannot get explain results for a
> fast and for a slow execution.

EXPLAIN only determines and outputs the query plan; it does not actually run the query. You probably want to use EXPLAIN ANALYZE, which executes the query and reports the actual time spent in each plan node.
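That is, something like this (reusing the hypothetical work_force table from earlier in the thread as a stand-in for the real query):

EXPLAIN         SELECT count(*) FROM work_force;  -- plan and cost estimates only; nothing executes
EXPLAIN ANALYZE SELECT count(*) FROM work_force;  -- actually runs the query and adds actual times per node

Because EXPLAIN ANALYZE executes the statement for real, a slow execution will show up in its timings, which makes it possible to capture and compare plans for both the fast and the slow cases.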