Thread: Query times change by orders of magnitude as DB ages
Dear All,

Thanks for your help earlier with the previous question. I wonder if I might ask another.

We have various queries that need to run, of which I'm going to focus on 2, "vox" and "du_report". Both of them are extremely sensitive to the precise values of random_page_cost and seq_page_cost. Experimentally, I've used:

  A: seq_page_cost = 0.25; random_page_cost = 0.75
  B: seq_page_cost = 0.5;  random_page_cost = 2
  C: seq_page_cost = 1;    random_page_cost = 4

(and a few in between).

If I pick the wrong one, then either vox becomes 2 orders of magnitude slower (22ms -> 3.5 seconds), or du_report becomes 10x slower. I can't use the same setting for both. So, as a very ugly hack, I've tuned the sweet spots for each query: vox normally sits at B; du_report at C.

Now, the real killer is that the position of that sweet spot changes over time as the DB ages over a few days (even though autovacuum is on). Worse still, doing a cluster of most of the tables and a vacuum full analyze made most of the queries respond much better, but the vox query became very slow again, until I set it to A (which, a few days ago, did not work well).

* Why is the query planner so precisely sensitive to the combination of page costs and time since the last vacuum full?
* Why is it that what improves one query can make another get so much worse?
* Is there any way I can nail the query planner to a particular query plan, rather than have it keep changing its mind?
* Is it normal to keep having to tune the query planner's settings, or should it be possible to set it once and leave it?

Tuning this feels rather like adjusting several old radios, which are exceptionally finicky about the precise settings, have very sharp resonance peaks (in different places), and drift out of tune at different rates. I must be doing something wrong - but what?

Thanks for your advice,

Richard
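P.S. For concreteness, the ugly hack is applied per query, inside its own transaction - a sketch, with a placeholder standing in for the real vox query:

  BEGIN;
  SET LOCAL seq_page_cost = 0.5;      -- setting B
  SET LOCAL random_page_cost = 2;
  SELECT * FROM vox_view;             -- placeholder; the real vox query uses a complex view
  COMMIT;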
Hello,

> * Is there any way I can nail the query planner to a particular query plan,
> rather than have it keep changing its mind?

All these settings lead to choosing different plans. If you have a small number of complex, sensitive queries, you can run EXPLAIN on them with the correct settings, then re-order the query (joins, subselects) according to the given query plan, and, before running it, call

  set local join_collapse_limit = 1;
  set local from_collapse_limit = 1;

This prevents join/subselect reordering inside the current transaction block, leading to consistent plans. But that gives no 100% guarantee of choosing, for example, a hash join over a nested loop.

You can, as noted in the previous message, experiment with the geqo_* constants - especially, lower geqo_threshold to catch better plans (but this can take many runs). Or, for production, set geqo = off - this can dramatically increase query planning time, but the results will be more consistent.

> Is it normal to keep having to tune the query-planner's settings, or should it be possible to
> set it once, and leave it?

I have collapse limits set for some complex reporting queries, and think it's an adequate solution.

> Worse still, doing a cluster of most of the tables and vacuum full analyze made most of the
> queries respond much better, but the vox query became very slow again, until I set it to A
> (which, a few days ago, did not work well).

Is your autovacuuming tuned correctly? For large tables, I set it running much more aggressively than in the default install.

Best regards,
Sergey Aleynikov
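A minimal sketch of the transaction-block approach Sergey describes (the table names are illustrative):

  BEGIN;
  SET LOCAL join_collapse_limit = 1;
  SET LOCAL from_collapse_limit = 1;
  -- With both limits at 1, joins are planned in exactly the written order:
  SELECT o.id, c.name
  FROM orders o
  JOIN customers c ON c.id = o.customer_id;
  COMMIT;

Because the settings are SET LOCAL, they revert automatically at COMMIT and never affect other queries in the session.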
On Sun, 22 Nov 2009, Richard Neill wrote:
> Worse still, doing a cluster of most of the tables and vacuum full analyze

Why are you doing a vacuum full? That command is not meant to be used except in the most unusual of circumstances, as it causes bloat to indexes.

If you have run a cluster command, then running vacuum full will make the table and index layout worse, not better.

Matthew

--
Riker: Our memory pathways have become accustomed to your sensory input.
Data:  I understand - I'm fond of you too, Commander. And you too Counsellor
On Sun, Nov 22, 2009 at 10:31 AM, Richard Neill <rn214@cam.ac.uk> wrote:
> Dear All,
>
> Thanks for your help earlier with the previous question. I wonder if I might
> ask another.
>
> We have various queries that need to run, of which I'm going to focus on 2,
> "vox" and "du_report".
>
> Both of them are extremely sensitive to the precise values of
> random_page_cost and seq_page_cost. Experimentally, I've used:
>
> A: seq_page_cost = 0.25; random_page_cost = 0.75
> B: seq_page_cost = 0.5; random_page_cost = 2
> C: seq_page_cost = 1; random_page_cost = 4
>
> (and a few in between).
>
> If I pick the wrong one, then either vox becomes 2 orders of magnitude
> slower (22ms -> 3.5 seconds), or du_report becomes 10x slower. I can't use
> the same setting for both.
>
> So, as a very ugly hack, I've tuned the sweet spots for each query.
> Vox normally sits at B; du_report at C.
>
> Now, the real killer is that the position of that sweet spot changes over
> time as the DB ages over a few days (even though autovacuum is on).
>
> Worse still, doing a cluster of most of the tables and vacuum full analyze
> made most of the queries respond much better, but the vox query became very
> slow again, until I set it to A (which, a few days ago, did not work well).
>
> * Why is the query planner so precisely sensitive to the combination of
> page costs and time since last vacuum full?

It sounds like your tables are getting bloated. If you have autovacuum turned on, this shouldn't be happening. What sort of workload is this? What PG version?

> * Why is it that what improves one query can make another get so much worse?

Because it changes the plan you get.

> * Is there any way I can nail the query planner to a particular query plan,
> rather than have it keep changing its mind?

See other responses.

> * Is it normal to keep having to tune the query-planner's settings, or
> should it be possible to set it once, and leave it?

Leave it.

...Robert
Matthew Wakeling wrote:
> On Sun, 22 Nov 2009, Richard Neill wrote:
>> Worse still, doing a cluster of most of the tables and vacuum full
>> analyze
>
> Why are you doing a vacuum full? That command is not meant to be used
> except in the most unusual of circumstances, as it causes bloat to indexes.

We'd left it too long, and the DB was reaching 90% of disk space. I didn't realise that vacuum full was ever actively bad, only sometimes unneeded. I do now - thanks for the tip.

> If you have run a cluster command, then running vacuum full will make
> the table and index layout worse, not better.

So, having managed to bloat the indexes in this way, what can I do to fix it? Will a regular vacuum do the job?

Richard
On Wed, 25 Nov 2009, Richard Neill wrote:
>> On Sun, 22 Nov 2009, Richard Neill wrote:
>>> Worse still, doing a cluster of most of the tables and vacuum full analyze
>>
>> Why are you doing a vacuum full? That command is not meant to be used
>> except in the most unusual of circumstances, as it causes bloat to indexes.
>
> We'd left it too long, and the DB was reaching 90% of disk space. I
> didn't realise that vacuum full was ever actively bad, only sometimes
> unneeded. I do now - thanks for the tip.

The problem is that vacuum full does a full compact of the table, but it has to update all the indexes as it goes. This makes it slow, and causes bloat to the indexes. There has been some discussion of removing the command, or at least putting a big warning next to it.

> So, having managed to bloat the indexes in this way, what can I do to
> fix it? Will a regular vacuum do the job?

In fact, cluster is exactly the command you are looking for. It will drop the indexes, do a complete table rewrite (in the correct order), and then recreate all the indexes again.

In normal operation, a regular vacuum will keep the table under control, but if you actually want to shrink the database files in exceptional circumstances, then cluster is the tool for the job.

Matthew

--
Matthew: That's one of things about Cambridge - all the roads keep changing
         names as you walk along them, like Hills Road in particular.
Sagar:   Yes, Sidney Street is a bit like that too.
Matthew: Sidney Street *is* Hills Road.
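A sketch of that one-off shrink, assuming 8.3+ CLUSTER syntax and illustrative table and index names:

  CLUSTER my_table USING my_table_pkey;  -- rewrite the table in index order, rebuilding all its indexes
  ANALYZE my_table;                      -- CLUSTER does not update statistics, so re-analyze afterwards

Note that CLUSTER takes an exclusive lock on the table for the duration of the rewrite.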
Matthew Wakeling wrote:
> On Wed, 25 Nov 2009, Richard Neill wrote:
>
> In fact, cluster is exactly the command you are looking for. It will
> drop the indexes, do a complete table rewrite (in the correct order),
> and then recreate all the indexes again.
>
> In normal operation, a regular vacuum will keep the table under control,
> but if you actually want to shrink the database files in exceptional
> circumstances, then cluster is the tool for the job.

Thanks - now I understand.

In terms of just index bloat, does a regular vacuum help?

Richard
Sergey Aleynikov wrote:
> Hello,
>
>> * Is there any way I can nail the query planner to a particular query plan,
>> rather than have it keep changing its mind?
>
> All these settings lead to choosing different plans. If you have a small
> number of complex, sensitive queries, you can run EXPLAIN on them with
> the correct settings, then re-order the query (joins, subselects) according
> to the given query plan, and, before running it, call
>
> set local join_collapse_limit = 1;
> set local from_collapse_limit = 1;

It's a simple query, but using a complex view. So I can't really re-order it.

> This prevents join/subselect reordering inside the current
> transaction block, leading to consistent plans. But that gives no 100%
> guarantee of choosing, for example, a hash join over a nested loop.

Are you saying that this means that the query planner frequently makes the wrong choice here?

>> Worse still, doing a cluster of most of the tables and vacuum full analyze
>> made most of the queries respond much better, but the vox query became very
>> slow again, until I set it to A (which, a few days ago, did not work well).
>
> Is your autovacuuming tuned correctly? For large tables, I set it
> running much more aggressively than in the default install.

I hadn't changed it from the defaults; now I've changed it to:

  autovacuum_max_workers = 6
  autovacuum_vacuum_scale_factor = 0.002
  autovacuum_analyze_scale_factor = 0.001

Is that enough? The DB isn't growing that much, but it does seem to need frequent vacuum/analyze.

Richard
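If only a handful of large tables need the aggressive treatment, the same knobs can be set per table instead of globally - a sketch, assuming PostgreSQL 8.4 (where per-table autovacuum storage parameters were introduced) and an illustrative table name:

  ALTER TABLE big_table SET (
      autovacuum_vacuum_scale_factor  = 0.002,
      autovacuum_analyze_scale_factor = 0.001
  );

That keeps the global defaults sane for the many small tables while still vacuuming the hot ones often.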
Richard Neill <rn214@cam.ac.uk> wrote:
> In terms of just index bloat, does a regular vacuum help?

You might want to use the REINDEX command to correct serious index bloat. A regular vacuum will make dead space available for re-use, but won't eliminate bloat directly. (If run regularly, it will prevent bloat.)

-Kevin
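A sketch of the commands involved (table and index names are illustrative; note that REINDEX locks out writes to the table while it runs):

  REINDEX TABLE my_table;              -- rebuild every index on the table
  REINDEX INDEX my_table_created_idx;  -- or rebuild just one bloated index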
On Wed, Nov 25, 2009 at 4:26 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> In terms of just index bloat, does a regular vacuum help?
>
> You might want to use the REINDEX command to correct serious index
> bloat. A regular vacuum will make dead space available for re-use,
> but won't eliminate bloat directly. (If run regularly, it will
> prevent bloat.)

For that reason, it makes sense to actually partition your data - even though you don't see performance degradation because of data size, but purely because of the nature of the data.

Another way is to perform a regular cluster && reindex - but that blocks the relations you are clustering.

--
GJ
Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> Another way is to perform a regular cluster && reindex

If you CLUSTER there is no reason to REINDEX; indexes are rebuilt by the CLUSTER command.

Also, if you do a good job with regular VACUUMs, there isn't any bloat to fix. In that case a regular CLUSTER would only be needed if it was worth the cost to keep data physically organized in the index sequence.

-Kevin
On Wed, Nov 25, 2009 at 4:58 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> If you CLUSTER there is no reason to REINDEX; indexes are rebuilt by
> the CLUSTER command.
>
> Also, if you do a good job with regular VACUUMs, there isn't any bloat
> to fix. In that case a regular CLUSTER would only be needed if it was
> worth the cost to keep data physically organized in the index
> sequence.

The out-of-order data layout is the primary reason for index bloat. And that happens, and gets worse over time, as the data becomes more and more scattered ("random" deletes, etc.).

Hence the suggestion of partitioning. I, for one, hope that in 8.5 we will get a much more user-friendly partitioning interface, so that we no longer have to write custom triggers - which is probably the only reason I am only going to partition a table if it is really, really, really... needed.
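For reference, the current trigger-based approach looks roughly like this - a minimal sketch with illustrative table and column names, one child table per month:

  -- Parent table plus one inherited child partition:
  CREATE TABLE events (id bigint, created date, payload text);
  CREATE TABLE events_2009_11 (
      CHECK (created >= DATE '2009-11-01' AND created < DATE '2009-12-01')
  ) INHERITS (events);

  -- The hand-written routing trigger the parent needs:
  CREATE FUNCTION events_insert_router() RETURNS trigger AS $$
  BEGIN
      IF NEW.created >= DATE '2009-11-01' AND NEW.created < DATE '2009-12-01' THEN
          INSERT INTO events_2009_11 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for created = %', NEW.created;
      END IF;
      RETURN NULL;  -- the row has already gone into the child table
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER events_insert_route
      BEFORE INSERT ON events
      FOR EACH ROW EXECUTE PROCEDURE events_insert_router();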
--
GJ
On Wed, Nov 25, 2009 at 7:27 AM, Richard Neill <rn214@cam.ac.uk> wrote:
> Sergey Aleynikov wrote:
>>
>> Hello,
>>
>>> * Is there any way I can nail the query planner to a particular query
>>> plan, rather than have it keep changing its mind?
>>
>> All these settings lead to choosing different plans. If you have a small
>> number of complex, sensitive queries, you can run EXPLAIN on them with
>> the correct settings, then re-order the query (joins, subselects) according
>> to the given query plan, and, before running it, call
>>
>> set local join_collapse_limit = 1;
>> set local from_collapse_limit = 1;
>
> It's a simple query, but using a complex view. So I can't really re-order
> it.

Almost all queries can be reordered to some degree, but you might have to inline the view into the main query to actually be able to do it. Forcing a particular query plan in the manner described here is generally sort of a last resort, though. Usually you want to figure out how to tune things so that the query planner picks the right plan by itself - that's sort of the point of having a query planner...

...Robert
On Wed, 25 Nov 2009, Grzegorz Jaśkiewicz wrote:
> The out-of-order data layout is the primary reason for index bloat. And that
> happens, and gets worse over time, as the data becomes more and more
> scattered ("random" deletes, etc.).

That's not index bloat. Sure, having the table not in the same order as the index will slow down an index scan, but that's a completely different problem altogether.

Index bloat is caused by exactly the same mechanism as table bloat. The index needs to have an entry for every row in the table that may be visible by anyone. As with the table, it is not possible to deterministically delete the rows as they become non-visible, so the index (and the table) will be left with dead entries on delete and update. The vacuum command performs garbage collection and marks these dead rows and index entries as free, so that some time in the future more data can be written to those places.

Index bloat is when there is an excessive amount of dead space in an index. It can be prevented by (auto)vacuuming regularly, but can only be reversed by REINDEX (or of course deleting the index, or adding loads of new entries to fill up the dead space after vacuuming).

Matthew

--
for a in past present future; do
  for b in clients employers associates relatives neighbours pets; do
    echo "The opinions here in no way reflect the opinions of my $a $b."
done; done
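A quick way to see how much dead space an index was carrying is to compare its size before and after a rebuild (a sketch; the index name is illustrative):

  SELECT pg_size_pretty(pg_relation_size('my_table_pkey'));  -- size before
  REINDEX INDEX my_table_pkey;
  SELECT pg_size_pretty(pg_relation_size('my_table_pkey'));  -- size after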
Hello,

2009/11/25 Richard Neill <rn214@cam.ac.uk>:
> It's a simple query, but using a complex view. So I can't really re-order it.

The view is inserted directly into your query by PG, and then reordered according to from_collapse_limit. Perhaps the problem lies in the view itself - how well does it perform on its own? Or is from_collapse_limit _too low_, so the view isn't expanded right?

> Are you saying that this means that the query planner frequently makes the wrong choice here?

Look at EXPLAIN ANALYZE. If on some step the planner's estimate differs by (for a start) two orders of magnitude from what's really retrieved, then the statistics are wrong. But if, on every step, the estimate is not too far from reality, you suffer from what I've described - the planner can't reorder the query efficiently enough. Because this happens only sometimes, I suspect GEQO. Or wrong statistics.

> I hadn't changed it from the defaults; now I've changed it to:
>   autovacuum_max_workers = 6
>   autovacuum_vacuum_scale_factor = 0.002
>   autovacuum_analyze_scale_factor = 0.001

If your tables are not >100mln rows, that's aggressive enough. At 100mln rows, this would analyze the table every 100k changed (inserted/updated/deleted) rows. Is this enough for you? The defaults on large tables are definitely too low. If you now get consistent times, then you had been hit by wrong statistics.

Best regards,
Sergey Aleynikov
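A sketch of what that comparison looks like in practice (the query, plan line, and numbers are all illustrative, not from Richard's system):

  EXPLAIN ANALYZE SELECT * FROM orders WHERE created > now() - interval '1 day';

  -- Illustrative output fragment:
  --   Index Scan using orders_created_idx on orders
  --     (cost=0.00..8.27 rows=1 width=64) (actual time=0.04..45.10 rows=52000 loops=1)
  --
  -- "rows=1" (estimated) vs "rows=52000" (actual) is off by more than four
  -- orders of magnitude: a classic sign of bad statistics on that column.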
Hello,

2009/11/25 Richard Neill <rn214@cam.ac.uk>:

Also, if you find odd statistics on a freshly analyzed table, try increasing the statistics target, using

  ALTER TABLE .. ALTER COLUMN .. SET STATISTICS ..

If you're using the defaults, that's again too low for large tables. Start with 200, for example.

Best regards,
Sergey Aleynikov
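Concretely, that might look like this (table and column names are illustrative):

  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 200;
  ANALYZE orders;  -- the new target only takes effect at the next ANALYZE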
Sergey Aleynikov wrote:
> Hello,
>
> 2009/11/25 Richard Neill <rn214@cam.ac.uk>:
>
> Also, if you find odd statistics on a freshly analyzed table, try
> increasing the statistics target, using
>
>   ALTER TABLE .. ALTER COLUMN .. SET STATISTICS ..
>
> If you're using the defaults, that's again too low for large tables.
> Start with 200, for example.

Thanks. I already had it set way up: 3000.

Is there a good description of exactly what analyse does, and how? (In particular, what sort of statistics it gathers.)

Richard
On 11/25/09 4:18 AM, "Matthew Wakeling" <matthew@flymine.org> wrote:
> The problem is that vacuum full does a full compact of the table, but it
> has to update all the indexes as it goes. This makes it slow, and causes
> bloat to the indexes. There has been some discussion of removing the
> command or at least putting a big warning next to it.

For tables without an index, you still need something. Vacuum full isn't that bad here, but cluster has other advantages.

Ideally, you could CLUSTER without using an index, maybe something like

  CLUSTER table USING (column_a, ...)

to order it by specific columns, or even simply

  CLUSTER table USING ()

for when you don't care about the order at all, and just want to compact the whole thing to its proper size (including fillfactor), and most likely defragmented too.

Additionally, I've found it very important to set fillfactor to something other than the default for tables that have lots of updates, especially if there are bulk updates on non-indexed columns.
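The fillfactor tweak looks like this - a sketch with an illustrative table name and value; it only applies to newly written pages, so follow it with a rewrite such as CLUSTER to repack the existing ones:

  ALTER TABLE hot_table SET (fillfactor = 70);  -- leave 30% free per page for updates
  CLUSTER hot_table USING hot_table_pkey;       -- rewrite existing pages with the new fillfactor

The free space lets updated row versions land on the same page as the original, which (on 8.3+) also enables HOT updates when the changed columns are not indexed.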