Thread: 7.4.1 upgrade issues
I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm running into an issue where a big query that may take 30-40 seconds to reply is holding up all other backends from performing their queries. Once the big query is finished, all the tiny ones fly through. This is seemingly new behavior on the box, as with previous versions things would slow down, but not wait for the cpu/resource hog queries to finish. The box is Slackware 8.1, on a fairly decent box with plenty of ram, cpu, and disk speed. I've considered renicing the processes, I was wondering if anyone had a different suggestion.

TIA,
Gavin
On Sat, Mar 06, 2004 at 01:12:57PM -0800, Gavin M. Roy wrote:
> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.

By "holding up", do you mean that it's causing the other transactions to block (INSERT WAITING, for instance), or that it's making everything real slow?

It could be your sort_mem is set too high. Remember that the new-in-7.4 hash behaviour works with the sort_mem setting, and if it's set too high and you have enough cases of this, you might actually cause your box to start swapping.

> and disk speed. I've considered renicing the processes, I was wondering

That is unlikely to help, and certainly won't if the queries are actually blocked.

-- 
Andrew Sullivan | ajs@crankycanuck.ca
The plural of anecdote is not data. --Roger Brinner
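Andrew's point about sort_mem can be made concrete with some rough arithmetic. In 7.4, sort_mem is given in kilobytes and applies to each sort or hash step rather than to the server as a whole, so several such steps across several backends can add up. The sketch below is only a back-of-the-envelope upper bound; the function name and every number in it are made up for illustration and are not taken from Gavin's configuration.

```python
def worst_case_sort_memory_mb(sort_mem_kb, ops_per_query, concurrent_backends):
    """Rough upper bound, in MB, on memory that sort/hash steps could claim.

    Assumes every backend runs a plan in which every sort or hash step
    uses its full sort_mem allowance at the same moment.
    """
    return sort_mem_kb * ops_per_query * concurrent_backends / 1024


# Hypothetical numbers, chosen only to show the scale of the effect.
estimate_mb = worst_case_sort_memory_mb(
    sort_mem_kb=65536,       # 64 MB allowed per sort or hash step
    ops_per_query=3,         # sort/hash steps in one plan
    concurrent_backends=20,  # sessions running such a plan at once
)
print(f"worst case: {estimate_mb:.0f} MB")
```

If an estimate like this comes out well above physical RAM, swapping becomes plausible; if it is comfortably below, sort_mem is probably not the cause.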
Gavin M. Roy wrote:
> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
> Once the big query is finished, all the tiny ones fly through. This is
> seemingly new behavior on the box, as with previous versions things would
> slow down, but not wait for the cpu/resource hog queries to finish. The
> box is Slackware 8.1, on a fairly decent box with plenty of ram, cpu,
> and disk speed. I've considered renicing the processes, I was wondering
> if anyone had a different suggestion.

Hi Gavin.

Assuming a VACUUM ANALYZE after reload, one possibility is that the query in question contains >= 11 joins. I forgot to adjust the GEQO settings during an upgrade and experienced the associated sluggishness in planning time.

Mike Mascari
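Some background on Mike's GEQO suggestion: the genetic query optimizer exists because the number of possible join orders grows very quickly with the number of tables, and the geqo_threshold setting controls the number of FROM items at which the planner switches to it. The sketch below only counts left-deep join orders (n! for n tables) to give a feel for that growth. It is not how the PostgreSQL planner actually searches for plans, and the function name is invented for this illustration.

```python
import math


def left_deep_join_orders(n_tables):
    """Number of distinct left-deep join orders for n_tables tables (n!)."""
    return math.factorial(n_tables)


# Show how quickly the search space grows as tables are added.
for n in (4, 8, 11, 12):
    print(f"{n:2d} tables: {left_deep_join_orders(n):,} join orders")
```

This is why planning time, rather than execution time, can dominate for queries with many joins when the planner settings differ from what the old installation used.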
It's not WAITING, the larger queries are eating cpu (99%) and the rest are running so slow it would seem they're waiting for processing time. My sort_mem is fairly high, but this is a dedicated box, and there is no swapping going on afaik.

Gavin

Andrew Sullivan wrote:
> On Sat, Mar 06, 2004 at 01:12:57PM -0800, Gavin M. Roy wrote:
>> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
>> running into an issue where a big query that may take 30-40 seconds to
>> reply is holding up all other backends from performing their queries.
>
> By "holding up", do you mean that it's causing the other transactions
> to block (INSERT WAITING, for instance), or that it's making
> everything real slow?
>
> It could be your sort_mem is set too high. Remember that the
> new-in-7.4 hash behaviour works with the sort_mem setting, and if
> it's set too high and you have enough cases of this, you might
> actually cause your box to start swapping.
>
>> and disk speed. I've considered renicing the processes, I was wondering
>
> That is unlikely to help, and certainly won't if the queries are
> actually blocked.
It is using indexes, and not seqscan, and there was an analyze after reload... I'll play with GEQO, thanks.

Gavin

Mike Mascari wrote:
> Hi Gavin.
>
> Assuming a VACUUM ANALYZE after reload, one possibility is that the
> query in question contains >= 11 joins. I forgot to adjust the GEQO
> settings during an upgrade and experienced the associated sluggishness
> in planning time.
>
> Mike Mascari
"Gavin M. Roy" said:
> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
> Once the big query is finished, all the tiny ones fly through. This is
> seemingly new behavior on the box, as with previous versions things would
> slow down, but not wait for the cpu/resource hog queries to finish. The
> box is Slackware 8.1, on a fairly decent box with plenty of ram, cpu,
> and disk speed. I've considered renicing the processes, I was wondering
> if anyone had a different suggestion.

It sounds like you are suggesting this same system and data worked fine on 7.3.4. Just the same, you might want to provide more detail anyway. EIDE drives, when used (not recommended for servers IMO), are often not configured properly and can cause similar issues in a system with tons of ram and cpu.

Best,
Jim

-- 
Jim Wilson - IT Manager
Kelco Industries
PO Box 160
58 Main Street
Milbridge, ME 04658
207-546-7989 - FAX 207-546-2791
http://www.kelcomaine.com
"Gavin M. Roy" <gmr@ehpg.net> writes:
> It's not WAITING, the larger queries are eating cpu (99%) and the rest
> are running so slow it would seem they're waiting for processing time.

Could we see EXPLAIN ANALYZE output for the large query? (Also the usual supporting evidence, ie table schemas for all the tables involved.)

regards, tom lane
I'll post it if you want, but the issue isn't with the optimizer, index usage, or seq scan, the issue seems to be more revolving around the backend getting so much cpu priority it's not allowing other backends to process, or something along those lines. For the hardware question asked, it's an Adaptec 7899 Ultra 160 SCSI card w/ accompanying fast drives...

Again, I'll send the explain, etc if you think it would help answer my question, but from my perspective, the amount of time the query takes to execute isn't my issue, but the fact that nothing else can seemingly execute while it's running.

Gavin

Tom Lane wrote:
> "Gavin M. Roy" <gmr@ehpg.net> writes:
>> It's not WAITING, the larger queries are eating cpu (99%) and the rest
>> are running so slow it would seem they're waiting for processing time.
>
> Could we see EXPLAIN ANALYZE output for the large query? (Also the
> usual supporting evidence, ie table schemas for all the tables
> involved.)
>
> regards, tom lane
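If the EXPLAIN output does get posted, one quick thing to look for in it is the hash-based plan steps that tie back to Andrew's sort_mem remark. As a rough sketch of that check, the snippet below scans plan text for node names containing "Hash". The sample plan is invented for illustration (it is not Gavin's query, and its table names and costs are made up), the helper name is hypothetical, and real EXPLAIN output may be laid out differently.

```python
import re


def hash_nodes(explain_text):
    """Return the names of plan nodes that mention hashing, in plan order."""
    nodes = []
    for line in explain_text.splitlines():
        # Only lines carrying a cost estimate are treated as plan nodes;
        # detail lines such as "Hash Cond: ..." are skipped.
        if "(cost=" not in line:
            continue
        # Strip indentation and the "->" arrow that prefixes child nodes.
        node = re.sub(r"^\s*(->\s*)?", "", line)
        # Keep only the node name, i.e. the text before the cost estimate.
        name = node.split("(cost=")[0].strip()
        if "hash" in name.lower():
            nodes.append(name)
    return nodes


# Invented plan text, for illustration only.
sample_plan = """\
HashAggregate  (cost=120.50..121.00 rows=200 width=4)
  ->  Hash Join  (cost=25.00..110.00 rows=1000 width=4)
        Hash Cond: (o.id = i.order_id)
        ->  Seq Scan on orders o  (cost=0.00..20.00 rows=1000 width=4)
        ->  Hash  (cost=20.00..20.00 rows=1000 width=4)
              ->  Seq Scan on items i  (cost=0.00..20.00 rows=1000 width=4)
"""
print(hash_nodes(sample_plan))
```

A plan with several such nodes, combined with a high sort_mem, would support the hash explanation; a plan with none would argue against it.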
"Gavin M. Roy" <gmr@ehpg.net> writes:
> ... the issue seems to be more revolving around the
> backend getting so much cpu priority it's not allowing other backends to
> process, or something along those lines.

I can't think of any difference between 7.3 and 7.4 that would create a problem of that sort where there was none before. For that matter, since Postgres runs nonprivileged it's hard to see how it could create a priority problem in any version.

I thought the previous suggestion about added use of hashtables was a pretty good idea. We could confirm or disprove it by looking at EXPLAIN output.

regards, tom lane
This reminds me of the scheduler optimizations that have been flying around Linux kernel development over the last year or so. There are apparently cases where this kind of behavior can come up. IIRC it's fixed in later kernels, but don't take my word for it; I'm just writing to give a heads-up. Take a look at the Linux kernel mailing list, and you'll probably find good articles at Linux Weekly News (lwn.net).

On 2004.03.06 23:32 Gavin M. Roy wrote:
> I'll post it if you want, but the issue isn't with the optimizer,
> index usage, or seq scan, the issue seems to be more revolving around
> the backend getting so much cpu priority it's not allowing other
> backends to process, or something along those lines. For the
> hardware question asked, it's an Adaptec 7899 Ultra 160 SCSI card w/
> accompanying fast drives...
>
> Again, I'll send the explain, etc if you think it would help answer
> my question, but from my perspective, the amount of time the query
> takes to execute isn't my issue, but the fact that nothing else can
> seemingly execute while it's running.
>
> Gavin

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
    -- Robert A. Heinlein
Thanks, I'll take a look. We've rewritten the queries and indexes to avoid the issue, but I'd like to get an ultimate solution to it, and the idea that it's a Linux kernel scheduling thing is probably dead on.

Gavin

Karl O. Pinc wrote:
> This reminds me of the scheduler optimizations that have been flying
> around the Linux kernel development over the last year or so. There are
> cases apparently where this kind of behavior can come up. IIRC it's
> fixed in later kernels but don't take my word for it, I'm just writing
> to give a heads-up. Take a look at the Linux kernel mailing list,
> and you'll probably find good articles at Linux Weekly News (lwn.net).