Thread: PSA: If you are running Precise/12.04 upgrade your kernel.
Hello,

I had the distinct displeasure of staying up entirely too late with a customer this week because they upgraded to 12.04 and immediately experienced a huge performance regression. In the process they also upgraded to PostgreSQL 9.1 from 8.4. There were a lot of knobs to change/fix/modify because of this. However, nothing I did fixed the problem. Until... I upgraded the kernel.

Upgrading from 3.2Precise to the 3.9.4 kernel produced the following results:

http://www.commandprompt.com/blogs/joshua_drake/2013/06/the_steaming_pile_that_is_precise_with_kernel_32/

I have since verified this on more than one machine as well. Upgrading the kernel has drastically reduced overall IOWAIT times.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
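For anyone who wants to compare IOWAIT before and after a kernel change, here is a minimal sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of CPU time spent in iowait in between. It is not taken from the blog post; the function names and the 5-second interval are illustrative assumptions, and tools such as `iostat` or `vmstat` report the same counter.

```python
import time
from typing import Dict

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
# Guest time is already included in "user", so it is left out of the total.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_times(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def iowait_percent(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Percentage of elapsed CPU ticks spent in iowait between two samples."""
    total = sum(after.values()) - sum(before.values())
    if total <= 0:
        return 0.0
    return 100.0 * (after["iowait"] - before["iowait"]) / total


def read_cpu_times(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_times(f.readline())


if __name__ == "__main__":
    first = read_cpu_times()
    time.sleep(5)  # illustrative sampling interval
    second = read_cpu_times()
    print(f"iowait over 5 s: {iowait_percent(first, second):.1f}%")
```

Running it under the same workload on the old and the new kernel gives a rough before/after number; it says nothing about the cause.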
On Thu, Jun 6, 2013 at 4:35 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> Upgrading from 3.2Precise to the 3.9.4 kernel produced the following
> results:

I've since heard that 3.4 fixes this issue as well.

What are you using for your IO on these boxes?
On 06/06/2013 03:48 PM, Scott Marlowe wrote:
> What are you using for your IO on these boxes?

I was able to demonstrate it over iSCSI to a Nimble Storage SAN as well as DAS with 2 drive RAID 1 for xlogs and 8 drive RAID 10 for data (DL385 G7).

JD
On 07/06/13 08:35, Joshua D. Drake wrote:
> I have since verified this on more than one machine as well. Upgrading
> the kernel has drastically reduced overall IOWAIT times.

I'd be curious to hear whether the same problem applies to the 3.2 kernel in the recently released Debian "Wheezy".

(My Ubuntu Precise boxes have been running the backported kernels for a while as it is, but some Debian Squeeze boxes are due to be upgraded to Debian Wheezy soon.)
Folks,

This is bad news, as I run Ubuntu 12.04 LTS. However, my Ubuntu 12.04 LTS boxes have been updated to "3.5.0-32-generic" (official update). Any idea whether PostgreSQL has problems with this kernel? I'd like to follow the "official" LTS updates because I am not sure what other surprises I could face if I move to an unofficial one.

Thanks!
Nikhil

On 07-06-2013 04:18, Scott Marlowe wrote:
> I've since heard that 3.4 also fixes this issue as well.
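As a quick way to see where a given box stands relative to what has been reported in this thread, here is a small sketch that parses a kernel release string such as "3.5.0-32-generic" and sorts it into one of three buckets. The function names and bucket labels are made up for illustration. The buckets only restate the thread (3.2 reported slow, 3.9.4 reported fixed, 3.4 mentioned second-hand), and a version number cannot show whether a distribution back-ported a fix, so this does not replace an actual IO test.

```python
import platform
import re
from typing import Tuple


def kernel_series(release: str) -> Tuple[int, int]:
    """Return (major, minor) from a release string like '3.5.0-32-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def thread_status(release: str) -> str:
    """Classify a kernel release by what this thread reports (hypothetical labels)."""
    series = kernel_series(release)
    if series == (3, 2):
        return "reported slow"
    if series >= (3, 9):
        # Only 3.9.4 was actually tested in the thread; treating every later
        # series as fixed is an assumption.
        return "reported fixed"
    # Includes 3.4 to 3.8: 3.4 was only mentioned second-hand as fixing it.
    return "unverified"


if __name__ == "__main__":
    running = platform.release()
    print(f"{running}: {thread_status(running)}")
```

For "3.5.0-32-generic" this yields "unverified", which matches the state of the discussion: nobody here has measured that kernel yet.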
Perhaps someone with a spare server floating around could install Ubuntu LTS and run some pgbench benchmarks with the various kernel options?

Like you, I'd have to stick to official updates for production systems.

-Toby

On 07/06/13 15:36, Nikhil G Daddikar wrote:
> I'd like to follow the "official" LTS updates because I am not sure what other
> surprises I could face if I move to an unofficial one.
On 06/06/13 15:35, Joshua D. Drake wrote:
> However, nothing I did fixed the
> problem. Until... I upgraded the kernel.

We ran headlong into this problem the day after you posted this. We are in the process of moving from PG 8.4 on UB Server 10.0 LTS onto PG 9.2 on UB Server 12.04 LTS and encountered this very issue during the pg_upgradecluster process.

A colleague mentioned this LKML thread:
<http://lkml.indiana.edu/hypermail/linux/kernel/1210.1/00725.html>

Seems it was fixed in 3.9.x. I wonder if there is any way to easily determine whether the fix was back-ported to the various Ubuntu-maintained kernels for Precise?

Bosco.
On 06/14/2013 09:12 AM, Bosco Rama wrote:
> Seems it was fixed in 3.9.x. I wonder if there is any way to easily
> determine whether the fix was back-ported to the various Ubuntu-maintained
> kernels for Precise?

It is pretty easy to test for using iozone with multiple threads.

JD
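If iozone is not at hand, the same idea (several threads writing at once, then comparing throughput across kernels) can be roughed out in a few lines. The sketch below is a crude stand-in and not iozone itself: the function names, thread count, and sizes are illustrative assumptions, and the 8 kB block size is chosen only because it matches PostgreSQL's default page size. Each thread writes its own file and calls fsync so the data has to reach storage before the clock stops.

```python
import os
import tempfile
import threading
import time


def _writer(path: str, blocks: int, block_size: int) -> None:
    """Write `blocks` blocks of random data to `path`, then fsync."""
    buf = os.urandom(block_size)
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())


def write_throughput_mb_s(directory: str, threads: int = 4,
                          blocks: int = 1024, block_size: int = 8192) -> float:
    """Run `threads` concurrent writers in `directory`; return aggregate MB/s."""
    paths = [os.path.join(directory, f"iotest_{i}.dat") for i in range(threads)]
    workers = [threading.Thread(target=_writer, args=(p, blocks, block_size))
               for p in paths]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    for p in paths:
        os.remove(p)  # clean up the scratch files
    total_mb = threads * blocks * block_size / (1024 * 1024)
    return total_mb / elapsed


if __name__ == "__main__":
    # Point this at the filesystem under test instead of the default temp
    # directory, and use sizes well beyond RAM-cache scale for real numbers.
    with tempfile.TemporaryDirectory() as scratch:
        print(f"{write_throughput_mb_s(scratch):.1f} MB/s")
```

Run it with identical parameters on the stock kernel and on the candidate kernel; only a large, repeatable gap between the two is meaningful.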
On Fri, Jun 7, 2013 at 5:51 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> I was able to demonstrate it over iSCSI to a Nimble Storage SAN as well as
> DAS with 2 drive RAID 1 for xlogs and 8 drive RAID 10 for data (DL385 G7).

This might sound familiar:

http://postgresql.1045698.n5.nabble.com/Ubuntu-12-04-3-2-Kernel-Bad-for-PostgreSQL-Performance-td5735284.html

The tl;dr for that thread seems to be a driver problem (FusionIO?); I'm unsure whether it is Ubuntu-specific or in the upstream kernel.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
On 06/17/2013 01:34 PM, Stuart Bishop wrote:
> This might sound familiar:
>
> http://postgresql.1045698.n5.nabble.com/Ubuntu-12-04-3-2-Kernel-Bad-for-PostgreSQL-Performance-td5735284.html
>
> tl;dr for that thread seems to be a driver problem (fusionIO?), I'm
> unsure if Ubuntu specific or in the upstream kernel.

If it is a driver problem, then two different drivers were buggy: the Nimble Storage SAN driver (iSCSI) as well as the DL385 DAS (LSI). Anyway, the upgrade to 3.9 makes the problem disappear.

There are other insights in the comments of the blog post.

JD
On 06/17/2013 04:00 PM, Joshua D. Drake wrote:
>> http://postgresql.1045698.n5.nabble.com/Ubuntu-12-04-3-2-Kernel-Bad-for-PostgreSQL-Performance-td5735284.html
>>
>> tl;dr for that thread seems to be a driver problem (fusionIO?), I'm
>> unsure if Ubuntu specific or in the upstream kernel.

That instance wasn't a driver problem. The problem was that the FusionIO driver uses kernel threads to perform IO, and it seems that several of the 3.x kernels have issues with task migration using the new CFS CPU scheduler which replaced the O(1) one.

The next thread related to this that fixed our particular case was this one:

http://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
Good to know. I've got a few spare machines I might be able to test 3.2 kernels on in the next few months.

On Thu, Jun 20, 2013 at 12:54 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> That instance wasn't a driver problem. The problem was that the FusionIO
> driver uses kernel threads to perform IO, and it seems that several of the
> 3.x kernels have issues with task migration using the new CFS CPU scheduler
> which replaced the O(1) one.

--
To understand recursion, one must first understand recursion.