Thread: PSA: If you are running Precise/12.04 upgrade your kernel.

PSA: If you are running Precise/12.04 upgrade your kernel.

From
"Joshua D. Drake"
Date:
Hello,

I had the distinct displeasure of staying up entirely too late with a
customer this week because they upgraded to 12.04 and immediately
experienced a huge performance regression. In the process they also
upgraded to PostgreSQL 9.1 from 8.4. There were a lot of knobs to
change/fix/modify because of this. However, nothing I did fixed the
problem. Until... I upgraded the kernel.

Upgrading from the 3.2 kernel that ships with Precise to the 3.9.4
kernel produced the following results:

http://www.commandprompt.com/blogs/joshua_drake/2013/06/the_steaming_pile_that_is_precise_with_kernel_32/

I have since verified this on more than one machine as well. Upgrading
the kernel has drastically reduced overall IOWAIT times.
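
For anyone who wants to put a number on IOWAIT before and after a kernel change, here is a minimal sketch that computes the iowait share from two samples of the aggregate "cpu" line in /proc/stat. The function names are illustrative and not taken from the blog post; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented in proc(5).

```python
import time


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Only the first eight counters are summed; guest time is already
    # included in user/nice, so adding it again would double count.
    deltas = [new - old for old, new in zip(b[:8], a[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # iowait is the fifth counter (index 4).
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def measure(interval=5.0):
    """Sample /proc/stat twice, 'interval' seconds apart, and return iowait %."""
    first = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line())
```

Calling measure(10) under the same workload on each kernel gives a rough comparison; tools such as iostat or sar report the same counter with more context.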

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
    a rose in the deeps of my heart. - W.B. Yeats


Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Scott Marlowe
Date:
On Thu, Jun 6, 2013 at 4:35 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> Hello,
>
> I had the distinct displeasure of staying up entirely too late with a
> customer this week because they upgraded to 12.04 and immediately
> experienced a huge performance regression. In the process they also upgraded
> to PostgreSQL 9.1 from 8.4. There were a lot of knobs to change/fix/modify
> because of this. However, nothing I did fixed the problem. Until... I
> upgraded the kernel.
>
> Upgrading from 3.2Precise to the 3.9.4 kernel produced the following
> results:

I've since heard that 3.4 also fixes this issue.

What are you using for your IO on these boxes?


Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
"Joshua D. Drake"
Date:
On 06/06/2013 03:48 PM, Scott Marlowe wrote:
>
> On Thu, Jun 6, 2013 at 4:35 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>>
>> Hello,
>>
>> I had the distinct displeasure of staying up entirely too late with a
>> customer this week because they upgraded to 12.04 and immediately
>> experienced a huge performance regression. In the process they also upgraded
>> to PostgreSQL 9.1 from 8.4. There were a lot of knobs to change/fix/modify
>> because of this. However, nothing I did fixed the problem. Until... I
>> upgraded the kernel.
>>
>> Upgrading from 3.2Precise to the 3.9.4 kernel produced the following
>> results:
>
> I've since heard that 3.4 also fixes this issue as well.
>
> What are you using for your IO on these boxes?

I was able to demonstrate it over iSCSI to a Nimble Storage SAN as well
as on DAS with a 2-drive RAID 1 for xlogs and an 8-drive RAID 10 for
data (DL385 G7).

JD





Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Toby Corkindale
Date:
On 07/06/13 08:35, Joshua D. Drake wrote:
>
> Hello,
>
> I had the distinct displeasure of staying up entirely too late with a
> customer this week because they upgraded to 12.04 and immediately
> experienced a huge performance regression. In the process they also
> upgraded to PostgreSQL 9.1 from 8.4. There were a lot of knobs to
> change/fix/modify because of this. However, nothing I did fixed the
> problem. Until... I upgraded the kernel.
>
> Upgrading from 3.2Precise to the 3.9.4 kernel produced the following
> results:
>
> http://www.commandprompt.com/blogs/joshua_drake/2013/06/the_steaming_pile_that_is_precise_with_kernel_32/
>
>
> I have since verified this on more than one machine as well. Upgrading
> the kernel has drastically reduced overall IOWAIT times.

I'd be curious to hear whether the same problem applies to the 3.2
kernel that's in the recently released Debian "Wheezy".

(My Ubuntu Precise boxes have been running the backported kernels for a
while as it is, but some Debian Squeeze boxes are due to be upgraded to
Debian Wheezy soon.)



Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Nikhil G Daddikar
Date:
Folks,

This is bad news, as I run Ubuntu 12.04 LTS. However, my Ubuntu 12.04 LTS
boxes have been updated to "3.5.0-32-generic" (official update). Any
idea whether PostgreSQL has problems with this kernel? I'd like to
follow the "official" LTS updates because I am not sure what other
surprises I could face if I move to an unofficial one.
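
As a starting point for auditing a fleet, here is a minimal sketch that extracts the major.minor series from a kernel release string such as "3.5.0-32-generic" and flags the one series this thread reports as slow. The names REPORTED_SLOW and reported_in_thread are hypothetical. The check only reflects what was said here (3.2 affected, 3.9.4 fine, 3.4 reportedly fine); it says nothing certain about 3.5 and cannot detect a back-ported fix.

```python
import platform
import re

# Kernel series reported as slow in this thread. This is an assumption
# drawn from the discussion, not an authoritative list.
REPORTED_SLOW = {(3, 2)}


def kernel_series(release):
    """Return (major, minor) from a release string such as '3.5.0-32-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def reported_in_thread(release):
    """True if the release belongs to a series reported as slow in this thread.

    False only means "not the series discussed here", not "verified fine".
    """
    return kernel_series(release) in REPORTED_SLOW


def running_kernel():
    """Release string of the running kernel, as 'uname -r' would print it."""
    return platform.release()
```

For example, reported_in_thread(running_kernel()) returns True on a stock Precise 3.2 kernel and False on 3.5.0-32-generic; the latter still needs a real benchmark before drawing conclusions.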

Thanks!
Nikhil






Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Toby Corkindale
Date:
Perhaps someone with a spare server floating around could install Ubuntu
LTS and run some pgbench benchmarks with the various kernel options?
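
To compare such runs across kernels, a minimal sketch like the following could pull the "tps = ..." lines out of saved pgbench output. It assumes the summary format printed by pgbench of this era ("including/excluding connections establishing"); newer releases word the summary differently, so the pattern may need adjusting. The function names are illustrative.

```python
import re

# Matches summary lines such as:
#   tps = 85.184871 (including connections establishing)
TPS_LINE = re.compile(
    r"^tps = ([0-9.]+) \((including|excluding) connections establishing\)",
    re.MULTILINE,
)


def parse_tps(output):
    """Return {'including': tps, 'excluding': tps} from pgbench output text."""
    return {kind: float(value) for value, kind in TPS_LINE.findall(output)}


def percent_change(old_tps, new_tps):
    """Relative change in throughput from old_tps to new_tps, in percent."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return 100.0 * (new_tps - old_tps) / old_tps
```

Running the same pgbench invocation (same scale, clients, and duration) on each kernel and feeding both outputs through parse_tps keeps the comparison like for like.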

Like you, I'd have to stick to official updates for production systems.

-Toby




Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Bosco Rama
Date:
On 06/06/13 15:35, Joshua D. Drake wrote:
>
> I had the distinct displeasure of staying up entirely too late with a
> customer this week because they upgraded to 12.04 and immediately
> experienced a huge performance regression. In the process they also
> upgraded to PostgreSQL 9.1 from 8.4. There were a lot of knobs to
> change/fix/modify because of this. However, nothing I did fixed the
> problem. Until... I upgraded the kernel.

We ran headlong into this problem the day after you posted this.  We
are in the process of moving from PG 8.4 on Ubuntu Server 10.04 LTS to
PG 9.2 on Ubuntu Server 12.04 LTS and encountered this very issue during
the pg_upgradecluster process.

A colleague mentioned this LKML thread:
  <http://lkml.indiana.edu/hypermail/linux/kernel/1210.1/00725.html>

Seems it was fixed in 3.9.x.  I'm wondering whether there is an easy way
to determine if the fix was back-ported to the various Ubuntu-maintained
kernels for Precise.

Bosco.


Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
"Joshua D. Drake"
Date:
On 06/14/2013 09:12 AM, Bosco Rama wrote:

> A colleague mentioned this LKML thread:
>    <http://lkml.indiana.edu/hypermail/linux/kernel/1210.1/00725.html>
>
> Seems it was fixed in 3.9.x.  I'm wondering whether there is an easy way
> to determine if the fix was back-ported to the various Ubuntu-maintained
> kernels for Precise.

It is pretty easy to test for using iozone with multiple threads.
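
A minimal sketch of what such a run might look like, built as an argument list for iozone's throughput mode. The flags used (-t, -T, -s, -r, -i, -F) are standard iozone options; the thread count, file size, 8k record size (chosen to match PostgreSQL's default block size), and file names are assumptions, not the settings JD used.

```python
import os


def iozone_command(threads, directory, file_size="1g", record_size="8k"):
    """Build an iozone throughput-mode command line as an argv list.

    -t N   throughput mode with N workers
    -T     use POSIX threads rather than processes
    -s/-r  per-worker file size and record size
    -i     tests to run: 0 = write/rewrite, 1 = read/re-read, 2 = random I/O
    -F     one scratch file per worker (throughput mode needs N names)
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    files = [
        os.path.join(directory, "iozone.%d.tmp" % index)
        for index in range(threads)
    ]
    return [
        "iozone", "-T", "-t", str(threads),
        "-s", file_size, "-r", record_size,
        "-i", "0", "-i", "1", "-i", "2",
        "-F",
    ] + files
```

Passing the result to subprocess.run(..., check=True) on the data volume, once per kernel, gives comparable throughput figures; the scratch directory must have room for threads times file_size.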

JD






Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Stuart Bishop
Date:
On Fri, Jun 7, 2013 at 5:51 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> On 06/06/2013 03:48 PM, Scott Marlowe wrote:
>>
>>
>> On Thu, Jun 6, 2013 at 4:35 PM, Joshua D. Drake <jd@commandprompt.com>
>> wrote:
>>>
>>> I had the distinct displeasure of staying up entirely too late with a
>>> customer this week because they upgraded to 12.04 and immediately
>>> experienced a huge performance regression. In the process they also
>>> upgraded
>>> to PostgreSQL 9.1 from 8.4. There were a lot of knobs to
>>> change/fix/modify
>>> because of this. However, nothing I did fixed the problem. Until... I
>>> upgraded the kernel.
>>>
>>> Upgrading from 3.2Precise to the 3.9.4 kernel produced the following
>>> results:
>>
>>
>> I've since heard that 3.4 also fixes this issue as well.
>>
>> What are you using for your IO on these boxes?
>
> I was able to demonstrate it over iSCSI to a Nimble Storage SAN as well as
> DAS with 2 drive RAID 1 for xlogs and 8 drive RAID 10 for data (DL385 G7).


This might sound familiar:

http://postgresql.1045698.n5.nabble.com/Ubuntu-12-04-3-2-Kernel-Bad-for-PostgreSQL-Performance-td5735284.html

tl;dr for that thread seems to be a driver problem (fusionIO?), I'm
unsure if Ubuntu specific or in the upstream kernel.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/


Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
"Joshua D. Drake"
Date:
On 06/17/2013 01:34 PM, Stuart Bishop wrote:

>>> I've since heard that 3.4 also fixes this issue as well.
>>>
>>> What are you using for your IO on these boxes?
>>
>> I was able to demonstrate it over iSCSI to a Nimble Storage SAN as well as
>> DAS with 2 drive RAID 1 for xlogs and 8 drive RAID 10 for data (DL385 G7).
>
>
> This might sound familiar:
>
> http://postgresql.1045698.n5.nabble.com/Ubuntu-12-04-3-2-Kernel-Bad-for-PostgreSQL-Performance-td5735284.html
>
> tl;dr for that thread seems to be a driver problem (fusionIO?), I'm
> unsure if Ubuntu specific or in the upstream kernel.

If it is a driver problem, then two different drivers were buggy: the
Nimble Storage SAN driver (iSCSI) as well as the DL385 DAS driver (LSI).
Anyway, the upgrade to 3.9 makes the problem disappear. There are other
insights in the comments of the blog post.

JD





Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Shaun Thomas
Date:
On 06/17/2013 04:00 PM, Joshua D. Drake wrote:

>> http://postgresql.1045698.n5.nabble.com/Ubuntu-12-04-3-2-Kernel-Bad-for-PostgreSQL-Performance-td5735284.html
>>
>> tl;dr for that thread seems to be a driver problem (fusionIO?), I'm
>> unsure if Ubuntu specific or in the upstream kernel.

That instance wasn't a driver problem. The problem was that the FusionIO
driver uses kernel threads to perform IO, and it seems that several of
the 3.x kernels have issues with task migration using the new CFS CPU
scheduler which replaced the O(1) one.

The next thread related to this that fixed our particular case was this one:

http://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: PSA: If you are running Precise/12.04 upgrade your kernel.

From
Scott Marlowe
Date:
Good to know. I've got a few spare machines I might be able to test
3.2 kernels on in the next few months.





--
To understand recursion, one must first understand recursion.