Thread: 9.0 performance degradation with kernel 3.11

9.0 performance degradation with kernel 3.11

From
Filip Rembiałkowski
Date:
Hi

After upgrading our 9.0 database server

from:
openSUSE 11.4, kernel 2.6.37.6-24-default, Pg 9.0.13

to:
openSUSE 13.1, kernel v 3.11.10-21-default, Pg 9.0.15

... and  overall server load is +1 after that.

We did not add any new services/daemons.

It's hard to track down to individual queries - when I tested most
individual query times are same as before the migration.


Any - ANY - hints will be much appreciated.

Thanks
Filip


Re: 9.0 performance degradation with kernel 3.11

From
Glyn Astill
Date:
> From: Filip Rembiałkowski <filip.rembialkowski@gmail.com>
>To: pgsql-performance@postgresql.org
>Sent: Thursday, 13 November 2014, 8:10
>Subject: [PERFORM] 9.0 performance degradation with kernel 3.11
>
>
>Hi
>
>After upgrading our 9.0 database server
>
>from:
>openSUSE 11.4, kernel 2.6.37.6-24-default, Pg 9.0.13
>
>to:
>openSUSE 13.1, kernel v 3.11.10-21-default, Pg 9.0.15
>
>... and  overall server load is +1 after that.
>
>We did not add any new services/daemons.
>
>It's hard to track down to individual queries - when I tested most
>individual query times are same as before the migration.
>
>
>Any - ANY - hints will be much appreciated.
>
>Thanks
>Filip
>

It's hard to say much given so little information, but assuming everything was rosy for you with your 2.6 version,
and you've kept the basics like hardware, filesystem, io scheduler etc. the same, there are a few kernel tunables to
tweak on later kernels.

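Usually defragmentation of transparent huge pages causes an issue, and it's best to turn off the defrag option
(setting it to madvise limits defragmentation to memory regions that explicitly ask for it via madvise):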


    echo always > /sys/kernel/mm/transparent_hugepage/enabled
    echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
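
To confirm what the kernel is actually using, note that each of these files lists every choice and marks the active one in square brackets. A minimal sketch for reading that back, assuming the usual sysfs layout; thp_active is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Print the active (bracketed) choice from a transparent_hugepage sysfs line,
# e.g. "always [madvise] never" -> "madvise".
# thp_active is a hypothetical helper name used only for this sketch.
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report both settings if this kernel exposes them (read-only, no root needed).
for f in enabled defrag; do
    path=/sys/kernel/mm/transparent_hugepage/$f
    if [ -r "$path" ]; then
        echo "$f: $(thp_active "$(cat "$path")")"
    fi
done
```

Values written with echo do not survive a reboot, so the check is worth repeating after the next restart.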


It's also recommended to increase the value of sched_migration_cost (I think now called sched_migration_cost_ns in
3.11+) and disable sched_autogroup_enabled.


    kernel.sched_migration_cost=5000000
    kernel.sched_autogroup_enabled=0

Also disable vm.zone_reclaim_mode:

    vm.zone_reclaim_mode=0
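
Since the migration-cost key may carry either name depending on the kernel, one way to avoid guessing is to look under /proc/sys and use whichever exists. A rough sketch under that assumption; the function name migration_cost_key and its optional root argument are made up for illustration, and the script only prints the sysctl commands rather than running them:

```shell
#!/bin/sh
# Pick whichever scheduler migration-cost key this kernel exposes.
# $1 is the sysctl root (defaults to /proc/sys); the argument exists only
# so the function can be tried against a scratch directory.
# migration_cost_key is a hypothetical helper name.
migration_cost_key() {
    root=${1:-/proc/sys}
    for name in sched_migration_cost_ns sched_migration_cost; do
        if [ -e "$root/kernel/$name" ]; then
            echo "kernel.$name"
            return 0
        fi
    done
    return 1
}

# Print the commands instead of running them, so nothing changes until
# they are pasted into a root shell.
if key=$(migration_cost_key); then
    echo "sysctl -w $key=5000000"
fi
echo "sysctl -w kernel.sched_autogroup_enabled=0"
echo "sysctl -w vm.zone_reclaim_mode=0"
```

Settings applied with sysctl -w last until reboot; to keep them, the same key=value lines go in /etc/sysctl.conf.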


On some of our systems I also saw marked improvements increasing the values of kernel.sched_min_granularity_ns and
kernel.sched_wakeup_granularity_ns too; on some other systems this had no effect.  So you may want to try to see if some
larger values there help.
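
For experimenting with those two, it helps to record the current values before changing anything. A small sketch that prints each current value plus a command for a larger one; the factor of 2 is an arbitrary starting point of mine, not a figure from this thread, and scaled_cmd is a hypothetical helper name:

```shell
#!/bin/sh
# Print a sysctl command that multiplies a key's current value by a factor.
# scaled_cmd is a hypothetical helper; the factor of 2 used below is an
# arbitrary starting point, not a recommendation.
scaled_cmd() {
    name=$1
    cur=$2
    mult=$3
    echo "sysctl -w $name=$((cur * mult))"
}

for key in kernel.sched_min_granularity_ns kernel.sched_wakeup_granularity_ns; do
    # sysctl -n prints only the value; keys this kernel lacks are skipped.
    if current=$(sysctl -n "$key" 2>/dev/null); then
        echo "# $key is currently $current"
        scaled_cmd "$key" "$current" 2
    fi
done
```

Keeping the printed "currently" lines makes it easy to go back if a larger value turns out to have no effect.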

A lot of the earlier 3.x kernels aren't great with PostgreSQL, one of the noted issues being a "stable pages" feature
that blocks processes modifying pages that are currently being written back until the write completes.  I think people
have noted this gets better in 3.9 onwards, but I personally didn't see much of a marked improvement until 3.16.
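
To see where a given box falls relative to those versions, the running kernel can be compared against 3.16. A minimal sketch, assuming uname -r returns something shaped like 3.11.10-21-default; kernel_at_least is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if kernel release string $1 is at least major $2, minor $3.
# kernel_at_least is a hypothetical helper name used only for this sketch.
kernel_at_least() {
    ver=${1%%-*}               # drop a "-21-default" style suffix
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}    # drop any trailing non-digits
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kernel_at_least "$(uname -r)" 3 16; then
    echo "running kernel is 3.16 or later"
else
    echo "running kernel is older than 3.16"
fi
```

This only compares version numbers; it says nothing about patches a distribution may have backported.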


