Thread: Centos 6.9 and centos 7

Centos 6.9 and centos 7

From
Nicola Contu
Date:
Hello,

We recently upgraded the OS from CentOS 6.9 to a new server running CentOS 7. The CentOS 6.9 server has now become the preproduction server.

We are running PostgreSQL 9.6.6 on both servers. Both are on SSD disks; these are the only differences:

- the DB partition on CentOS 7 is on RAID 10
- the file system is xfs on CentOS 7 (ext4 on CentOS 6.9)
- more memory on the CentOS 7 server (so the parameters in postgresql.conf are higher):

  max_connections = 220
  shared_buffers = 10GB
  effective_cache_size = 120GB
  work_mem = 349525kB
  maintenance_work_mem = 2GB
  min_wal_size = 1GB
  max_wal_size = 2GB
  checkpoint_completion_target = 0.7
  wal_buffers = 16MB
  default_statistics_target = 100

- we have two replicas on CentOS 7, one async and one sync:

  synchronous_standby_names = '1 ( "****" )'
  synchronous_commit = on

Both servers hold the same database with the same data. Running the same script on the two servers gives different results. Even a SELECT query is faster on the CentOS 6.9 server: it takes half the time on the preprod server.

centos 7 :

dbname=# \timing
Timing is on.
cmdv3=# SELECT id FROM client_billing_account WHERE name = 'name';
  id
-------
 *****
(1 row)

Time: 3.884 ms

centos 6.9

dbname=# SELECT id FROM client_billing_account WHERE name = 'name';
  id
-------
 *****
(1 row)

Time: 1.620 ms

This table has 32148 records.

Do you think we can modify anything to achieve the same performance? I read about a few kernel params:

kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 1073741824
vm.zone_reclaim_mode = 0
vm.swappiness = 1.1

Is there anything you can advise to solve or identify the problem?

Thanks a lot,
Nicola
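
Before changing any of the kernel parameters listed above, it may help to record what each server currently uses so the two machines can be compared side by side. The following is only a minimal sketch: it assumes a Linux host where sysctl values are exposed under /proc/sys, and the helper names `sysctl_path` and `read_sysctl` are made up for this illustration. A parameter that a given kernel does not provide is reported as None.

```python
from pathlib import Path

# Parameter names copied from the message above. The suggested values are
# deliberately not applied here; this only reads the current settings.
PARAMS = [
    "kernel.sched_migration_cost_ns",
    "kernel.sched_autogroup_enabled",
    "vm.dirty_background_bytes",
    "vm.dirty_bytes",
    "vm.zone_reclaim_mode",
    "vm.swappiness",
]


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys (hypothetical helper)."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # Parameter missing on this kernel, or not running on Linux.
        return None


if __name__ == "__main__":
    for name in PARAMS:
        print(f"{name} = {read_sysctl(name)}")
```

Running it on both servers and comparing the printed lines shows which of these settings already differ between the two kernels.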

Re: Centos 6.9 and centos 7

From
Chris Mair
Date:
> centos 7 :
> Time: 3.884 ms
>
> centos 6.9
> Time: 1.620 ms
>
> Is there anything you can advise to solve or identify the problem?

Can you run this query 10 times on each server and note the timings?

I'd like to see the reproducibility of this.

Also: both machines are otherwise idle (check with top or uptime)?

Bye,
Chris.




Re: Centos 6.9 and centos 7

From
Nicola Contu
Date:
These are the timings in centos 7 :

Time: 4.248 ms
Time: 2.983 ms
Time: 3.027 ms
Time: 3.298 ms
Time: 4.420 ms
Time: 2.599 ms
Time: 2.555 ms
Time: 3.008 ms
Time: 6.220 ms
Time: 4.275 ms
Time: 2.841 ms
Time: 3.699 ms
Time: 3.387 ms

These are the timings in centos 6:

Time: 1.722 ms
Time: 1.670 ms
Time: 1.843 ms
Time: 1.823 ms
Time: 1.723 ms
Time: 1.724 ms
Time: 1.747 ms
Time: 1.734 ms
Time: 1.764 ms
Time: 1.622 ms

This is top on centos 6 :

[root@****]# top
top - 14:33:32 up 577 days, 23:08, 1 user, load average: 0.16, 0.11, 0.15
Tasks: 1119 total, 1 running, 1118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132040132k total, 129530504k used, 2509628k free, 108084k buffers
Swap: 11665404k total, 331404k used, 11334000k free, 124508916k cached

This is top on centos 7:

top - 14:35:38 up 73 days, 19:00, 6 users, load average: 22.46, 20.89, 20.54
Tasks: 821 total, 13 running, 807 sleeping, 0 stopped, 1 zombie
%Cpu(s): 14.2 us, 5.0 sy, 0.0 ni, 77.5 id, 3.1 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 26383592+total, 4301464 free, 6250384 used, 25328406+buff/cache
KiB Swap: 16777212 total, 11798876 free, 4978336 used. 24497036+avail Mem

The production machine is obviously accessed more. But that does not seem to be the problem, as running the same query on the replica of the production machine (same config as the master, but not accessed by anyone) gives the same bad result:

Time: 6.366 ms

2017-12-04 15:19 GMT+01:00 Chris Mair :

>> centos 7 :
>> Time: 3.884 ms
>>
>> centos 6.9
>> Time: 1.620 ms
>>
>> Is there anything you can advise to solve or identify the problem?
>
> Can you run this query 10 times on each server and note the timings?
>
> I'd like to see the reproducibility of this.
>
> Also: both machines are otherwise idle (check with top or uptime)?
>
> Bye,
> Chris.
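
To judge reproducibility, the repeated timings can be reduced to a few summary numbers instead of being compared by eye. This is a rough sketch using only the values posted in this message; the function name `summarize` is made up for the illustration, and with 13 and 10 samples the result is only indicative.

```python
from statistics import mean, median

# Timings in milliseconds, copied from the message above.
centos7_ms = [4.248, 2.983, 3.027, 3.298, 4.420, 2.599, 2.555,
              3.008, 6.220, 4.275, 2.841, 3.699, 3.387]
centos6_ms = [1.722, 1.670, 1.843, 1.823, 1.723, 1.724,
              1.747, 1.734, 1.764, 1.622]


def summarize(samples):
    """Return count, min, median, mean and max for a list of timings."""
    return {
        "n": len(samples),
        "min": min(samples),
        "median": median(samples),
        "mean": mean(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    for label, samples in (("centos 7", centos7_ms), ("centos 6", centos6_ms)):
        stats = summarize(samples)
        print(label, {key: round(value, 3) for key, value in stats.items()})
```

The medians come out at roughly 3.3 ms on centos 7 and 1.7 ms on centos 6, and the centos 7 samples are spread over a much wider range (about 2.6 to 6.2 ms, against 1.6 to 1.8 ms).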

Re: Centos 6.9 and centos 7

From
Nicola Contu
Date:
To test this better, I used a third server. It is identical to the CentOS 7 machine and is not part of the replica cluster.

Nobody is accessing this machine; this is top :

top - 14:48:36 up 73 days, 17:39, 3 users, load average: 0.00, 0.01, 0.05
Tasks: 686 total, 1 running, 685 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26383592+total, 1782196 free, 2731144 used, 25932257+buff/cache
KiB Swap: 16777212 total, 16298536 free, 478676 used. 21693456+avail Mem

These are the timings :

Time: 2.841 ms
Time: 1.980 ms
Time: 2.240 ms
Time: 2.947 ms
Time: 2.828 ms
Time: 2.227 ms
Time: 1.998 ms
Time: 1.990 ms
Time: 2.643 ms
Time: 2.143 ms
Time: 2.919 ms
Time: 2.246 ms

I never got the same results as on the CentOS 6.9 machine.

2017-12-04 15:40 GMT+01:00 Nicola Contu :

> These are the timings in centos 7 :
>
> Time: 4.248 ms
> Time: 2.983 ms
> Time: 3.027 ms
> Time: 3.298 ms
> Time: 4.420 ms
> Time: 2.599 ms
> Time: 2.555 ms
> Time: 3.008 ms
> Time: 6.220 ms
> Time: 4.275 ms
> Time: 2.841 ms
> Time: 3.699 ms
> Time: 3.387 ms
>
> These are the timings in centos 6:
>
> Time: 1.722 ms
> Time: 1.670 ms
> Time: 1.843 ms
> Time: 1.823 ms
> Time: 1.723 ms
> Time: 1.724 ms
> Time: 1.747 ms
> Time: 1.734 ms
> Time: 1.764 ms
> Time: 1.622 ms
>
> This is top on centos 6 :
>
> [root@****]# top
> top - 14:33:32 up 577 days, 23:08, 1 user, load average: 0.16, 0.11, 0.15
> Tasks: 1119 total, 1 running, 1118 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 132040132k total, 129530504k used, 2509628k free, 108084k buffers
> Swap: 11665404k total, 331404k used, 11334000k free, 124508916k cached
>
> This is top on centos 7:
>
> top - 14:35:38 up 73 days, 19:00, 6 users, load average: 22.46, 20.89, 20.54
> Tasks: 821 total, 13 running, 807 sleeping, 0 stopped, 1 zombie
> %Cpu(s): 14.2 us, 5.0 sy, 0.0 ni, 77.5 id, 3.1 wa, 0.0 hi, 0.2 si, 0.0 st
> KiB Mem : 26383592+total, 4301464 free, 6250384 used, 25328406+buff/cache
> KiB Swap: 16777212 total, 11798876 free, 4978336 used. 24497036+avail Mem
>
> The production machine is obviously accessed more. But that does not seem
> to be the problem, as running the same query on the replica of the
> production machine (same config as the master, but not accessed by anyone)
> gives the same bad result:
>
> Time: 6.366 ms
>
> 2017-12-04 15:19 GMT+01:00 Chris Mair :
>
>>> centos 7 :
>>> Time: 3.884 ms
>>>
>>> centos 6.9
>>> Time: 1.620 ms
>>>
>>> Is there anything you can advise to solve or identify the problem?
>>
>> Can you run this query 10 times on each server and note the timings?
>>
>> I'd like to see the reproducibility of this.
>>
>> Also: both machines are otherwise idle (check with top or uptime)?
>>
>> Bye,
>> Chris.

Re: Centos 6.9 and centos 7

From
Tomas Vondra
Date:
On 12/04/2017 02:19 PM, Nicola Contu wrote:
...>
> centos 7 : 
> 
> dbname=# \timing Timing is on. cmdv3=# SELECT id FROM
> client_billing_account WHERE name = 'name'; id ------- ***** (1 row)
> Time: 3.884 ms
> 
> centos 6.9
> 
> dbname=# SELECT id FROM client_billing_account WHERE name = 'name'; id
> ------- ***** (1 row) Time: 1.620 ms
> 

We need to see EXPLAIN (ANALYZE,BUFFERS) for the queries.

Are those VMs or bare metal? What CPUs and RAM are there? Have you
checked that power management is disabled / cpufreq uses the same
policy? That typically affects short CPU-bound queries.

Other than that, I recommend performing basic system benchmarks (CPU,
memory, ...) and only if those machines perform equally should you look
for issues in PostgreSQL. Chances are the root cause is in hw or OS, in
which case you need to address that first.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
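
One way to check the cpufreq policy mentioned above is to read the scaling governor of every CPU on each machine and compare the results. The sketch below assumes a Linux host that exposes cpufreq through sysfs at /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor; on VMs or systems without a cpufreq driver those files may simply be absent, in which case the result is empty. The function name `governors` is made up for this illustration.

```python
from collections import Counter
from pathlib import Path


def governors(cpu_root: str = "/sys/devices/system/cpu") -> Counter:
    """Count the scaling governors in use across all CPUs.

    Returns an empty Counter when cpufreq is not exposed on this host.
    """
    counts = Counter()
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        try:
            counts[path.read_text().strip()] += 1
        except OSError:
            # Entry vanished or is unreadable; skip it.
            continue
    return counts


if __name__ == "__main__":
    found = governors()
    if not found:
        print("no cpufreq information exposed on this host")
    for name, count in sorted(found.items()):
        print(f"{name}: {count} CPU(s)")
```

If one server reports a power-saving governor and the other a performance-oriented one, that difference alone could plausibly explain a gap in short, CPU-bound queries.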


Re: Centos 6.9 and centos 7

From
Alban Hertroys
Date:
Did you run ANALYZE on your tables before the test?

On 4 December 2017 at 16:01, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 12/04/2017 02:19 PM, Nicola Contu wrote:
> ...>
>> centos 7 :
>>
>> dbname=# \timing Timing is on. cmdv3=# SELECT id FROM
>> client_billing_account WHERE name = 'name'; id ------- ***** (1 row)
>> Time: 3.884 ms
>>
>> centos 6.9
>>
>> dbname=# SELECT id FROM client_billing_account WHERE name = 'name'; id
>> ------- ***** (1 row) Time: 1.620 ms
>>
>
> We need to see EXPLAIN (ANALYZE,BUFFERS) for the queries.
>
> Are those VMs or bare metal? What CPUs and RAM are there? Have you
> checked that power management is disabled / cpufreq uses the same
> policy? That typically affects short CPU-bound queries.
>
> Other than that, I recommend performing basic system benchmarks (CPU,
> memory, ...) and only if those machines perform equally should you look
> for issues in PostgreSQL. Chances are the root cause is in hw or OS, in
> which case you need to address that first.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Centos 6.9 and centos 7

From
Nicola Contu
Date:
No, I did not run a VACUUM ANALYZE. Do you want me to try that first?

@Tomas:
Talking about power management, I changed the tuned-adm profile to latency-performance instead of balanced (which is the default).

That improves performance for now, and the timings are similar to CentOS 6.9:

Time: 2.121 ms
Time: 2.026 ms
Time: 1.664 ms
Time: 1.749 ms
Time: 1.656 ms
Time: 1.675 ms

Do you think this can be easily done in production as well?

2017-12-04 16:37 GMT+01:00 Alban Hertroys :

> Did you run ANALYZE on your tables before the test?
>
> On 4 December 2017 at 16:01, Tomas Vondra wrote:
> >
> > On 12/04/2017 02:19 PM, Nicola Contu wrote:
> > ...>
> >> centos 7 :
> >>
> >> dbname=# \timing Timing is on. cmdv3=# SELECT id FROM
> >> client_billing_account WHERE name = 'name'; id ------- ***** (1 row)
> >> Time: 3.884 ms
> >>
> >> centos 6.9
> >>
> >> dbname=# SELECT id FROM client_billing_account WHERE name = 'name'; id
> >> ------- ***** (1 row) Time: 1.620 ms
> >>
> >
> > We need to see EXPLAIN (ANALYZE,BUFFERS) for the queries.
> >
> > Are those VMs or bare metal? What CPUs and RAM are there? Have you
> > checked that power management is disabled / cpufreq uses the same
> > policy? That typically affects short CPU-bound queries.
> >
> > Other than that, I recommend performing basic system benchmarks (CPU,
> > memory, ...) and only if those machines perform equally should you look
> > for issues in PostgreSQL. Chances are the root cause is in hw or OS, in
> > which case you need to address that first.
> >
> > regards
> >
> > --
> > Tomas Vondra http://www.2ndQuadrant.com
> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> >
>
> --
> If you can't see the forest for the trees,
> Cut the trees and you'll see there is no forest.
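
Before touching production, it may be useful to confirm which tuned profile each machine is actually running. The sketch below is only an illustration: it assumes that `tuned-adm active` prints a line of the form "Current active profile: <name>", which is what tuned on CentOS 7 normally does but should be treated as an assumption here. The function names are made up, and nothing in the sketch changes the profile.

```python
import shutil
import subprocess


def parse_active_profile(output: str):
    """Extract the profile name from `tuned-adm active` output, or None.

    Assumes the line format "Current active profile: <name>".
    """
    for line in output.splitlines():
        if line.startswith("Current active profile:"):
            return line.split(":", 1)[1].strip()
    return None


def active_profile():
    """Return the active tuned profile, or None if tuned-adm is unavailable."""
    if shutil.which("tuned-adm") is None:
        return None
    result = subprocess.run(
        ["tuned-adm", "active"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_active_profile(result.stdout)


if __name__ == "__main__":
    print("active tuned profile:", active_profile())
```

Comparing the reported profile on the test server (latency-performance) with the one on the production master and its replicas shows where the change has not been made yet.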

Re: Centos 6.9 and centos 7

From
Tomas Vondra
Date:
On 12/04/2017 04:57 PM, Nicola Contu wrote:
> No I did not run a vacuum analyze. Do you want me to try with that first?
> 
> @Tomas:
> Talking about power management, I changed the profile for tuned-adm
> to latency-performance instead of balanced (that is the default)
> 
> that is increasing performances for now and they are similar to centos 6.9.
> 
> Time: 2.121 ms
> Time: 2.026 ms
> Time: 1.664 ms
> Time: 1.749 ms
> Time: 1.656 ms
> Time: 1.675 ms
> 
> Do you think this can be easily done in production as well? 
> 

How am I supposed to know? Not only does that depend on your internal
deployment policies, but it's also much more a CentOS/RedHat question
than a PostgreSQL one.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Centos 6.9 and centos 7

From
Alban Hertroys
Date:
> On 4 Dec 2017, at 16:57, Nicola Contu <nicola.contu@gmail.com> wrote:
>
> No I did not run a vacuum analyze. Do you want me to try with that first?

That means your statistics may not be up to date, although by now autovacuum should have done the job (you didn't turn
that off or anything, did you?). Bad statistics result in non-optimal query plans and therefore could very well cause
your timing differences.

An easy way to verify, since you still have access to both versions of the database, is to compare the statistics of
the relevant tables between the two. They should be similar.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
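
The comparison suggested above could be sketched as follows. The query reads from pg_stats, a standard PostgreSQL view; the choice of columns, the 10% tolerance and the function name `diff_stats` are assumptions made for this illustration. Fetching the rows from each server (for example with psql) and turning them into dictionaries is left out.

```python
# Query to run on each server; pg_stats is a standard PostgreSQL view.
# The column selection is an assumption made for this sketch.
STATS_QUERY = """
SELECT attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'client_billing_account'
ORDER BY attname;
"""


def diff_stats(a, b, rel_tol=0.10):
    """Compare two {column: {stat: value}} mappings from the two servers.

    Returns a list of (column, stat, value_a, value_b) tuples for entries
    that are missing on one side or differ by more than rel_tol.
    """
    diffs = []
    for col in sorted(set(a) | set(b)):
        stats_a, stats_b = a.get(col), b.get(col)
        if stats_a is None or stats_b is None:
            # Column has statistics on one server only.
            diffs.append((col, None, stats_a, stats_b))
            continue
        for stat in sorted(set(stats_a) | set(stats_b)):
            va, vb = stats_a.get(stat), stats_b.get(stat)
            if va is None or vb is None:
                # NULL in pg_stats (e.g. correlation) or missing on one side.
                if va != vb:
                    diffs.append((col, stat, va, vb))
                continue
            scale = max(abs(va), abs(vb))
            if scale and abs(va - vb) / scale > rel_tol:
                diffs.append((col, stat, va, vb))
    return diffs
```

An empty result suggests the planner statistics are similar on both servers, which would point away from stale statistics as the cause of the timing difference.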