Thread: Centos 6.9 and centos 7
Hello,
we recently moved from a server running centos 6.9 to a new server running centos 7.
The centos 6.9 server has now become the preproduction server.
We are running postgres 9.6.6 on both servers.
Both are on SSD disks. These are the only differences:
- DB partition on centos 7 is on a RAID 10
- file system is xfs on centos 7 (ext4 in centos 6.9)
- more memory on the centos 7 server (so the params in postgresql.conf are higher)
max_connections = 220
shared_buffers = 10GB
effective_cache_size = 120GB
work_mem = 349525kB
maintenance_work_mem = 2GB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
- we have two replicas on the centos 7 server: one is async, one is sync
synchronous_standby_names = '1 ( "****" )'
synchronous_commit = on
They have the same db inside, with the same data.
Running the same script on the two servers gives different timings.
Even a select query is faster on the centos 6.9 (preprod) server, where it
takes about half the time.
centos 7 :
dbname=# \timing
Timing is on.
cmdv3=# SELECT id FROM client_billing_account WHERE name = 'name';
  id
-------
 *****
(1 row)

Time: 3.884 ms

centos 6.9
dbname=# SELECT id FROM client_billing_account WHERE name = 'name';
  id
-------
 *****
(1 row)

Time: 1.620 ms
This table has 32148 records.
Do you think we can change anything to get the same performance?
I read about a few kernel params:
kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 1073741824
vm.zone_reclaim_mode = 0
vm.swappiness = 1.1
Is there anything you can advise to solve or identify the problem?
Thanks a lot,
Nicola
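Before changing any of the kernel parameters listed above, it may help to record what each server currently uses so the two hosts can be compared side by side. The following is a minimal sketch, assuming a Linux host where sysctl values are exposed under /proc/sys; the helper names `sysctl_path` and `read_sysctls` are made up for this illustration.

```python
from pathlib import Path

# Kernel parameters named in the post above. Only the names are taken from
# the thread; the values printed are whatever the local host reports.
PARAMS = [
    "kernel.sched_migration_cost_ns",
    "kernel.sched_autogroup_enabled",
    "vm.dirty_background_bytes",
    "vm.dirty_bytes",
    "vm.zone_reclaim_mode",
    "vm.swappiness",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as 'vm.swappiness' to its file under root."""
    return Path(root) / name.replace(".", "/")


def read_sysctls(names, root="/proc/sys"):
    """Return {name: current value, or None if the kernel does not expose it}."""
    values = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls(PARAMS).items():
        shown = value if value is not None else "(not available)"
        print(f"{name} = {shown}")
```

Running this on both the centos 6.9 and the centos 7 machine and diffing the output would show whether any of these settings actually differ between them.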
> centos 7 :
> Time: 3.884 ms
>
> centos 6.9
> Time: 1.620 ms
>
> Is there anything you can advice to solve or identify the problem?

Can you run this query 10 times on each server and note the timings?

I'd like to see the reproducibility of this.

Also: both machines are otherwise idle (check with top or uptime)?

Bye,
Chris.
These are the timings in centos 7 :
Time: 4.248 ms
Time: 2.983 ms
Time: 3.027 ms
Time: 3.298 ms
Time: 4.420 ms
Time: 2.599 ms
Time: 2.555 ms
Time: 3.008 ms
Time: 6.220 ms
Time: 4.275 ms
Time: 2.841 ms
Time: 3.699 ms
Time: 3.387 ms
These are the timings in centos 6:
Time: 1.722 ms
Time: 1.670 ms
Time: 1.843 ms
Time: 1.823 ms
Time: 1.723 ms
Time: 1.724 ms
Time: 1.747 ms
Time: 1.734 ms
Time: 1.764 ms
Time: 1.622 ms
This is top on centos 6 :
[root@****]# top
top - 14:33:32 up 577 days, 23:08, 1 user, load average: 0.16, 0.11, 0.15
Tasks: 1119 total, 1 running, 1118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132040132k total, 129530504k used, 2509628k free, 108084k buffers
Swap: 11665404k total, 331404k used, 11334000k free, 124508916k cached
This is top on centos 7:
top - 14:35:38 up 73 days, 19:00, 6 users, load average: 22.46, 20.89, 20.54
Tasks: 821 total, 13 running, 807 sleeping, 0 stopped, 1 zombie
%Cpu(s): 14.2 us, 5.0 sy, 0.0 ni, 77.5 id, 3.1 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 26383592+total, 4301464 free, 6250384 used, 25328406+buff/cache
KiB Swap: 16777212 total, 11798876 free, 4978336 used. 24497036+avail Mem
The production machine obviously handles more traffic. But that does not seem
to be the problem, as running the same query on a replica of the
production machine (same config as the master, but not accessed by anyone)
gives the same bad result:
Time: 6.366 ms
To test this better, I used a third server.
It is identical to the centos 7 machine, and it is not part of the
replica cluster.
Nobody is accessing this machine; this is top:
top - 14:48:36 up 73 days, 17:39, 3 users, load average: 0.00, 0.01, 0.05
Tasks: 686 total, 1 running, 685 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26383592+total, 1782196 free, 2731144 used, 25932257+buff/cache
KiB Swap: 16777212 total, 16298536 free, 478676 used. 21693456+avail Mem
These are timings :
Time: 2.841 ms
Time: 1.980 ms
Time: 2.240 ms
Time: 2.947 ms
Time: 2.828 ms
Time: 2.227 ms
Time: 1.998 ms
Time: 1.990 ms
Time: 2.643 ms
Time: 2.143 ms
Time: 2.919 ms
Time: 2.246 ms
I never got the same results as on the centos 6.9 machine.
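The three timing series posted so far are easier to compare as summary statistics than as raw lists. Here is a small sketch that computes min, median, mean, standard deviation and max for each host; the numbers are copied from the `\timing` output in this thread, while the host labels and the `summarize` helper are just names chosen for the example.

```python
from statistics import mean, median, stdev

# Timings in milliseconds, copied from the psql "\timing" output in this thread.
timings_ms = {
    "centos 7 (production)": [
        4.248, 2.983, 3.027, 3.298, 4.420, 2.599, 2.555,
        3.008, 6.220, 4.275, 2.841, 3.699, 3.387,
    ],
    "centos 6.9 (preprod)": [
        1.722, 1.670, 1.843, 1.823, 1.723, 1.724, 1.747,
        1.734, 1.764, 1.622,
    ],
    "centos 7 (idle third server)": [
        2.841, 1.980, 2.240, 2.947, 2.828, 2.227, 1.998,
        1.990, 2.643, 2.143, 2.919, 2.246,
    ],
}


def summarize(samples):
    """Return basic descriptive statistics for a list of timings."""
    return {
        "n": len(samples),
        "min": min(samples),
        "median": median(samples),
        "mean": mean(samples),
        "stdev": stdev(samples),
        "max": max(samples),
    }


for host, samples in timings_ms.items():
    s = summarize(samples)
    print(
        f"{host:30s} n={s['n']:2d} min={s['min']:.3f} "
        f"median={s['median']:.3f} mean={s['mean']:.3f} "
        f"stdev={s['stdev']:.3f} max={s['max']:.3f}"
    )
```

With only 10 to 13 samples per host this is a rough comparison, not a benchmark, but it separates the effect of load (production versus the idle third server) from the remaining gap to the centos 6.9 machine.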
On 12/04/2017 02:19 PM, Nicola Contu wrote:
...
>
> centos 7 :
>
> dbname=# \timing Timing is on. cmdv3=# SELECT id FROM
> client_billing_account WHERE name = 'name'; id ------- ***** (1 row)
> Time: 3.884 ms
>
> centos 6.9
>
> dbname=# SELECT id FROM client_billing_account WHERE name = 'name'; id
> ------- ***** (1 row) Time: 1.620 ms
>

We need to see EXPLAIN (ANALYZE,BUFFERS) for the queries.

Are those VMs or bare metal? What CPUs and RAM are there? Have you
checked that power management is disabled / cpufreq uses the same
policy? That typically affects short CPU-bound queries.

Other than that, I recommend performing basic system benchmarks (CPU,
memory, ...) and only if those machines perform equally should you look
for issues in PostgreSQL. Chances are the root cause is in hw or OS, in
which case you need to address that first.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
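One way to check the cpufreq question raised above is to read the scaling governor of every CPU on each host and compare the results. This is only a sketch, assuming a Linux host that exposes cpufreq under /sys/devices/system/cpu (virtual machines often do not); `governor_counts` is a name invented for the example.

```python
from collections import Counter
from pathlib import Path


def governor_counts(cpu_root="/sys/devices/system/cpu"):
    """Count the cpufreq scaling governor in use across all CPUs.

    Returns an empty Counter when no cpufreq information is exposed.
    """
    counts = Counter()
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in Path(cpu_root).glob(pattern):
        try:
            counts[gov_file.read_text().strip()] += 1
        except OSError:
            continue  # file unreadable, skip this CPU
    return counts


if __name__ == "__main__":
    counts = governor_counts()
    if not counts:
        print("no cpufreq information exposed on this host")
    for governor, n in sorted(counts.items()):
        print(f"{governor}: {n} CPU(s)")
```

If one host reports a power-saving governor and the other a performance-oriented one, that alone could plausibly explain a difference of a millisecond or two on short CPU-bound queries.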
Did you run ANALYZE on your tables before the test?

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
No, I did not run a vacuum analyze. Do you want me to try with that first?
@Tomas:
Talking about power management, I changed the tuned-adm profile
to latency-performance instead of balanced (which is the default).
That improves performance for now, and the timings are similar to centos 6.9:
Time: 2.121 ms
Time: 2.026 ms
Time: 1.664 ms
Time: 1.749 ms
Time: 1.656 ms
Time: 1.675 ms
Do you think this can be easily done in production as well?
On 12/04/2017 04:57 PM, Nicola Contu wrote:
> No I did not run a vacuum analyze. Do you want me to try with that first?
>
> @Tomas:
> Talking abut power management, I changed the profile for tuned-adm
> to latency-performance instead of balanced (that is the default)
>
> that is increasing performances for now and they are similar to centos 6.9.
>
> Time: 2.121 ms
> Time: 2.026 ms
> Time: 1.664 ms
> Time: 1.749 ms
> Time: 1.656 ms
> Time: 1.675 ms
>
> Do you think this can be easily done in production as well?
>

How am I supposed to know? Not only does that depend on your internal
deployment policies, but it's also much more a CentOS/RedHat question
than a PostgreSQL one.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 4 Dec 2017, at 16:57, Nicola Contu <nicola.contu@gmail.com> wrote:
>
> No I did not run a vacuum analyze. Do you want me to try with that first?

That means your statistics may not be up to date, although by now
autovacuum should have done the job (you didn't turn that off or
anything, did you?). Bad statistics result in non-optimal query plans
and therefore could very well cause your timing differences.

An easy way to verify, since you still have access to both versions of
the database, is to compare the statistics of the relevant tables
between the two. They should be similar.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
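The comparison of statistics suggested above could be done roughly as follows: run the same query against the standard `pg_stats` view on both servers, export the rows, and diff them. The sketch below only covers the diffing step. The `diff_stats` helper, the choice of the `null_frac` and `n_distinct` columns, and the 10% tolerance are all assumptions made for illustration, not something proposed in the thread.

```python
# Query to run on both servers (for example from psql). pg_stats is the
# standard PostgreSQL view over the planner's per-column statistics.
STATS_QUERY = """
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'client_billing_account'
ORDER BY attname;
"""


def diff_stats(rows_a, rows_b, rel_tol=0.10):
    """Compare two result sets of STATS_QUERY.

    Each row is a tuple (attname, null_frac, n_distinct). Returns a list of
    (attname, field, value_a, value_b) entries for values whose relative
    difference exceeds rel_tol, plus an entry with field "missing" for any
    column that has statistics on only one side.
    """
    a = {row[0]: row[1:] for row in rows_a}
    b = {row[0]: row[1:] for row in rows_b}
    findings = []
    for col in sorted(set(a) | set(b)):
        if col not in a or col not in b:
            findings.append((col, "missing", a.get(col), b.get(col)))
            continue
        for field, va, vb in zip(("null_frac", "n_distinct"), a[col], b[col]):
            scale = max(abs(va), abs(vb))
            if scale and abs(va - vb) / scale > rel_tol:
                findings.append((col, field, va, vb))
    return findings
```

An empty result from `diff_stats` would suggest the planner statistics are close on both servers, which would point away from stale statistics as the cause of the timing difference.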