Thread: [PERFORM] Speed differences between two servers

[PERFORM] Speed differences between two servers

From
Vincent Veyron
Date:
Hello,

I use two dedicated bare metal servers (Online and Kimsufi). The first one takes much longer to execute a procedure
thatrecreates a database by truncating its tables, then copying the data from a set of text files; it is however much
fasterfor more typical SELECT and INSERT queries done by users. 

Here is the timing for the procedure :

#Kimsufi server
time psql -f myfile.sql mydb
real    0m12.585s
user    0m0.200s
sys    0m0.076s

#Online server
time psql -f myfile.sql mydb
real    1m15.410s
user    0m0.144s
sys    0m0.028s

As you can see, the Kimsufi server takes 12 seconds to complete the procedure, while the Online one needs 75 seconds.

For more usual queries however, the ratio is reversed, as shown by explain analyze for a typical query:

#Kimsufi server
marica=> explain (analyze, buffers) SELECT t1.id_contentieux, t1.ref_dossier, t1.ref_assureur,
noms_des_tiers(t1.id_contentieux)as id_tiers, t1.libelle, t1.affaire, 1 as authorized 
FROM tblcontentieux t1 WHERE id_contentieux IN (SELECT id_contentieux FROM tblcontentieux_log WHERE
plainto_tsquery('vol')@@ tsv_libelle)  AND id_client = 13 ORDER BY 2 
;
                                                                            QUERY PLAN
                                          

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=543.29..543.56 rows=106 width=116) (actual time=19.870..19.885 rows=75 loops=1)
   Sort Key: t1.ref_dossier
   Sort Method: quicksort  Memory: 35kB
   Buffers: shared hit=689
   ->  Nested Loop  (cost=430.59..539.73 rows=106 width=116) (actual time=4.103..19.143 rows=75 loops=1)
         Buffers: shared hit=689
         ->  HashAggregate  (cost=430.31..430.49 rows=18 width=4) (actual time=2.077..2.266 rows=124 loops=1)
               Group Key: tblcontentieux_log.id_contentieux
               Buffers: shared hit=112
               ->  Bitmap Heap Scan on tblcontentieux_log  (cost=29.11..429.95 rows=142 width=4) (actual
time=0.712..1.550rows=147 loops=1) 
                     Recheck Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
                     Heap Blocks: exact=105
                     Buffers: shared hit=112
                     ->  Bitmap Index Scan on tblcontentieux_log_tvs_libelle_idx  (cost=0.00..29.07 rows=142 width=0)
(actualtime=0.632..0.632 rows=147 loops=1) 
                           Index Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
                           Buffers: shared hit=7
         ->  Index Scan using tblcontentieux_pkey on tblcontentieux t1  (cost=0.28..4.59 rows=1 width=116) (actual
time=0.018..0.019rows=1 loops=124) 
               Index Cond: (id_contentieux = tblcontentieux_log.id_contentieux)
               Filter: (id_client = 13)
               Rows Removed by Filter: 0
               Buffers: shared hit=372
 Planning time: 3.666 ms
 Execution time: 20.176 ms

#Online server
marica=> explain (analyze,buffers) SELECT t1.id_contentieux, t1.ref_dossier, t1.ref_assureur,
noms_des_tiers(t1.id_contentieux)as id_tiers, t1.libelle, t1.affaire, 1 as authorized 
FROM tblcontentieux t1 WHERE id_contentieux IN (SELECT id_contentieux FROM tblcontentieux_log WHERE
plainto_tsquery('vol')@@ tsv_libelle)  AND id_client = 13 ORDER BY 2; 
                                                                            QUERY PLAN
                                          

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=492.01..492.27 rows=104 width=116) (actual time=10.660..10.673 rows=75 loops=1)
   Sort Key: t1.ref_dossier
   Sort Method: quicksort  Memory: 35kB
   Buffers: shared hit=686
   ->  Nested Loop  (cost=390.14..488.52 rows=104 width=116) (actual time=1.363..10.066 rows=75 loops=1)
         Buffers: shared hit=686
         ->  HashAggregate  (cost=389.85..390.03 rows=18 width=4) (actual time=0.615..0.725 rows=124 loops=1)
               Group Key: tblcontentieux_log.id_contentieux
               Buffers: shared hit=109
               ->  Bitmap Heap Scan on tblcontentieux_log  (cost=13.08..389.51 rows=139 width=4) (actual
time=0.156..0.465rows=147 loops=1) 
                     Recheck Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
                     Heap Blocks: exact=106
                     Buffers: shared hit=109
                     ->  Bitmap Index Scan on tblcontentieux_log_tvs_libelle_idx  (cost=0.00..13.04 rows=139 width=0)
(actualtime=0.126..0.126 rows=147 loops=1) 
                           Index Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
                           Buffers: shared hit=3
         ->  Index Scan using tblcontentieux_pkey on tblcontentieux t1  (cost=0.28..4.02 rows=1 width=116) (actual
time=0.010..0.011rows=1 loops=124) 
               Index Cond: (id_contentieux = tblcontentieux_log.id_contentieux)
               Filter: (id_client = 13)
               Rows Removed by Filter: 0
               Buffers: shared hit=372
 Planning time: 1.311 ms
 Execution time: 10.813 ms


Both are bare metal servers, with 4GB of RAM; the dataset is small (compressed dump is 3MB). The main differences that
Ifound are in disk I/O as shown by hdparm, and processor type : 

#Kimsufi server
hdparm -tT /dev/sda
 Timing cached reads:   1744 MB in  2.00 seconds = 872.16 MB/sec
 Timing buffered disk reads: 482 MB in  3.00 seconds = 160.48 MB/sec
Processor Intel(R) Atom(TM) CPU N2800   @ 1.86GHz (4 cores, cache size  : 512 KB)
Disk 2TB, 7200rpm, db on 500MB partition

#Online server
hdparm -tT /dev/sda
 Timing cached reads:   2854 MB in  2.00 seconds = 1427.05 MB/sec
 Timing buffered disk reads: 184 MB in  3.00 seconds =  61.26 MB/sec
Processor Intel(R) Atom(TM) CPU  C2350  @ 1.74GHz (2 cores, cache size  : 1024 KB)
Disk 1TB, 7200rpm, db on 1TB partition

I've created two pastebins with the output of the following commands for each server:
# hdparm /dev/sda
# hdparm -i /dev/sda
# df
# cat /proc/cpuinfo
# cat /proc/meminfo

#Kimsufi server
https://pastebin.com/3860hS92

#Online server
https://pastebin.com/FT1HFbD7


My questions:

-Does the difference in 'buffered disk reads' explain the 6 fold increase in execution time for truncate/copy on the
Onlineserver? 

-Why are regular queries much faster on this same server?




--
                    Bien à vous, Vincent Veyron

https://legalcase.libremen.com/
Legal case management software


Re: [PERFORM] Speed differences between two servers

From
Scott Marlowe
Date:
On Mon, May 8, 2017 at 11:49 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
> Hello,
>
> I use two dedicated bare metal servers (Online and Kimsufi). The first one takes much longer to execute a procedure
thatrecreates a database by truncating its tables, then copying the data from a set of text files; it is however much
fasterfor more typical SELECT and INSERT queries done by users. 
>
> Here is the timing for the procedure :
>
> #Kimsufi server
> time psql -f myfile.sql mydb
> real    0m12.585s
> user    0m0.200s
> sys     0m0.076s
>
> #Online server
> time psql -f myfile.sql mydb
> real    1m15.410s
> user    0m0.144s
> sys     0m0.028s
>
> My questions:
>
> -Does the difference in 'buffered disk reads' explain the 6 fold increase in execution time for truncate/copy on the
Onlineserver? 

The most likely cause of the difference would be that one server IS
honoring fsync requests from the db and the other one isn't.

If you run pgbench on both (something simple like pgbench -c 1 -T 60,
aka one thread for 60 seconds) on a machine running on a 7200RPM hard
drive, you should get approximately 120 transactions per second, or
less, since that's how many times a second a disk spinning at that
speed can write out data. If you get say 110 on the slow machine and
800 on the fast one, there's the culprit, the fast machine is not
honoring fsync requests and is not crash-safe. I.e. if you start
writing to the db and pull the power plug out the back of the machine
it will likely power up with a corrupted database.

> -Why are regular queries much faster on this same server?

That's a whole nother subject. Most likely the faster machine can fit
the whole db in memory, or has much faster memory, or the whole
dataset is cached etc etc.

For now concentrate on figuring out of you've got an fsync problem. If
the data is just test data etc that you can afford to lose then you
can leave off fsync and not worry. But in production this is rarely
the case.


Re: [PERFORM] Speed differences between two servers

From
Vincent Veyron
Date:
On Mon, 8 May 2017 12:48:29 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:

Hi Scott,

Thank you for your input.

>
> The most likely cause of the difference would be that one server IS
> honoring fsync requests from the db and the other one isn't.
>
> If you run pgbench on both (something simple like pgbench -c 1 -T 60,
> aka one thread for 60 seconds) on a machine running on a 7200RPM hard
> drive, you should get approximately 120 transactions per second

Here are the results :

#Kimsufi
pgbench -c 1 -T 60 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 6618
latency average: 9.069 ms
tps = 110.270771 (including connections establishing)
tps = 110.283733 (excluding connections establishing)

#Online
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1150
latency average: 52.317 ms
tps = 19.114403 (including connections establishing)
tps = 19.115739 (excluding connections establishing)


>
> > -Why are regular queries much faster on this same server?
>
> That's a whole nother subject. Most likely the faster machine can fit
> the whole db in memory, or has much faster memory, or the whole
> dataset is cached etc etc.
>

The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server.

using 'dmidecode -t 17' :

#Kimsufi
Memory Device
    Array Handle: 0x0016
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: None
    Locator: SO DIMM 0
    Bank Locator: Channel A DIMM0
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1066 MHz
    Manufacturer: 0x0000000000000000
    Serial Number: 0x00000000
    Asset Tag: Unknown
    Part Number: 0x000000000000000000000000000000000000
    Rank: Unknown
    Configured Clock Speed: 1066 MHz

[repeated for second locator]

#Online
Memory Device
    Array Handle: 0x0015
    Error Information Handle: No Error
    Total Width: Unknown
    Data Width: Unknown
    Size: 4096 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM0
    Bank Locator: BANK 0
    Type: DDR3
    Type Detail: Synchronous Unbuffered (Unregistered)
    Speed: 1600 MHz
    Manufacturer: <BAD INDEX>
    Serial Number: <BAD INDEX>
    Asset Tag: <BAD INDEX>
    Part Number: <BAD INDEX>
    Rank: 1
    Configured Clock Speed: 1333 MHz
    Minimum voltage:  Unknown
    Maximum voltage:  Unknown
    Configured voltage:  Unknown




--
                    Bien à vous, Vincent Veyron

https://libremen.com
Logiciels de gestion, libres


Re: [PERFORM] Speed differences between two servers

From
Scott Marlowe
Date:
On Mon, May 8, 2017 at 4:24 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
> On Mon, 8 May 2017 12:48:29 -0600
> Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
> Hi Scott,
>
> Thank you for your input.
>
>>
>> The most likely cause of the difference would be that one server IS
>> honoring fsync requests from the db and the other one isn't.
>>
>> If you run pgbench on both (something simple like pgbench -c 1 -T 60,
>> aka one thread for 60 seconds) on a machine running on a 7200RPM hard
>> drive, you should get approximately 120 transactions per second
>
> Here are the results :
>
> #Kimsufi
> pgbench -c 1 -T 60 test
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 60 s
> number of transactions actually processed: 6618
> latency average: 9.069 ms
> tps = 110.270771 (including connections establishing)
> tps = 110.283733 (excluding connections establishing)

Just under 120, looks like fsync is working.

>
> #Online
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 60 s
> number of transactions actually processed: 1150
> latency average: 52.317 ms
> tps = 19.114403 (including connections establishing)
> tps = 19.115739 (excluding connections establishing)

OK that's horrendous. My mobile phone is likely faster. We need to
figure out why it's so slow. If it's in a RAID-1 set it might be
syncing.

>> > -Why are regular queries much faster on this same server?
>>
>> That's a whole nother subject. Most likely the faster machine can fit
>> the whole db in memory, or has much faster memory, or the whole
>> dataset is cached etc etc.
>>
>
> The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server.

Yeah it fits in memory. Select queries will only hit disk at bootup.

First machine
SNIP
>         Speed: 1066 MHz
SNIP
>         Configured Clock Speed: 1066 MHz

Second machine

>         Speed: 1600 MHz
SNIP
>         Configured Clock Speed: 1333 MHz

Yeah the second machine likely has a noticeably faster CPU than the
first as well. It's about two years younger so yeah it's probably just
cpu/mem that's making it fast.


Re: [PERFORM] Speed differences between two servers

From
Scott Marlowe
Date:
On Mon, May 8, 2017 at 5:06 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, May 8, 2017 at 4:24 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
>> On Mon, 8 May 2017 12:48:29 -0600
>> Scott Marlowe <scott.marlowe@gmail.com> wrote:

>>> > -Why are regular queries much faster on this same server?
>>>
>>> That's a whole nother subject. Most likely the faster machine can fit
>>> the whole db in memory, or has much faster memory, or the whole
>>> dataset is cached etc etc.
>>>
>>
>> The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server.
>
> Yeah it fits in memory. Select queries will only hit disk at bootup.
>
> First machine
> SNIP
>>         Speed: 1066 MHz
> SNIP
>>         Configured Clock Speed: 1066 MHz
>
> Second machine
>
>>         Speed: 1600 MHz
> SNIP
>>         Configured Clock Speed: 1333 MHz
>
> Yeah the second machine likely has a noticeably faster CPU than the
> first as well. It's about two years younger so yeah it's probably just
> cpu/mem that's making it fast.

OK went back and looked at your original post. I seems like those two
queries that are 10 and 20 ms have essentially the same plan on
similar sized dbs, so it's reasonable to assume the newer machine is
about twice as fast.

Without seeing what your test sql file does I have no idea what the
big difference in the other direction. You'll have to pull out and run
the individual queries, or turn on auto explain or something to see
the plans and compare. A lot of time it's just some simple tuning in
postgresql.conf or maybe a database got an alter database on it to
change something? Either way use show all; to compare settings and get
explain (analyze) off of the slow queries.


Re: [PERFORM] Speed differences between two servers

From
Vincent Veyron
Date:
On Mon, 8 May 2017 17:35:38 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Without seeing what your test sql file does I have no idea what the
> big difference in the other direction.

It truncates 59 tables, copies data back from a set of text files, inserts a few single records and does a few select
setval('')to reset the serial columns. 

here it is :
https://pastebin.com/LVsvFzkj

>You'll have to pull out and run
> the individual queries, or turn on auto explain or something to see
> the plans and compare.

I used log_duration; it shows that the truncate and all the \copy are much slower, while all insert/select statements
aretwice as fast 

>A lot of time it's just some simple tuning in
> postgresql.conf or maybe a database got an alter database on it to
> change something?

Server setups are identical : same software, same configurations, same databases.

I've put in a ticket at the Online provider with the data to see if they have an answer (now 14H00 in Paris, so they
maytake a while to respond) 


--
                    Bien à vous, Vincent Veyron

https://compta.libremen.com
Logiciel libre de comptabilité générale en partie double



Re: [PERFORM] Speed differences between two servers

From
Vincent Veyron
Date:
Well, the response to the ticket was quite fast :

https://status.online.net/index.php?do=details&task_id=720

Here's the stated cause :

>Our tests have confirmed an issue caused by the fans of the power supplies installed in several chassis.

>The fans create vibrations amplifying errors on the discs.

Now on to decide whether I'm waiting for the fix or re-building a new box...

Thanks a bunch for your help.


--
                    Bien à vous, Vincent Veyron

https://libremen.com
Logiciels de gestion, libres


Re: [PERFORM] Speed differences between two servers

From
Scott Marlowe
Date:
On Tue, May 9, 2017 at 7:08 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
>
> Well, the response to the ticket was quite fast :
>
> https://status.online.net/index.php?do=details&task_id=720
>
> Here's the stated cause :
>
>>Our tests have confirmed an issue caused by the fans of the power supplies installed in several chassis.
>
>>The fans create vibrations amplifying errors on the discs.
>
> Now on to decide whether I'm waiting for the fix or re-building a new box...
>
> Thanks a bunch for your help.

You're welcome, any time.

As for the hard drives, can you upgrade to a pair of SSDs? If your
data set fits on (and will continue to fit on) SSDs, the performance
gained from SSDs is HUGE and worth a few hundred extra for the drive.
Note that you want to use the Intel enterprise stuff that survives
power loss, not the cheap low end SSDs. May be an easy fix for your
hosting company and a big performance gain.


Re: [PERFORM] Speed differences between two servers

From
Vincent Veyron
Date:
On Tue, 9 May 2017 10:24:20 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
> As for the hard drives, can you upgrade to a pair of SSDs? If your
> data set fits on (and will continue to fit on) SSDs, the performance
> gained from SSDs is HUGE and worth a few hundred extra for the drive.
> Note that you want to use the Intel enterprise stuff that survives
> power loss, not the cheap low end SSDs. May be an easy fix for your
> hosting company and a big performance gain.
>

Sure, but I'm getting plenty of performance already : my little machines can serve 40 requests/second with 6 or 7
queriesper request. So I'm fine for a while. 

You can see for yourself if you enter the demo account for the site in my sig, and click in a couple of files (it's the
databasethat gets re-created by the procedure I mentionned in my original post) 


--
                    Bien à vous, Vincent Veyron

https://legalcase.libremen.com/
Legal case, contract and insurance claim management software