Thread: [PERFORM] Speed differences between two servers
Hello, I use two dedicated bare metal servers (Online and Kimsufi). The first one takes much longer to execute a procedure thatrecreates a database by truncating its tables, then copying the data from a set of text files; it is however much fasterfor more typical SELECT and INSERT queries done by users. Here is the timing for the procedure : #Kimsufi server time psql -f myfile.sql mydb real 0m12.585s user 0m0.200s sys 0m0.076s #Online server time psql -f myfile.sql mydb real 1m15.410s user 0m0.144s sys 0m0.028s As you can see, the Kimsufi server takes 12 seconds to complete the procedure, while the Online one needs 75 seconds. For more usual queries however, the ratio is reversed, as shown by explain analyze for a typical query: #Kimsufi server marica=> explain (analyze, buffers) SELECT t1.id_contentieux, t1.ref_dossier, t1.ref_assureur, noms_des_tiers(t1.id_contentieux)as id_tiers, t1.libelle, t1.affaire, 1 as authorized FROM tblcontentieux t1 WHERE id_contentieux IN (SELECT id_contentieux FROM tblcontentieux_log WHERE plainto_tsquery('vol')@@ tsv_libelle) AND id_client = 13 ORDER BY 2 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Sort (cost=543.29..543.56 rows=106 width=116) (actual time=19.870..19.885 rows=75 loops=1) Sort Key: t1.ref_dossier Sort Method: quicksort Memory: 35kB Buffers: shared hit=689 -> Nested Loop (cost=430.59..539.73 rows=106 width=116) (actual time=4.103..19.143 rows=75 loops=1) Buffers: shared hit=689 -> HashAggregate (cost=430.31..430.49 rows=18 width=4) (actual time=2.077..2.266 rows=124 loops=1) Group Key: tblcontentieux_log.id_contentieux Buffers: shared hit=112 -> Bitmap Heap Scan on tblcontentieux_log (cost=29.11..429.95 rows=142 width=4) (actual time=0.712..1.550rows=147 loops=1) Recheck Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle) Heap Blocks: exact=105 Buffers: shared hit=112 -> Bitmap Index Scan on tblcontentieux_log_tvs_libelle_idx (cost=0.00..29.07 rows=142 width=0) (actualtime=0.632..0.632 rows=147 loops=1) Index Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle) Buffers: shared hit=7 -> Index Scan using tblcontentieux_pkey on tblcontentieux t1 (cost=0.28..4.59 rows=1 width=116) (actual time=0.018..0.019rows=1 loops=124) Index Cond: (id_contentieux = tblcontentieux_log.id_contentieux) Filter: (id_client = 13) Rows Removed by Filter: 0 Buffers: shared hit=372 Planning time: 3.666 ms Execution time: 20.176 ms #Online server marica=> explain (analyze,buffers) SELECT t1.id_contentieux, t1.ref_dossier, t1.ref_assureur, noms_des_tiers(t1.id_contentieux)as id_tiers, t1.libelle, t1.affaire, 1 as authorized FROM tblcontentieux t1 WHERE id_contentieux IN (SELECT id_contentieux FROM tblcontentieux_log WHERE plainto_tsquery('vol')@@ tsv_libelle) AND id_client = 13 ORDER BY 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Sort (cost=492.01..492.27 rows=104 width=116) (actual time=10.660..10.673 rows=75 loops=1) Sort Key: t1.ref_dossier Sort Method: quicksort Memory: 35kB Buffers: shared hit=686 -> Nested Loop (cost=390.14..488.52 rows=104 width=116) (actual time=1.363..10.066 rows=75 loops=1) Buffers: shared hit=686 -> HashAggregate (cost=389.85..390.03 rows=18 width=4) (actual time=0.615..0.725 rows=124 loops=1) Group Key: tblcontentieux_log.id_contentieux Buffers: shared hit=109 -> Bitmap Heap Scan on tblcontentieux_log (cost=13.08..389.51 rows=139 width=4) (actual time=0.156..0.465rows=147 loops=1) Recheck Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle) Heap Blocks: exact=106 Buffers: shared hit=109 -> Bitmap Index Scan on tblcontentieux_log_tvs_libelle_idx (cost=0.00..13.04 rows=139 width=0) (actualtime=0.126..0.126 rows=147 loops=1) Index Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle) Buffers: shared hit=3 -> Index Scan using tblcontentieux_pkey on tblcontentieux t1 (cost=0.28..4.02 rows=1 width=116) (actual time=0.010..0.011rows=1 loops=124) Index Cond: (id_contentieux = tblcontentieux_log.id_contentieux) Filter: (id_client = 13) Rows Removed by Filter: 0 Buffers: shared hit=372 Planning time: 1.311 ms Execution time: 10.813 ms Both are bare metal servers, with 4GB of RAM; the dataset is small (compressed dump is 3MB). The main differences that Ifound are in disk I/O as shown by hdparm, and processor type : #Kimsufi server hdparm -tT /dev/sda Timing cached reads: 1744 MB in 2.00 seconds = 872.16 MB/sec Timing buffered disk reads: 482 MB in 3.00 seconds = 160.48 MB/sec Processor Intel(R) Atom(TM) CPU N2800 @ 1.86GHz (4 cores, cache size : 512 KB) Disk 2TB, 7200rpm, db on 500MB partition #Online server hdparm -tT /dev/sda Timing cached reads: 2854 MB in 2.00 seconds = 1427.05 MB/sec Timing buffered disk reads: 184 MB in 3.00 seconds = 61.26 MB/sec Processor Intel(R) Atom(TM) CPU C2350 @ 1.74GHz (2 cores, cache size : 1024 KB) Disk 1TB, 7200rpm, db on 1TB partition I've created two pastebins with the output of the following commands for each server: # hdparm /dev/sda # hdparm -i /dev/sda # df # cat /proc/cpuinfo # cat /proc/meminfo #Kimsufi server https://pastebin.com/3860hS92 #Online server https://pastebin.com/FT1HFbD7 My questions: -Does the difference in 'buffered disk reads' explain the 6 fold increase in execution time for truncate/copy on the Onlineserver? -Why are regular queries much faster on this same server? -- Bien à vous, Vincent Veyron https://legalcase.libremen.com/ Legal case management software
On Mon, May 8, 2017 at 11:49 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > Hello, > > I use two dedicated bare metal servers (Online and Kimsufi). The first one takes much longer to execute a procedure thatrecreates a database by truncating its tables, then copying the data from a set of text files; it is however much fasterfor more typical SELECT and INSERT queries done by users. > > Here is the timing for the procedure : > > #Kimsufi server > time psql -f myfile.sql mydb > real 0m12.585s > user 0m0.200s > sys 0m0.076s > > #Online server > time psql -f myfile.sql mydb > real 1m15.410s > user 0m0.144s > sys 0m0.028s > > My questions: > > -Does the difference in 'buffered disk reads' explain the 6 fold increase in execution time for truncate/copy on the Onlineserver? The most likely cause of the difference would be that one server IS honoring fsync requests from the db and the other one isn't. If you run pgbench on both (something simple like pgbench -c 1 -T 60, aka one thread for 60 seconds) on a machine running on a 7200RPM hard drive, you should get approximately 120 transactions per second, or less, since that's how many times a second a disk spinning at that speed can write out data. If you get say 110 on the slow machine and 800 on the fast one, there's the culprit, the fast machine is not honoring fsync requests and is not crash-safe. I.e. if you start writing to the db and pull the power plug out the back of the machine it will likely power up with a corrupted database. > -Why are regular queries much faster on this same server? That's a whole nother subject. Most likely the faster machine can fit the whole db in memory, or has much faster memory, or the whole dataset is cached etc etc. For now concentrate on figuring out of you've got an fsync problem. If the data is just test data etc that you can afford to lose then you can leave off fsync and not worry. But in production this is rarely the case.
On Mon, 8 May 2017 12:48:29 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote: Hi Scott, Thank you for your input. > > The most likely cause of the difference would be that one server IS > honoring fsync requests from the db and the other one isn't. > > If you run pgbench on both (something simple like pgbench -c 1 -T 60, > aka one thread for 60 seconds) on a machine running on a 7200RPM hard > drive, you should get approximately 120 transactions per second Here are the results : #Kimsufi pgbench -c 1 -T 60 test starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 6618 latency average: 9.069 ms tps = 110.270771 (including connections establishing) tps = 110.283733 (excluding connections establishing) #Online starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 1150 latency average: 52.317 ms tps = 19.114403 (including connections establishing) tps = 19.115739 (excluding connections establishing) > > > -Why are regular queries much faster on this same server? > > That's a whole nother subject. Most likely the faster machine can fit > the whole db in memory, or has much faster memory, or the whole > dataset is cached etc etc. > The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server. using 'dmidecode -t 17' : #Kimsufi Memory Device Array Handle: 0x0016 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 2048 MB Form Factor: DIMM Set: None Locator: SO DIMM 0 Bank Locator: Channel A DIMM0 Type: DDR3 Type Detail: Synchronous Speed: 1066 MHz Manufacturer: 0x0000000000000000 Serial Number: 0x00000000 Asset Tag: Unknown Part Number: 0x000000000000000000000000000000000000 Rank: Unknown Configured Clock Speed: 1066 MHz [repeated for second locator] #Online Memory Device Array Handle: 0x0015 Error Information Handle: No Error Total Width: Unknown Data Width: Unknown Size: 4096 MB Form Factor: DIMM Set: None Locator: DIMM0 Bank Locator: BANK 0 Type: DDR3 Type Detail: Synchronous Unbuffered (Unregistered) Speed: 1600 MHz Manufacturer: <BAD INDEX> Serial Number: <BAD INDEX> Asset Tag: <BAD INDEX> Part Number: <BAD INDEX> Rank: 1 Configured Clock Speed: 1333 MHz Minimum voltage: Unknown Maximum voltage: Unknown Configured voltage: Unknown -- Bien à vous, Vincent Veyron https://libremen.com Logiciels de gestion, libres
On Mon, May 8, 2017 at 4:24 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > On Mon, 8 May 2017 12:48:29 -0600 > Scott Marlowe <scott.marlowe@gmail.com> wrote: > > Hi Scott, > > Thank you for your input. > >> >> The most likely cause of the difference would be that one server IS >> honoring fsync requests from the db and the other one isn't. >> >> If you run pgbench on both (something simple like pgbench -c 1 -T 60, >> aka one thread for 60 seconds) on a machine running on a 7200RPM hard >> drive, you should get approximately 120 transactions per second > > Here are the results : > > #Kimsufi > pgbench -c 1 -T 60 test > starting vacuum...end. > transaction type: TPC-B (sort of) > scaling factor: 1 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 60 s > number of transactions actually processed: 6618 > latency average: 9.069 ms > tps = 110.270771 (including connections establishing) > tps = 110.283733 (excluding connections establishing) Just under 120, looks like fsync is working. > > #Online > starting vacuum...end. > transaction type: TPC-B (sort of) > scaling factor: 1 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 60 s > number of transactions actually processed: 1150 > latency average: 52.317 ms > tps = 19.114403 (including connections establishing) > tps = 19.115739 (excluding connections establishing) OK that's horrendous. My mobile phone is likely faster. We need to figure out why it's so slow. If it's in a RAID-1 set it might be syncing. >> > -Why are regular queries much faster on this same server? >> >> That's a whole nother subject. Most likely the faster machine can fit >> the whole db in memory, or has much faster memory, or the whole >> dataset is cached etc etc. >> > > The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server. Yeah it fits in memory. Select queries will only hit disk at bootup. First machine SNIP > Speed: 1066 MHz SNIP > Configured Clock Speed: 1066 MHz Second machine > Speed: 1600 MHz SNIP > Configured Clock Speed: 1333 MHz Yeah the second machine likely has a noticeably faster CPU than the first as well. It's about two years younger so yeah it's probably just cpu/mem that's making it fast.
On Mon, May 8, 2017 at 5:06 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, May 8, 2017 at 4:24 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: >> On Mon, 8 May 2017 12:48:29 -0600 >> Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> > -Why are regular queries much faster on this same server? >>> >>> That's a whole nother subject. Most likely the faster machine can fit >>> the whole db in memory, or has much faster memory, or the whole >>> dataset is cached etc etc. >>> >> >> The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server. > > Yeah it fits in memory. Select queries will only hit disk at bootup. > > First machine > SNIP >> Speed: 1066 MHz > SNIP >> Configured Clock Speed: 1066 MHz > > Second machine > >> Speed: 1600 MHz > SNIP >> Configured Clock Speed: 1333 MHz > > Yeah the second machine likely has a noticeably faster CPU than the > first as well. It's about two years younger so yeah it's probably just > cpu/mem that's making it fast. OK went back and looked at your original post. I seems like those two queries that are 10 and 20 ms have essentially the same plan on similar sized dbs, so it's reasonable to assume the newer machine is about twice as fast. Without seeing what your test sql file does I have no idea what the big difference in the other direction. You'll have to pull out and run the individual queries, or turn on auto explain or something to see the plans and compare. A lot of time it's just some simple tuning in postgresql.conf or maybe a database got an alter database on it to change something? Either way use show all; to compare settings and get explain (analyze) off of the slow queries.
On Mon, 8 May 2017 17:35:38 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote: > Without seeing what your test sql file does I have no idea what the > big difference in the other direction. It truncates 59 tables, copies data back from a set of text files, inserts a few single records and does a few select setval('')to reset the serial columns. here it is : https://pastebin.com/LVsvFzkj >You'll have to pull out and run > the individual queries, or turn on auto explain or something to see > the plans and compare. I used log_duration; it shows that the truncate and all the \copy are much slower, while all insert/select statements aretwice as fast >A lot of time it's just some simple tuning in > postgresql.conf or maybe a database got an alter database on it to > change something? Server setups are identical : same software, same configurations, same databases. I've put in a ticket at the Online provider with the data to see if they have an answer (now 14H00 in Paris, so they maytake a while to respond) -- Bien à vous, Vincent Veyron https://compta.libremen.com Logiciel libre de comptabilité générale en partie double
Well, the response to the ticket was quite fast : https://status.online.net/index.php?do=details&task_id=720 Here's the stated cause : >Our tests have confirmed an issue caused by the fans of the power supplies installed in several chassis. >The fans create vibrations amplifying errors on the discs. Now on to decide whether I'm waiting for the fix or re-building a new box... Thanks a bunch for your help. -- Bien à vous, Vincent Veyron https://libremen.com Logiciels de gestion, libres
On Tue, May 9, 2017 at 7:08 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > > Well, the response to the ticket was quite fast : > > https://status.online.net/index.php?do=details&task_id=720 > > Here's the stated cause : > >>Our tests have confirmed an issue caused by the fans of the power supplies installed in several chassis. > >>The fans create vibrations amplifying errors on the discs. > > Now on to decide whether I'm waiting for the fix or re-building a new box... > > Thanks a bunch for your help. You're welcome, any time. As for the hard drives, can you upgrade to a pair of SSDs? If your data set fits on (and will continue to fit on) SSDs, the performance gained from SSDs is HUGE and worth a few hundred extra for the drive. Note that you want to use the Intel enterprise stuff that survives power loss, not the cheap low end SSDs. May be an easy fix for your hosting company and a big performance gain.
On Tue, 9 May 2017 10:24:20 -0600 Scott Marlowe <scott.marlowe@gmail.com> wrote: > > As for the hard drives, can you upgrade to a pair of SSDs? If your > data set fits on (and will continue to fit on) SSDs, the performance > gained from SSDs is HUGE and worth a few hundred extra for the drive. > Note that you want to use the Intel enterprise stuff that survives > power loss, not the cheap low end SSDs. May be an easy fix for your > hosting company and a big performance gain. > Sure, but I'm getting plenty of performance already : my little machines can serve 40 requests/second with 6 or 7 queriesper request. So I'm fine for a while. You can see for yourself if you enter the demo account for the site in my sig, and click in a couple of files (it's the databasethat gets re-created by the procedure I mentionned in my original post) -- Bien à vous, Vincent Veyron https://legalcase.libremen.com/ Legal case, contract and insurance claim management software