Thread: Fwd: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
---------- Forwarded Message ----------

Subject: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Date: Wed, 21 Apr 2004 13:55:21 +0100
From: Paul Thomas <paul@tmsl.demon.co.uk>
To: Nick Barr <nicky@chuckie.co.uk>
Cc: "pgsql-performance @ postgresql . org" <pgsql-performance@postgresql.org>

On 21/04/2004 09:31 Nick Barr wrote:
> Hi,
>
> Has anyone had a look at:
>
> http://people.ac.upc.es/zgomez/
>
> I realize that MySQL & PG cannot really be compared (especially when you
> consider the issues that MySQL has with things like data integrity) but
> still, surely PG would perform better than the stats show (i.e. #7: 31.28
> seconds versus 42 minutes!!!).

Looks like he's using the default postgresql.conf settings, in which case
I'm not surprised at pg looking so slow. His stated use of foreign keys
invalidates the tests anyway, as MyISAM tables don't support FKs, so we're
probably seeing FK-check overheads in pg that are simply ignored by MySQL.
In an honest test, MySQL should be reported as failing those tests.

Perhaps one of the advocacy team will pick up the baton?

> On a side note, it certainly looks like Linux kernel 2.6 is quite a bit
> faster in comparison to 2.4.

Yes, I've seen other benchmarks which also show that.

--
Paul Thomas
+------------------------------+-------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for Business           |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk |
+------------------------------+-------------------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-------------------------------------------------------

--
Josh Berkus
Aglio Database Solutions
San Francisco
Folks,

I've sent a polite e-mail to Mr. Gomez offering our help. Please, nobody
flame him!

--
Josh Berkus
Aglio Database Solutions
San Francisco
> Perhaps one of the advocacy team will pick up the baton?

He is using COPY to load the data... I can't think of any earthly reason
why it takes > 1 day to load 10 GB of data... probably all boils down to
the default shared_buffers setting. I don't even really consider this
'optimizing', just basic configuring to match the software to the machine.

Also, the create table statements do not have primary/foreign key
definitions... from his comments on the results page it's not clear if
this is intentional. If RI is set up properly, it may explain why the
results are off.

Perhaps the data-generating app is not functioning properly in some way
(this might explain the TPC errors as well). The fact that his results are
not returning the correct row counts is setting off warning bells. Most of
the use cases are relatively simple joins, actually. Maybe one of the key
columns is all nulls, or some similar strangeness.

It would be useful to know if his server is I/O- or CPU-bound. My guess is
that the server is swapping all the while.

Running ANALYZE after the import can work wonders... or does it? I don't
usually use COPY to do the import. Perhaps create the indexes/constraints
after the import? Some EXPLAINs might be helpful. Still, shared_buffers is
the first thing to look at.

Maybe if I get around to it, I'll try the TPC-H out here.

Merlin
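Merlin's load recipe (COPY first, indexes and constraints afterwards, then ANALYZE) can be sketched roughly as below. This is only an illustration, not the benchmark kit's actual DDL; the table is an abbreviated stand-in for TPC-H's lineitem, and the file path is made up.

```sql
-- Sketch of the load order discussed above (PostgreSQL 7.4 era syntax).
-- Column set is abbreviated and illustrative, not the full TPC-H schema.

-- 1. Create the table bare: no indexes, no PK/FK constraints yet.
CREATE TABLE lineitem (
    l_orderkey    integer       NOT NULL,
    l_linenumber  integer       NOT NULL,
    l_quantity    numeric(15,2) NOT NULL,
    l_shipdate    date          NOT NULL
);

-- 2. Bulk-load with COPY; rows bypass per-INSERT parse/plan overhead.
COPY lineitem FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';

-- 3. Only now build indexes and constraints, in one pass over the data
--    instead of maintaining them row by row during the load.
ALTER TABLE lineitem ADD PRIMARY KEY (l_orderkey, l_linenumber);
CREATE INDEX lineitem_shipdate_idx ON lineitem (l_shipdate);

-- 4. Collect planner statistics before running any queries.
ANALYZE lineitem;
```

The same ordering applies to foreign keys: adding them after the load validates the whole table once, rather than firing a per-row trigger during COPY.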
Josh Berkus wrote:
> Folks,
>
> I've sent a polite e-mail to Mr. Gomez offering our help. Please, nobody
> flame him!

Please keep in mind that the entire test has, other than perhaps a similar
database schema and query types, nothing to do with a TPC-H. I don't see
any kind of SUT. Foreign key support at the DB level is not required by
any of the TPC benchmarks. But the System Under Test, which is the
combination of middleware application and database, together with all
computers and network components these parts are running on, must
implement all the required semantics, like ACID properties, referential
integrity &c. One could implement a TPC-H with flat files; it's just a
major pain in the middleware.

A proper TPC benchmark implementation would, for example, be a complete
PHP+DB application, where the user interaction is done by an emulated
"browser" and what is measured is the HTTP response times, not anything
going on between PHP and the DB. Assuming that all requirements of the TPC
specification are implemented, either by using available DB features or by
including appropriate workarounds in the PHP code, that would very well
lead to something that can compare PHP+MySQL vs. PHP+PostgreSQL.

All TPC benchmarks I have seen are performed by timing such a system after
a considerable ramp-up time, giving the DB system a chance to properly
populate caches and so forth. Rebooting the machine just before the test
is the wrong thing here and will especially hurt any advanced cache
algorithm like ARC.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#==================================================  JanWieck@Yahoo.com #
On Wed, 21 Apr 2004, Merlin Moncure wrote:

> > Perhaps one of the advocacy team will pick up the baton?
> He is using COPY to load the data... I can't think of any earthly reason
> why it takes > 1 day to load 10 GB of data... probably all boils down to
> the default shared_buffers setting. I don't even really consider this
> 'optimizing', just basic configuring to match the software to the
> machine.

The indexes should be created after the data is loaded. There is no
mention of VACUUM, ANALYZE or tuning.

I've generated the workload overnight and am going to have a quick look
at it when I get a minute.

Gavin
On Thu, 22 Apr 2004, Gavin Sherry wrote:

> On Wed, 21 Apr 2004, Merlin Moncure wrote:
> > He is using COPY to load the data... I can't think of any earthly
> > reason why it takes > 1 day to load 10 GB of data... probably all
> > boils down to the default shared_buffers setting.
>
> The indexes should be created after the data is loaded. There is no
> mention of VACUUM, ANALYZE or tuning.
>
> I've generated the workload overnight and am going to have a quick look
> at it when I get a minute.

With a 10 GB workload on an old Duron, 512 MB RAM, IDE disk, it took
30-odd minutes to load the data and 9 hours to build the indexes.
Unfortunately, I'm out of disk! I'll run the 1 GB workload instead when I
get some time, but I can imagine that an ANALYZE will make a dramatic
difference to those query times.

Gavin
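One cheap way to check whether missing statistics are to blame, before and after running ANALYZE, is to compare plans. A sketch (table and predicate are illustrative, loosely modeled on TPC-H's orders table):

```sql
-- Freshly loaded, never-analyzed tables leave the planner with default
-- row-count guesses, so it often picks poor plans for selective filters.
EXPLAIN SELECT * FROM orders WHERE o_orderdate < '1995-01-01';

-- Gather statistics for every table in the database, then re-check:
-- the estimated row counts and the chosen plan frequently change.
ANALYZE;
EXPLAIN SELECT * FROM orders WHERE o_orderdate < '1995-01-01';
```

If the estimated rows in the first plan are wildly off (e.g. the hard-coded 1000-row default), that alone can explain multi-order-of-magnitude query-time differences.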
Gavin Sherry wrote:
> With a 10 GB workload on an old Duron, 512 MB RAM, IDE disk, it took
> 30-odd minutes to load the data and 9 hours to build the indexes.
> Unfortunately I'm out of disk! I'll run the 1 GB workload instead when
> I get some time, but I can imagine that an ANALYZE will make a dramatic
> difference to those query times.

Heh... can you estimate how much space is required?

Merlin
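For an estimate from a database that has already been loaded, the system catalog can answer Merlin's question directly. A sketch, valid on the 7.4-era servers discussed here: pg_class.relpages is maintained by VACUUM/ANALYZE and counts 8 kB pages per table or index, so it approximates on-disk size without shell access to $PGDATA.

```sql
-- Top 10 relations (tables 'r' and indexes 'i') by on-disk size.
-- relpages * 8 kB per page, converted to MB; figures are stale until
-- the relation has been VACUUMed or ANALYZEd at least once.
SELECT relname,
       relpages,
       relpages * 8 / 1024 AS approx_mb
  FROM pg_class
 WHERE relkind IN ('r', 'i')
 ORDER BY relpages DESC
 LIMIT 10;
```

Summing approx_mb over all user relations gives a rough total; Gavin's 9-hour index build suggests the indexes alone may rival the base-table size.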
...and on Thu, Apr 22, 2004 at 05:53:18AM -0700, Eduardo Almeida used the keyboard:
>
> - The configuration of the machine is:
> Dual Opteron 64 bits model 240
> 4GB RAM
> 960 GB on RAID 0
> Mandrake Linux 64 with kernel 2.6.5 (I compiled a kernel for this test)
> Java SDK java version "1.4.2_04"
> PostgreSQL JDBC pg74.1jdbc3.jar
>
> - The TPC-H configuration is:
> TPC-H 2.0.0
> 100GB
> load using flat files
> Refresh functions using java

I'll just add for reference, for those who aren't aware of it, that the
Java virtual machine for x86_64 only exists in the 1.5 branch so far, and
it's so utterly unstable that almost any notable shuffling around in
memory crashes it. :)

Hence 1.4.2_04 is a 32-bit application running in 32-bit mode.

I won't get into how much this affects the benchmarks, as I didn't really
look into how CPU- and memory-intensive the refresh functions are here,
so as I said - let's keep it a reference.

Cheers,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/
Folks,

I'm doing the 100GB TPC-H, and I'll show the preliminary results to our
community (Postgres) in 3 weeks, before finishing the study. My intention
is to carry out a test with a VLDB on a low-cost platform (PostgreSQL,
Linux and cheap HW), not to compare with another DBMS.

So far I can tell you that PG 7.4.2 with kernel 2.6.5 on an Opteron 64
model 240, in RAID 0 with 8 disks (960 GB), loaded the database in less
than 24 hours: about 7h30min to load the data and 16:09:25 to create the
indexes. The power test is still running, and that's why I'll not present
anything so far. For now I'll just send the list my environment
configuration.

- The configuration of the machine is:
Dual Opteron 64 bits model 240
4GB RAM
960 GB on RAID 0
Mandrake Linux 64 with kernel 2.6.5 (I compiled a kernel for this test)
Java SDK java version "1.4.2_04"
PostgreSQL JDBC pg74.1jdbc3.jar

- The TPC-H configuration is:
TPC-H 2.0.0
100GB
load using flat files
Refresh functions using java

- The PostgreSQL 7.4.2 configuration is:

add_missing_from | on
australian_timezones | off
authentication_timeout | 60
check_function_bodies | on
checkpoint_segments | 128
checkpoint_timeout | 300
checkpoint_warning | 30
client_encoding | SQL_ASCII
client_min_messages | notice
commit_delay | 0
commit_siblings | 5
cpu_index_tuple_cost | 0.001
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
DateStyle | ISO, MDY
db_user_namespace | off
deadlock_timeout | 1000
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 10
default_transaction_isolation | read committed
default_transaction_read_only | off
dynamic_library_path | $libdir
effective_cache_size | 150000
enable_hashagg | on
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
explain_pretty_print | on
extra_float_digits | 0
from_collapse_limit | 8
fsync | off
geqo | on
geqo_effort | 1
geqo_generations | 0
geqo_pool_size | 0
geqo_selection_bias | 2
geqo_threshold | 11
join_collapse_limit | 8
krb_server_keyfile | unset
lc_collate | en_US
lc_ctype | en_US
lc_messages | C
lc_monetary | C
lc_numeric | C
lc_time | C
log_connections | off
log_duration | off
log_error_verbosity | default
log_executor_stats | off
log_hostname | off
log_min_duration_statement | -1
log_min_error_statement | panic
log_min_messages | notice
log_parser_stats | off
log_pid | off
log_planner_stats | off
log_source_port | off
log_statement | off
log_statement_stats | off
log_timestamp | off
max_connections | 10
max_expr_depth | 10000
max_files_per_process | 1000
max_fsm_pages | 20000
max_fsm_relations | 1000
max_locks_per_transaction | 64
password_encryption | on
port | 5432
pre_auth_delay | 0
preload_libraries | unset
random_page_cost | 1.25
regex_flavor | advanced
rendezvous_name | unset
search_path | $user,public
server_encoding | SQL_ASCII
server_version | 7.4.2
shared_buffers | 40000
silent_mode | off
sort_mem | 65536
sql_inheritance | on
ssl | off
statement_timeout | 10000000
stats_block_level | off
stats_command_string | off
stats_reset_on_server_start | on
stats_row_level | off
stats_start_collector | on
superuser_reserved_connections | 2
syslog | 0
syslog_facility | LOCAL0
syslog_ident | postgres
tcpip_socket | on
TimeZone | unknown
trace_notify | off
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
unix_socket_directory | unset
unix_socket_group | unset
unix_socket_permissions | 511
vacuum_mem | 65536
virtual_host | unset
wal_buffers | 32
wal_debug | 0
wal_sync_method | fdatasync
zero_damaged_pages | off
(113 rows)

Suggestions, doubts and commentaries are very welcome.

Regards,
______________________________
Eduardo Cunha de Almeida
Administração de Banco de Dados
UFPR - CCE
+55-41-361-3321
eduardo.almeida@ufpr.br
edalmeida@yahoo.com

--- Jan Wieck <JanWieck@Yahoo.com> wrote:
> [full message quoted earlier in the thread - snipped]
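A few settings in Eduardo's dump stand out for this kind of run. The annotations below are an outside reading of the posted values, not recommendations made in the thread; the arithmetic assumes the standard 8 kB PostgreSQL page size.

```
fsync = off                      # fast loads, but forfeits crash safety; a
                                 # compliant TPC run requires ACID durability
default_statistics_target = 10   # the 7.4 default; skewed TPC-H columns may
                                 # need a higher target for good plans
statement_timeout = 10000000     # milliseconds, i.e. 10,000 s (~2.8 h); a
                                 # 100 GB power-test query that runs longer
                                 # will be aborted mid-benchmark
shared_buffers = 40000           # 40000 x 8 kB = ~320 MB of the 4 GB RAM
effective_cache_size = 150000    # 150000 x 8 kB = ~1.2 GB assumed OS cache;
                                 # arguably low for a machine with 4 GB
```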
Grega,

That's why I used 32-bit Java and needed to compile the 2.6.5 kernel with
the 32-bit modules.

For reference, Sun has 64-bit Java only for IA64 and Solaris SPARC 64, not
for Opteron.

Regards,
Eduardo

--- Grega Bremec <grega.bremec@noviforum.si> wrote:
> I'll just add for reference, for those who aren't aware of it, that the
> Java virtual machine for x86_64 only exists in the 1.5 branch so far,
> [snip]
>
> Hence 1.4.2_04 is a 32-bit application running in 32-bit mode.
...and on Thu, Apr 22, 2004 at 06:59:10AM -0700, Eduardo Almeida used the keyboard:
<snip>
> For reference, Sun has 64-bit Java only for IA64 and Solaris SPARC 64,
> not for Opteron.

As I mentioned, that is true for the 1.4.x releases of the JVM. We have
been testing some JCA builds of 1.5.0 on x86_64 so far, but it is too
unstable for any kind of serious work.

Cheers,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/