Thread: Fwd: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
---------- Forwarded Message ----------

Subject: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Date: Wed, 21 Apr 2004 13:55:21 +0100
From: Paul Thomas <paul@tmsl.demon.co.uk>
To: Nick Barr <nicky@chuckie.co.uk>
Cc: "pgsql-performance @ postgresql . org" <pgsql-performance@postgresql.org>

On 21/04/2004 09:31 Nick Barr wrote:
> Hi,
>
> Has anyone had a look at:
>
> http://people.ac.upc.es/zgomez/
>
> I realize that MySQL & PG cannot really be compared (especially when you
> consider the issues that MySQL has with things like data integrity) but
> still, surely PG would perform better than the stats show (i.e. #7: 31.28
> seconds versus 42 minutes!!!).

Looks like he's using the default postgresql.conf settings, in which case
I'm not surprised at pg looking so slow. His stated use of foreign keys
invalidates the tests anyway, as MyISAM tables don't support FKs, so we're
probably seeing FK-check overheads in pg that are simply ignored by MySQL.
In an honest test, MySQL should be reported as failing those tests.

Perhaps one of the advocacy team will pick up the baton?

> On a side note, it certainly looks like Linux kernel 2.6 is quite a bit
> faster in comparison to 2.4.

Yes, I've seen other benchmarks which also show that.

--
Paul Thomas
+------------------------------+-------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for Business           |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk |
+------------------------------+-------------------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-------------------------------------------------------

--
Josh Berkus
Aglio Database Solutions
San Francisco
Folks,

I've sent a polite e-mail to Mr. Gomez offering our help. Please, nobody
flame him!

--
Josh Berkus
Aglio Database Solutions
San Francisco
> Perhaps one of the advocacy team will pick up the baton?

He is using COPY to load the data... I can't think of any earthly reason
why it takes > 1 day to load 10 GB of data... probably all boils down to
the default shared_buffers setting. I don't even really consider this
'optimizing', just basic configuring to match the software to the machine.

Also, the create table statements do not have primary/foreign key
definitions... from his comments on the results page it's not clear if
this is intentional. If RI is set up properly, it may explain why the
results are off.

Perhaps the data-generating app is not functioning properly in some way
(this might explain the TPC errors as well). The fact that his results are
not returning the correct row counts is setting off warning bells. Most of
the use cases are relatively simple joins, actually. Maybe one of the key
columns is all nulls, or some similar strangeness.

It would be useful to know if his server is I/O- or CPU-bound. My guess is
that the server is swapping all the while.

Running ANALYZE after the import can work wonders... or does it? I don't
usually use COPY to do the import. Perhaps create the indexes/constraints
after the import? Some EXPLAINs might be helpful. Still, shared_buffers is
the first thing to look at.

Maybe if I get around to it, I'll try the TPC-H out here.

Merlin
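Merlin's load recipe (COPY first, indexes and constraints afterwards, then ANALYZE) can be sketched roughly as below. This is only an illustration, not the benchmark kit's actual DDL; the table is an abbreviated stand-in for TPC-H's lineitem, and the file path is made up.

```sql
-- Sketch of the load order discussed above (PostgreSQL 7.4 era syntax).
-- Column set is abbreviated and illustrative, not the full TPC-H schema.

-- 1. Create the table bare: no indexes, no PK/FK constraints yet.
CREATE TABLE lineitem (
    l_orderkey    integer       NOT NULL,
    l_linenumber  integer       NOT NULL,
    l_quantity    numeric(15,2) NOT NULL,
    l_shipdate    date          NOT NULL
);

-- 2. Bulk-load with COPY; rows bypass per-INSERT parse/plan overhead.
COPY lineitem FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';

-- 3. Only now build indexes and constraints, in one pass over the data
--    instead of maintaining them row by row during the load.
ALTER TABLE lineitem ADD PRIMARY KEY (l_orderkey, l_linenumber);
CREATE INDEX lineitem_shipdate_idx ON lineitem (l_shipdate);

-- 4. Collect planner statistics before running any queries.
ANALYZE lineitem;
```

The same ordering applies to foreign keys: adding them after the load validates the whole table once, rather than firing a per-row trigger during COPY.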
Josh Berkus wrote:
> Folks,
>
> I've sent a polite e-mail to Mr. Gomez offering our help. Please, nobody
> flame him!

Please keep in mind that the entire test has, other than perhaps a similar
database schema and query types, nothing to do with a TPC-H. I don't see
any kind of SUT. Foreign key support at the DB level is not required by
any of the TPC benchmarks. But the System Under Test, which is the
combination of middleware application and database, together with all
computers and network components these parts are running on, must
implement all the required semantics, like ACID properties, referential
integrity &c. One could implement a TPC-H with flat files; it's just a
major pain in the middleware.

A proper TPC benchmark implementation would, for example, be a complete
PHP+DB application, where the user interaction is done by an emulated
"browser" and what is measured is the HTTP response times, not anything
going on between PHP and the DB. Assuming that all requirements of the TPC
specification are implemented, either by using available DB features or by
including appropriate workarounds in the PHP code, that would very well
lead to something that can compare PHP+MySQL vs. PHP+PostgreSQL.

All TPC benchmarks I have seen are performed by timing such a system after
a considerable ramp-up time, giving the DB system a chance to properly
populate caches and so forth. Rebooting the machine just before the test
is the wrong thing here and will especially hurt any advanced cache
algorithm like ARC.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#==================================================  JanWieck@Yahoo.com #
On Wed, 21 Apr 2004, Merlin Moncure wrote:

> > Perhaps one of the advocacy team will pick up the baton?
> He is using COPY to load the data... I can't think of any earthly reason
> why it takes > 1 day to load 10 GB of data... probably all boils down to
> the default shared_buffers setting. I don't even really consider this
> 'optimizing', just basic configuring to match the software to the
> machine.

The indexes should be created after the data is loaded. There is no
mention of VACUUM, ANALYZE or tuning.

I've generated the workload overnight and am going to have a quick look
at it when I get a minute.

Gavin
On Thu, 22 Apr 2004, Gavin Sherry wrote:

> On Wed, 21 Apr 2004, Merlin Moncure wrote:
> > He is using COPY to load the data... I can't think of any earthly
> > reason why it takes > 1 day to load 10 GB of data... probably all
> > boils down to the default shared_buffers setting.
>
> The indexes should be created after the data is loaded. There is no
> mention of VACUUM, ANALYZE or tuning.
>
> I've generated the workload overnight and am going to have a quick look
> at it when I get a minute.

With a 10 GB workload on an old Duron, 512 MB RAM, IDE disk, it took
30-odd minutes to load the data and 9 hours to build the indexes.
Unfortunately, I'm out of disk! I'll run the 1 GB workload instead when I
get some time, but I can imagine that an ANALYZE will make a dramatic
difference to those query times.

Gavin
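One cheap way to check whether missing statistics are to blame, before and after running ANALYZE, is to compare plans. A sketch (table and predicate are illustrative, loosely modeled on TPC-H's orders table):

```sql
-- Freshly loaded, never-analyzed tables leave the planner with default
-- row-count guesses, so it often picks poor plans for selective filters.
EXPLAIN SELECT * FROM orders WHERE o_orderdate < '1995-01-01';

-- Gather statistics for every table in the database, then re-check:
-- the estimated row counts and the chosen plan frequently change.
ANALYZE;
EXPLAIN SELECT * FROM orders WHERE o_orderdate < '1995-01-01';
```

If the estimated rows in the first plan are wildly off (e.g. the hard-coded 1000-row default), that alone can explain multi-order-of-magnitude query-time differences.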
Gavin Sherry wrote:
> With a 10 GB workload on an old Duron, 512 MB RAM, IDE disk, it took
> 30-odd minutes to load the data and 9 hours to build the indexes.
> Unfortunately I'm out of disk! I'll run the 1 GB workload instead when
> I get some time, but I can imagine that an ANALYZE will make a dramatic
> difference to those query times.

Heh... can you estimate how much space is required?

Merlin
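For an estimate from a database that has already been loaded, the system catalog can answer Merlin's question directly. A sketch, valid on the 7.4-era servers discussed here: pg_class.relpages is maintained by VACUUM/ANALYZE and counts 8 kB pages per table or index, so it approximates on-disk size without shell access to $PGDATA.

```sql
-- Top 10 relations (tables 'r' and indexes 'i') by on-disk size.
-- relpages * 8 kB per page, converted to MB; figures are stale until
-- the relation has been VACUUMed or ANALYZEd at least once.
SELECT relname,
       relpages,
       relpages * 8 / 1024 AS approx_mb
  FROM pg_class
 WHERE relkind IN ('r', 'i')
 ORDER BY relpages DESC
 LIMIT 10;
```

Summing approx_mb over all user relations gives a rough total; Gavin's 9-hour index build suggests the indexes alone may rival the base-table size.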
...and on Thu, Apr 22, 2004 at 05:53:18AM -0700, Eduardo Almeida used the keyboard:
>
> - The configuration of the machine is:
> Dual Opteron 64 bits model 240
> 4GB RAM
> 960 GB on RAID 0
> Mandrake Linux 64 with kernel 2.6.5 (I compiled a kernel for this test)
> Java SDK java version "1.4.2_04"
> PostgreSQL JDBC pg74.1jdbc3.jar
>
> - The TPC-H configuration is:
> TPC-H 2.0.0
> 100GB
> load using flat files
> Refresh functions using java

I'll just add for reference, for those who aren't aware of it, that the
Java virtual machine for x86_64 only exists in the 1.5 branch so far, and
it's so utterly unstable that almost any notable shuffling around in
memory crashes it. :)

Hence 1.4.2_04 is a 32-bit application running in 32-bit mode.

I won't get into how much this affects the benchmarks, as I didn't really
look into how CPU- and memory-intensive the refresh functions are here,
so as I said - let's keep it a reference.

Cheers,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/
Folks,

I'm doing the 100GB TPC-H, and I'll show the preliminary results to our
community (Postgres) in 3 weeks, before finishing the study. My intention
is to carry out a test with a VLDB on a low-cost platform (PostgreSQL,
Linux and cheap HW), not to compare with another DBMS.

So far I can tell you that PG 7.4.2 with kernel 2.6.5 on an Opteron 64
model 240, in RAID 0 with 8 disks (960 GB), loaded the database in less
than 24 hours: about 7h30min to load the data and 16:09:25 to create the
indexes. The power test is still running, and that's why I'll not present
anything so far. For now I'll just send the list my environment
configuration.

- The configuration of the machine is:
Dual Opteron 64 bits model 240
4GB RAM
960 GB on RAID 0
Mandrake Linux 64 with kernel 2.6.5 (I compiled a kernel for this test)
Java SDK java version "1.4.2_04"
PostgreSQL JDBC pg74.1jdbc3.jar

- The TPC-H configuration is:
TPC-H 2.0.0
100GB
load using flat files
Refresh functions using java

- The PostgreSQL 7.4.2 configuration is:

add_missing_from | on
australian_timezones | off
authentication_timeout | 60
check_function_bodies | on
checkpoint_segments | 128
checkpoint_timeout | 300
checkpoint_warning | 30
client_encoding | SQL_ASCII
client_min_messages | notice
commit_delay | 0
commit_siblings | 5
cpu_index_tuple_cost | 0.001
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
DateStyle | ISO, MDY
db_user_namespace | off
deadlock_timeout | 1000
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 10
default_transaction_isolation | read committed
default_transaction_read_only | off
dynamic_library_path | $libdir
effective_cache_size | 150000
enable_hashagg | on
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
explain_pretty_print | on
extra_float_digits | 0
from_collapse_limit | 8
fsync | off
geqo | on
geqo_effort | 1
geqo_generations | 0
geqo_pool_size | 0
geqo_selection_bias | 2
geqo_threshold | 11
join_collapse_limit | 8
krb_server_keyfile | unset
lc_collate | en_US
lc_ctype | en_US
lc_messages | C
lc_monetary | C
lc_numeric | C
lc_time | C
log_connections | off
log_duration | off
log_error_verbosity | default
log_executor_stats | off
log_hostname | off
log_min_duration_statement | -1
log_min_error_statement | panic
log_min_messages | notice
log_parser_stats | off
log_pid | off
log_planner_stats | off
log_source_port | off
log_statement | off
log_statement_stats | off
log_timestamp | off
max_connections | 10
max_expr_depth | 10000
max_files_per_process | 1000
max_fsm_pages | 20000
max_fsm_relations | 1000
max_locks_per_transaction | 64
password_encryption | on
port | 5432
pre_auth_delay | 0
preload_libraries | unset
random_page_cost | 1.25
regex_flavor | advanced
rendezvous_name | unset
search_path | $user,public
server_encoding | SQL_ASCII
server_version | 7.4.2
shared_buffers | 40000
silent_mode | off
sort_mem | 65536
sql_inheritance | on
ssl | off
statement_timeout | 10000000
stats_block_level | off
stats_command_string | off
stats_reset_on_server_start | on
stats_row_level | off
stats_start_collector | on
superuser_reserved_connections | 2
syslog | 0
syslog_facility | LOCAL0
syslog_ident | postgres
tcpip_socket | on
TimeZone | unknown
trace_notify | off
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
unix_socket_directory | unset
unix_socket_group | unset
unix_socket_permissions | 511
vacuum_mem | 65536
virtual_host | unset
wal_buffers | 32
wal_debug | 0
wal_sync_method | fdatasync
zero_damaged_pages | off
(113 rows)

Suggestions, doubts and commentaries are very welcome.

Regards,
______________________________
Eduardo Cunha de Almeida
Administração de Banco de Dados
UFPR - CCE
+55-41-361-3321
eduardo.almeida@ufpr.br
edalmeida@yahoo.com

--- Jan Wieck <JanWieck@Yahoo.com> wrote:
> [full message quoted earlier in the thread - snipped]
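A few settings in Eduardo's dump stand out for this kind of run. The annotations below are an outside reading of the posted values, not recommendations made in the thread; the arithmetic assumes the standard 8 kB PostgreSQL page size.

```
fsync = off                      # fast loads, but forfeits crash safety; a
                                 # compliant TPC run requires ACID durability
default_statistics_target = 10   # the 7.4 default; skewed TPC-H columns may
                                 # need a higher target for good plans
statement_timeout = 10000000     # milliseconds, i.e. 10,000 s (~2.8 h); a
                                 # 100 GB power-test query that runs longer
                                 # will be aborted mid-benchmark
shared_buffers = 40000           # 40000 x 8 kB = ~320 MB of the 4 GB RAM
effective_cache_size = 150000    # 150000 x 8 kB = ~1.2 GB assumed OS cache;
                                 # arguably low for a machine with 4 GB
```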
Grega,

That's why I used 32-bit Java and needed to compile the 2.6.5 kernel with
the 32-bit modules.

For reference, Sun has 64-bit Java only for IA64 and Solaris SPARC 64, not
for Opteron.

Regards,
Eduardo

--- Grega Bremec <grega.bremec@noviforum.si> wrote:
> I'll just add for reference, for those who aren't aware of it, that the
> Java virtual machine for x86_64 only exists in the 1.5 branch so far,
> [snip]
>
> Hence 1.4.2_04 is a 32-bit application running in 32-bit mode.
...and on Thu, Apr 22, 2004 at 06:59:10AM -0700, Eduardo Almeida used the keyboard:
<snip>
> For reference, Sun has 64-bit Java only for IA64 and Solaris SPARC 64,
> not for Opteron.

As I mentioned, that is true for the 1.4.x releases of the JVM. We have
been testing some JCA builds of 1.5.0 on x86_64 so far, but it is too
unstable for any kind of serious work.

Cheers,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/