Re: Reliability recommendations - Mailing list pgsql-performance

From Ron
Subject Re: Reliability recommendations
Date
Msg-id 7.0.1.0.2.20060225055917.03da49f8@earthlink.net
Whole thread Raw
In response to Re: Reliability recommendations  ("Luke Lonergan" <llonergan@greenplum.com>)
Responses Re: Reliability recommendations  ("Luke Lonergan" <llonergan@greenplum.com>)
List pgsql-performance
At 11:41 PM 2/24/2006, Luke Lonergan wrote:
>Dan,
>
>On 2/24/06 4:47 PM, "Dan Gorman" <dgorman@hi5.com> wrote:
>
> > Was that sequential reads? If so, yeah you'll get 110MB/s? How big
> > was the datafile size? 8MB? Yeah, you'll get 110MB/s. 2GB? No, they
> > can't sustain that. There are so many details missing from this test
> > that it's hard to have any context around it :)
> >
> > I was getting about 40-50MB/s on a PV with 14 disks on a RAID10 in
> > real world usage. (random IO and fully saturating a Dell 1850 with 4
> > concurrent threads (to peg the cpu on selects) and raw data files)
>
>OK, how about some proof?
>
>In a synthetic test that writes 32GB of sequential 8k pages on a machine
>with 16GB of RAM:
>========================= Write test results ==============================
>time bash -c "dd if=/dev/zero of=/dbfast1/llonergan/bigfile bs=8k
>count=2000000 && sync" &
>time bash -c "dd if=/dev/zero of=/dbfast3/llonergan/bigfile bs=8k
>count=2000000 && sync" &
>
>2000000++0 records in
>2000000++0 records out
>2000000++0 records in
>2000000++0 records out
>
>real    1m0.046s
>user    0m0.270s
>sys     0m30.008s
>
>real    1m0.047s
>user    0m0.287s
>sys     0m30.675s
>
>So that's 32,000 MB written in 60.05 seconds, which is 533MB/s sustained
>with two threads.
>
>Now to read the same files in parallel:
>========================= Read test results ==============================
>sync
>time dd of=/dev/null if=/dbfast1/llonergan/bigfile bs=8k &
>time dd of=/dev/null if=/dbfast3/llonergan/bigfile bs=8k &
>
>2000000++0 records in
>2000000++0 records out
>
>real    0m39.849s
>user    0m0.282s
>sys     0m22.294s
>2000000++0 records in
>2000000++0 records out
>
>real    0m40.410s
>user    0m0.251s
>sys     0m22.515s
>
>And that's 32,000MB in 40.4 seconds, or 792MB/s sustained from disk (not
>memory).
>
>These are each RAID5 arrays of 8 internal SATA disks on 3Ware HW RAID
>controllers.

Impressive IO rates.  A more detailed HW list would help put them in context.

Which 3Ware?  The 9550SX?  How much cache on it (AFAIK, the only
options are 128MB and 256MB?)?

Which HDs?

What CPUs (looks like Opterons, but which flavor?) and mainboard?

What's CPU utilization when hammering the physical IO subsystem this hard?


>Now for real usage, let's run a simple sequential scan query on 123,434 MB
>of data in a single table on 4 of these machines in parallel.  All tables
>are distributed evenly by Bizgres MPP over all 8 filesystems:
>
>============= Bizgres MPP sequential scan results =========================
>
>[llonergan@salerno0 +AH4]$ !psql
>psql -p 9999 -U mppdemo1 demo
>Welcome to psql 8.1.1 (server 8.1.3), the PostgreSQL interactive terminal.
>
>Type:  +AFw-copyright for distribution terms
>        +AFw-h for help with SQL commands
>        +AFw? for help with psql commands
>        +AFw-g or terminate with semicolon to execute query
>        +AFw-q to quit
>
>demo=# +AFw-timing
>Timing is on.
>demo=# select version();
>
>version
>----------------------------------------------------------------------------
>----------------------------------------------------------------------------
>-----
>  PostgreSQL 8.1.3 (Bizgres MPP 2.1) on x86_64-unknown-linux-gnu, compiled by
>GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) compiled on Feb 23 2006
>11:34:06
>(1 row)
>
>Time: 0.570 ms
>demo=# select relname,8*relpages/128 as MB from pg_class order by relpages
>desc limit 6;
>             relname             |   mb
>--------------------------------++--------
>  lineitem                       | 123434
>  orders                         |  24907
>  partsupp                       |  14785
>  part                           |   3997
>  customer                       |   3293
>  supplier                       |    202
>(6 rows)
>
>Time: 1.824 ms
>demo=# select count(*) from lineitem;
>    count
>-----------
>  600037902
>(1 row)
>
>Time: 60300.960 ms
>
>So that's 123,434 MB of data scanned in 60.3 seconds, or 2,047 MB/s on 4
>machines, which uses 512MB/s of disk bandwidth on each machine.
>
>Now let's do a query that uses a this big table (a two way join) using all 4
>machines:
>============= Bizgres MPP Query results =========================
>demo=# select
>demo-#         sum(l_extendedprice* (1 - l_discount)) as revenue
>demo-# from
>demo-#         lineitem,
>demo-#         part
>demo-# where
>demo-#         (
>demo(#                 p_partkey = l_partkey
>demo(#                 and p_brand = 'Brand#42'
>demo(#                 and p_container in ('SM CASE', 'SM BOX', 'SM PACK',
>'SM PKG')
>demo(#                 and l_quantity >= 7 and l_quantity <= 7 ++ 10
>demo(#                 and p_size between 1 and 5
>demo(#                 and l_shipmode in ('AIR', 'AIR REG')
>demo(#                 and l_shipinstruct = 'DELIVER IN PERSON'
>demo(#         )
>demo-#         or
>demo-#         (
>demo(#                 p_partkey = l_partkey
>demo(#                 and p_brand = 'Brand#15'
>demo(#                 and p_container in ('MED BAG', 'MED BOX', 'MED PKG',
>'MED PACK')
>demo(#                 and l_quantity >= 14 and l_quantity <= 14 ++ 10
>demo(#                 and p_size between 1 and 10
>demo(#                 and l_shipmode in ('AIR', 'AIR REG')
>demo(#                 and l_shipinstruct = 'DELIVER IN PERSON'
>demo(#         )
>demo-#         or
>demo-#         (
>demo(#                 p_partkey = l_partkey
>demo(#                 and p_brand = 'Brand#53'
>demo(#                 and p_container in ('LG CASE', 'LG BOX', 'LG PACK',
>'LG PKG')
>demo(#                 and l_quantity >= 22 and l_quantity <= 22 ++ 10
>demo(#                 and p_size between 1 and 15
>demo(#                 and l_shipmode in ('AIR', 'AIR REG')
>demo(#                 and l_shipinstruct = 'DELIVER IN PERSON'
>demo(#         );
>     revenue
>----------------
>  356492404.3164
>(1 row)
>
>Time: 114908.149 ms
Hmmm.  ~115secs @ ~500MBps => ~57.5GB of data manipulated.


>And now a 6-way join among 4 tables in this same schema:
>
>demo=# SELECT
>demo-#
>s.s_acctbal,s.s_name,n.n_name,p.p_partkey,p.p_mfgr,s.s_address,s.s_phone,s.s
>_comment
>demo-# FROM
>demo-#         supplier s,partsupp ps,nation n,region r,
>demo-#         part p,  (
>demo(#                         SELECT p_partkey,  min(ps_supplycost) as
>min_ps_cost from part, partsupp ,
>demo(#                                 supplier,nation, region
>demo(#                         WHERE
>demo(#                                 p_partkey=ps_partkey
>demo(#                                 and s_suppkey = ps_suppkey
>demo(#                                 and s_nationkey = n_nationkey
>demo(#                                 and n_regionkey = r_regionkey
>demo(#                                 and r_name = 'EUROPE'
>demo(#                         GROUP BY
>demo(#                                 p_partkey
>demo(#                  ) g
>demo-# WHERE
>demo-#         p.p_partkey = ps.ps_partkey
>demo-#         and g.p_partkey = p.p_partkey
>demo-#         and g. min_ps_cost = ps.ps_supplycost
>demo-#         and s.s_suppkey = ps.ps_suppkey
>demo-#         and p.p_size = 15
>demo-#         and p.p_type like '%BRASS'
>demo-#         and s.s_nationkey = n.n_nationkey
>demo-#         and n.n_regionkey = r.r_regionkey
>demo-#         and r.r_name = 'EUROPE'
>demo-# ORDER BY
>demo-#         s. s_acctbal desc,n.n_name,s.s_name,p.p_partkey
>demo-# LIMIT 100;
>  s_acctbal |          s_name           |          n_name           |
>p_partkey |          p_mfgr           |
>           s_address                 |     s_phone     |
>s_comment
>
>-----------++---------------------------++---------------------------++--------
>---++---------------------------++------
>------------------------------------++-----------------++---------------------
>--------------------------------------
>-------------------------------------------
>    9999.70 | Supplier#000239544        | UNITED KINGDOM            |
>6739531 | Manufacturer#4            | 1UCMu
>3TLyUThghoeZ8arg6cV3Mr              | 33-509-584-9496 | carefully ironic
>asymptotes cajole quickly. slyly silent a
>ccounts sleep. fl
>...
>...
>    9975.53 | Supplier#000310136        | ROMANIA                   |
>10810115 | Manufacturer#5            | VNWON
>A5Sr B                              | 29-977-903-6199 | pending deposits
>wake permanently; final accounts sleep ab
>out the pending deposits.
>(100 rows)
>
>Time: 424981.813 ms
...and this implies ~425secs @ ~500MBps => 212.5GB

What are the IO rates during these joins?

How much data is being handled to complete these joins?

How much data is being exchanged between these machines to complete the joins?

What is the connectivity between these 4 machines?

Putting these numbers in context may help the advocacy effort
considerably as well as help us improve things even further. ;-)

TiA,
Ron



pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Reliability recommendations
Next
From: Ron
Date:
Subject: Re: Reliability recommendations