Re: Huge Data sets, simple queries - Mailing list pgsql-performance

From Jeffrey W. Baker
Subject Re: Huge Data sets, simple queries
Date
Msg-id 1138766980.14051.24.camel@noodles
In response to Re: Huge Data sets, simple queries  ("Luke Lonergan" <llonergan@greenplum.com>)
Responses Re: Huge Data sets, simple queries
Re: Huge Data sets, simple queries
List pgsql-performance
On Tue, 2006-01-31 at 12:47 -0800, Luke Lonergan wrote:
> Jeffrey,
>
> On 1/31/06 12:03 PM, "Jeffrey W. Baker" <jwbaker@acm.org> wrote:
> > Linux does balanced reads on software
> > mirrors.  I'm not sure why you think this can't improve bandwidth.  It
> > does improve streaming bandwidth as long as the platter STR is more than
> > the bus STR.
>
> ... Prove it.

It's clear that Linux software RAID1, and by extension RAID10, does
balanced reads, and that these balanced reads can double the bandwidth.
A quick glance at the kernel source code and a trivial test prove the
point.

In this test, sdf and sdg are Seagate 15k.3 disks on a single channel of
an Adaptec 39320, but the enclosure, and therefore the bus, is capable
of only Ultra160 operation.

# grep md0 /proc/mdstat
md0 : active raid1 sdf1[0] sdg1[1]

# dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=0      &
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=400000
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 48.243362 seconds (67922298 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 48.375897 seconds (67736211 bytes/sec)

That's 136MB/sec, for those following along at home.  With only two
disks in a RAID1, you can nearly max out the SCSI bus.
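
For anyone checking the arithmetic: the aggregate figures quoted in this
message land within about 1 MB/sec of total bytes read divided by the
elapsed time of the slowest reader.  That method is inferred from the
numbers, not stated anywhere in the test.  A minimal shell sketch, where
aggregate_mbs is a hypothetical helper and MB means 10^6 bytes:

```shell
# aggregate_mbs: hypothetical helper, not part of the original test.
# Reads dd summary lines on stdin and prints
#   (total bytes read by all readers) / (slowest reader's elapsed seconds)
# in MB/sec, where 1 MB = 10^6 bytes.
aggregate_mbs() {
  awk '/bytes transferred in/ {
         total += $1                                    # field 1: bytes read
         if ($5 + 0 > slowest + 0) slowest = $5 + 0     # field 5: elapsed seconds
       }
       END { if (slowest > 0) printf "%.1f\n", total / slowest / 1e6 }'
}

# The two summary lines from the two-reader md0 run.
aggregate_mbs <<'EOF'
3276800000 bytes transferred in 48.243362 seconds (67922298 bytes/sec)
3276800000 bytes transferred in 48.375897 seconds (67736211 bytes/sec)
EOF
# prints 135.5
```

Simply adding the two per-reader rates gives 135.7 instead; either way
the result rounds to roughly the figure quoted.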

# dd if=/dev/sdf1 of=/dev/null bs=8k count=400000 skip=0      &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=400000 skip=400000
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 190.413286 seconds (17208883 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 192.096232 seconds (17058117 bytes/sec)

That, on the other hand, is only 34MB/sec.  With two threads, the RAID1
is 296% faster.
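
The 296% figure can be reproduced by summing the per-reader rates that dd
reported for each of the two runs and comparing the sums.  A small sketch,
with pct_faster as a hypothetical helper name:

```shell
# pct_faster: hypothetical helper, not part of the original test.
# Prints how much faster rate A is than rate B, as a whole-number percentage.
pct_faster() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a / b - 1) * 100 }'
}

# Per-reader rates (bytes/sec) copied from the dd output of the two runs.
raid1=$((67922298 + 67736211))     # two readers on /dev/md0
single=$((17208883 + 17058117))    # two readers on /dev/sdf1

pct_faster "$raid1" "$single"      # prints 296
```

Using the rounded figures instead (136 and 34 MB/sec) gives an even 300%.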

# dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=0       &
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=400000  &
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=800000  &
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=1200000 &
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 174.276585 seconds (18802296 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 181.581893 seconds (18045852 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 183.724243 seconds (17835425 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 184.209018 seconds (17788489 bytes/sec)

That's 71MB/sec with 4 threads...

# dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=0       &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=400000  &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=800000  &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=1200000 &
100000+0 records in
100000+0 records out
819200000 bytes transferred in 77.489210 seconds (10571794 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 87.628000 seconds (9348610 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 88.912989 seconds (9213502 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 90.238705 seconds (9078144 bytes/sec)

Only 36MB/sec for the single disk.  96% advantage for the RAID1.

# dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=0 &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=400000  &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=800000  &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=1200000 &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=1600000 &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2000000 &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2400000 &
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2800000 &
50000+0 records in
50000+0 records out
409600000 bytes transferred in 35.289648 seconds (11606803 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 42.653475 seconds (9602969 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 43.524714 seconds (9410745 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 45.151705 seconds (9071640 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 47.741845 seconds (8579476 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 48.600533 seconds (8427891 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 48.758726 seconds (8400548 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 49.679275 seconds (8244887 bytes/sec)

66MB/s with 8 threads.

# dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=0 &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=400000  &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=800000  &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=1200000 &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=1600000 &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=2000000 &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=2400000 &
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=2800000 &
50000+0 records in
50000+0 records out
409600000 bytes transferred in 73.873911 seconds (5544583 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 75.613093 seconds (5417051 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 79.988303 seconds (5120749 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 79.996440 seconds (5120228 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 84.885172 seconds (4825342 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 92.995892 seconds (4404496 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 99.180337 seconds (4129851 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 100.144752 seconds (4090080 bytes/sec)

33MB/s.  RAID1 gives a 100% advantage at 8 threads.

I think I've proved my point.  Software RAID1 read balancing provides
roughly 0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
respectively.  In the presence of random I/O, the results are even
better.
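
To repeat these runs at other reader counts, the dd invocations shown in
this message can be wrapped in a small function.  This is only a sketch:
the name parallel_read and its argument order are made up here, and it
assumes a POSIX shell and a dd that accepts bs=8k.

```shell
# parallel_read: hypothetical wrapper around the dd invocations in this message.
# Usage: parallel_read DEVICE READERS COUNT STRIDE
#   DEVICE   block device or file to read, e.g. /dev/md0
#   READERS  number of concurrent dd processes
#   COUNT    number of 8k blocks each dd reads
#   STRIDE   gap, in 8k blocks, between consecutive readers' start offsets
parallel_read() {
  dev=$1 readers=$2 count=$3 stride=$4
  i=0
  while [ "$i" -lt "$readers" ]; do
    # Each reader starts at its own offset, like the skip= values in the tests.
    dd if="$dev" of=/dev/null bs=8k count="$count" skip=$((i * stride)) &
    i=$((i + 1))
  done
  wait   # return only after every background dd has printed its summary
}
```

For example, "parallel_read /dev/md0 4 400000 400000" issues the same four
reads as the four-reader md0 test.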

Anyone who thinks they have a single-threaded workload has not yet
encountered the autovacuum daemon.

-Jeff

