Thread: Script to compute random page cost
Because we have seen many complaints about sequential vs. index scans, I
wrote a script which computes the value for your OS/hardware combination.

Under BSD/OS on one SCSI disk, I get a random_page_cost around 60.  Our
current postgresql.conf default is 4.

What do other people get for this value?

Keep in mind if we increase this value, we will get more sequential
scans vs. index scans.

One flaw in this test is that it randomly reads blocks from different
files rather than randomly reading from the same file.  Do people have a
suggestion on how to correct this?  Does it matter?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#!/bin/bash

trap "rm -f /tmp/$$" 0 1 2 3 15

BLCKSZ=8192

# $RANDOM expands to a new pseudo-random number (0 - 32767) each time;
# a shell without it expands both sides to the same empty string.
if [ "$RANDOM" = "$RANDOM" ]
then	echo "Your shell does not support \$RANDOM.  Try using bash." 1>&2
	exit 1
fi

# XXX We assume 0 <= random <= 32767

echo "Collecting sizing information ..."

TEMPLATE1=`du -s "$PGDATA/base/1" | awk '{print $1}'`
FULL=`du -s "$PGDATA/base" | awk '{print $1}'`
if [ "$FULL" -lt `expr "$TEMPLATE1" \* 4` ]
then	echo "Your installation should have at least four times the data stored in template1 to yield meaningful results" 1>&2
	exit 1
fi

# List every file under $PGDATA/base as "size-in-pages filename",
# keeping only files that are a whole, nonzero number of pages.
find "$PGDATA/base" -type f -exec ls -ld {} \; |
awk '$5 % '"$BLCKSZ"' == 0 {print $5 / '"$BLCKSZ"', $9}' |
grep -v '^0 ' > /tmp/$$

TOTAL=`awk 'BEGIN {sum=0}
	{sum += $1}
	END {print sum}' /tmp/$$`

echo "Running random access timing test ..."

START=`date '+%s'`
PAGES=1000

while [ "$PAGES" -ne 0 ]
do
	# Combine two $RANDOM values into a 30-bit random number, then
	# scale it to a page offset within the total page count.
	BIGRAND=`expr "$RANDOM" \* 32768 + "$RANDOM"`

	OFFSET=`awk 'BEGIN{printf "%d\n", ('"$BIGRAND"' / 2^30) * '"$TOTAL"'}'`

	# Find which file that global page offset falls in, and the
	# page offset within that file.
	RESULT=`awk '	BEGIN {offset = 0}
		offset + $1 > '"$OFFSET"' \
			{print $2, '"$OFFSET"' - offset ; exit}
		{offset += $1}' /tmp/$$`
	FILE=`echo "$RESULT" | awk '{print $1}'`
	OFFSET=`echo "$RESULT" | awk '{print $2}'`

	# Read a single page at the chosen offset within the file.
	dd bs="$BLCKSZ" skip="$OFFSET" count=1 if="$FILE" of="/dev/null" >/dev/null 2>&1
	PAGES=`expr "$PAGES" - 1`
done

STOP=`date '+%s'`
RANDTIME=`expr "$STOP" - "$START"`

echo "Running sequential access timing test ..."

START=`date '+%s'`
# We read 10 times as many pages for the sequential test because it is
# quicker and we need it to run for a while to get accurate results.
PAGES=10000

while [ "$PAGES" -ne 0 ]
do
	BIGRAND=`expr "$RANDOM" \* 32768 + "$RANDOM"`

	OFFSET=`awk 'BEGIN{printf "%d\n", ('"$BIGRAND"' / 2^30) * '"$TOTAL"'}'`

	# Pick a random file and read it sequentially from the beginning.
	RESULT=`awk '	BEGIN {offset = 0}
		offset + $1 > '"$OFFSET"' \
			{print $2, $1; exit}
		{offset += $1}' /tmp/$$`
	FILE=`echo "$RESULT" | awk '{print $1}'`
	FILEPAGES=`echo "$RESULT" | awk '{print $2}'`

	if [ "$FILEPAGES" -gt "$PAGES" ]
	then	FILEPAGES="$PAGES"
	fi

	dd bs="$BLCKSZ" count="$FILEPAGES" if="$FILE" of="/dev/null" >/dev/null 2>&1
	PAGES=`expr "$PAGES" - "$FILEPAGES"`
done

STOP=`date '+%s'`
SEQTIME=`expr "$STOP" - "$START"`

# The sequential test read 10 times as many pages, so scale the ratio by 10.
echo
awk 'BEGIN {printf "random_page_cost = %f\n", ('"$RANDTIME"' / '"$SEQTIME"') * 10}'
OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers.  I have a new version at:

	ftp://candle.pha.pa.us/pub/postgresql/randcost

I get _much_ lower numbers now for random_page_cost.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Because we have seen many complaints about sequential vs. index scans, I
> wrote a script which computes the value for your OS/hardware
> combination.
>
> Under BSD/OS on one SCSI disk, I get a random_page_cost around 60.  Our
> current postgresql.conf default is 4.
>
> What do other people get for this value?
>
> Keep in mind if we increase this value, we will get more sequential
> scans vs. index scans.
>
> One flaw in this test is that it randomly reads blocks from different
> files rather than randomly reading from the same file.  Do people have a
> suggestion on how to correct this?  Does it matter?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
I got:

	random_page_cost = 0.807018

For FreeBSD 4.4/i386
With 512MB RAM & SCSI HDD

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Bruce Momjian
> Sent: Monday, 9 September 2002 2:14 PM
> To: PostgreSQL-development
> Subject: Re: [HACKERS] Script to compute random page cost
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.
Dell Inspiron 8100 laptop, 1.2GHz Pentium, 512Mb RAM, Windows XP Pro
CYGWIN_NT-5.1 PC9 1.3.10(0.51/3/2) 2002-02-25 11:14 i686 unknown

random_page_cost = 0.924119

Regards, Dave.

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: 09 September 2002 07:14
> To: PostgreSQL-development
> Subject: Re: [HACKERS] Script to compute random page cost
>
> OK, turns out that the loop for sequential scan ran fewer
> times and was skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.

I got:

	random_page_cost = 1.047619

Linux kernel 2.4.18
Pentium III 750MHz
Memory 256MB
IDE HDD (A notebook/SONY VAIO PCG-Z505CR/K)
--
Tatsuo Ishii
On Mon, 9 Sep 2002, Bruce Momjian wrote:

> What do other people get for this value?

With your new script, with a 1.5 GHz Athlon, 512 MB RAM, and a nice fast
IBM 7200 RPM IDE disk, I get random_page_cost = 0.933333.

> One flaw in this test is that it randomly reads blocks from different
> files rather than randomly reading from the same file.  Do people have a
> suggestion on how to correct this?  Does it matter?

From my quick glance, it also does a lot of work to read each block,
including forking off several other programs.  This would tend to push up
the cost of a random read.

You might want to look at modifying the randread program
(http://randread.sourceforge.net) to do what you want....

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
> What do other people get for this value?
>
> Keep in mind if we increase this value, we will get more sequential
> scans vs. index scans.

With the new script I get 0.929825 on 2 IBM DTLA 5400RPM (80GB) with a
3Ware 6400 Controller (RAID-1).

Best regards,
Mario Weilguni
On Mon, 2002-09-09 at 07:13, Bruce Momjian wrote:
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.

Five successive runs:

random_page_cost = 0.947368
random_page_cost = 0.894737
random_page_cost = 0.947368
random_page_cost = 0.894737
random_page_cost = 0.894737

linux 2.4.18 SMP
dual Athlon MP 1900+
512Mb RAM
SCSI

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
     ========================================
     "Submit yourselves therefore to God.  Resist the devil, and he will
      flee from you."                                           James 4:7
On Mon, 2002-09-09 at 02:13, Bruce Momjian wrote:
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.

The current script pulls way more data for the sequential scan than for
the random scan now.  Random is pulling a single page (count=1 for dd)
with every loop.  Sequential does the same number of loops, but pulls
count > 1 in each.

In effect, sequential is random with more data load -- which explains all
of the 0.9's.

Rod Taylor
Bruce-

With the change in the script that I mentioned to you off-list (which I
believe just pointed it at our "real world" data), I got the following
results with 6 successive runs on each of our two development platforms:

(We're running PGSQL 7.2.1 on Debian Linux 2.4)

System 1:
1.2 GHz Athlon Processor, 512MB RAM, Database on IDE hard drive
random_page_cost = 0.857143
random_page_cost = 0.809524
random_page_cost = 0.809524
random_page_cost = 0.809524
random_page_cost = 0.857143
random_page_cost = 0.884615

System 2:
Dual 1.2 GHz Athlon MP Processors, SMP enabled, 1 GB RAM, Database on
Ultra SCSI RAID 5 with hardware controller.
random_page_cost = 0.894737
random_page_cost = 0.842105
random_page_cost = 0.894737
random_page_cost = 0.894737
random_page_cost = 0.842105
random_page_cost = 0.894737

I was surprised that the SCSI RAID drive is generally slower than IDE,
but the values are in line with the results that others have been
getting.

-Nick

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Bruce Momjian
> Sent: Monday, September 09, 2002 1:14 AM
> To: PostgreSQL-development
> Subject: Re: [HACKERS] Script to compute random page cost
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.
I'm getting an infinite wait on that file, could someone post it to the
list please?

On Mon, 9 Sep 2002, Bruce Momjian wrote:
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.
Hi again-

I bounced these numbers off of Ray Ontko here at our shop, and he pointed
out that random page cost is measured in multiples of a sequential page
fetch.  It seems almost impossible that a random fetch would be less
expensive than a sequential fetch, yet we all seem to be getting results
< 1.  I can't see anything obviously wrong with the script, but something
very odd is going on.

-Nick

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Nick Fankhauser
> Sent: Monday, September 09, 2002 11:25 AM
> To: Bruce Momjian; PostgreSQL-development
> Cc: Ray Ontko
> Subject: Re: [HACKERS] Script to compute random page cost
>
> With the change in the script that I mentioned to you off-list (which I
> believe just pointed it at our "real world" data), I got the following
> results with 6 successive runs on each of our two development platforms:
"Nick Fankhauser" <nickf@ontko.com> writes: > I bounced these numbers off of Ray Ontko here at our shop, and he pointed > out that random page cost is measured in multiples of a sequential page > fetch. It seems almost impossible that a random fetch would be less > expensive than a sequential fetch, yet we all seem to be getting results < > 1. I can't see anything obviously wrong with the script, but something very > odd is going. The big problem with the script is that it involves an invocation of "dd" - hence, at least one process fork --- for every page read operation. The seqscan part of the test is even worse, as it adds a test(1) call and a shell if/then/else to the overhead. My guess is that we are measuring script overhead here, and not the desired I/O quantities at all --- the script overhead is completely swamping the latter. The apparent stability of the results across a number of different platforms bolsters that thought. Someone else opined that the script was also not comparing equal numbers of pages read for the random and sequential cases. I haven't tried to decipher the logic enough to see if that allegation is true, but it's not obviously false. Finally, I wouldn't believe the results for a moment if they were taken against databases that are not several times the size of physical RAM on the test machine, with a total I/O volume also much more than physical RAM. We are trying to measure the behavior when kernel caching is not helpful; if the database fits in RAM then you are just naturally going to get random_page_cost close to 1, because the kernel will avoid doing any I/O at all. regards, tom lane
Nick Fankhauser wrote:
> Hi again-
>
> I bounced these numbers off of Ray Ontko here at our shop, and he pointed
> out that random page cost is measured in multiples of a sequential page
> fetch.  It seems almost impossible that a random fetch would be less
> expensive than a sequential fetch, yet we all seem to be getting results
> < 1.  I can't see anything obviously wrong with the script, but something
> very odd is going on.

OK, new version at:

	ftp://candle.pha.pa.us/pub/postgresql/randcost

What I have done is to take all of the computation stuff out of the timed
loop so only the 'dd' is done in the loop.  I am getting a 1.0 for random
page cost with this new code, but I don't have much data in the database
so it is very possible I have it all cached.  Would others please test
it?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Mon, 9 Sep 2002, Tom Lane wrote:

> Finally, I wouldn't believe the results for a moment if they were taken
> against databases that are not several times the size of physical RAM
> on the test machine, with a total I/O volume also much more than
> physical RAM.  We are trying to measure the behavior when kernel
> caching is not helpful; if the database fits in RAM then you are just
> naturally going to get random_page_cost close to 1, because the kernel
> will avoid doing any I/O at all.

Um...yeah; another reason to use randread against a raw disk device.
(A little hard to use on linux systems, I bet, but works fine on BSD
systems.)

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
Curt Sampson <cjs@cynic.net> writes:
> On Mon, 9 Sep 2002, Tom Lane wrote:
>> ... We are trying to measure the behavior when kernel
>> caching is not helpful; if the database fits in RAM then you are just
>> naturally going to get random_page_cost close to 1, because the kernel
>> will avoid doing any I/O at all.

> Um...yeah; another reason to use randread against a raw disk device.
> (A little hard to use on linux systems, I bet, but works fine on
> BSD systems.)

Umm... not really; surely randread wouldn't know anything about
read-ahead logic?

The reason this is a difficult topic is that we are trying to measure
certain kernel behaviors --- namely readahead for sequential reads ---
and not others --- namely caching, because we have other parameters of
the cost models that purport to deal with that.

Mebbe this is an impossible task and we need to restructure the cost
models from the ground up.  But I'm not convinced of that.  The fact
that a one-page shell script can't measure the desired quantity doesn't
mean we can't measure it with more effort.

			regards, tom lane
On Mon, 9 Sep 2002, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > Um...yeah; another reason to use randread against a raw disk device.
> > (A little hard to use on linux systems, I bet, but works fine on
> > BSD systems.)
>
> Umm... not really; surely randread wouldn't know anything about
> read-ahead logic?

Randread doesn't know anything about read-ahead logic, but I don't see
how that matters one way or the other.  The chances of it reading blocks
sequentially are pretty much infinitesimal if you're reading across a
reasonably large area of disk (I recommend at least 4GB), so readahead
will never be triggered.

> The reason this is a difficult topic is that we are trying to measure
> certain kernel behaviors --- namely readahead for sequential reads ---
> and not others --- namely caching, because we have other parameters
> of the cost models that purport to deal with that.

Well, for the sequential reads, the readahead should be triggered even
when reading from a raw device.  So just use dd to measure that.

If you want to slightly more accurately model postgres' behaviour, you
probably want to pick a random area of the disk, read a gigabyte, switch
areas, read another gigabyte, and so on.  This will model the "split into
1GB files" thing.

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
OK, I have a better version at:

	ftp://candle.pha.pa.us/pub/postgresql/randcost

I have added a null loop which does a dd on a single file without reading
any data, and by netting that loop out of the total computation and
increasing the number of tests, I have gotten the following results for
three runs:

	random test: 36
	sequential test: 33
	null timing test: 27
	random_page_cost = 1.500000

	random test: 38
	sequential test: 32
	null timing test: 27
	random_page_cost = 2.200000

	random test: 40
	sequential test: 31
	null timing test: 27
	random_page_cost = 3.250000

Interesting that random time is increasing, while the others were stable.
I think this may have to do with other system activity at the time of the
test.  I will run it some more tomorrow but clearly we are seeing
reasonable numbers now.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
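The figures above are consistent with the null-loop time being netted out
of both measurements before the ratio is taken; reconstructing the
arithmetic from the reported numbers (an inference from the output, not a
quote of the script):

	random_page_cost = (random - null) / (sequential - null)
	                 = (36 - 27) / (33 - 27) = 1.50
	                 = (38 - 27) / (32 - 27) = 2.20
	                 = (40 - 27) / (31 - 27) = 3.25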
I got somewhat different:

	$ ./randcost /usr/local/pgsql/data
	Collecting sizing information ...
	Running random access timing test ...
	Running sequential access timing test ...
	Running null loop timing test ...

	random test: 13
	sequential test: 15
	null timing test: 11

	random_page_cost = 0.500000

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Bruce Momjian
> Sent: Tuesday, 10 September 2002 2:02 PM
> To: Curt Sampson
> Cc: Tom Lane; nickf@ontko.com; PostgreSQL-development; Ray Ontko
> Subject: Re: [HACKERS] Script to compute random page cost
>
> OK, I have a better version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost
On Tue, 10 Sep 2002, Bruce Momjian wrote:

> Interesting that random time is increasing, while the others were
> stable.  I think this may have to do with other system activity at the
> time of the test.

Actually, the random versus sequential time may also be different
depending on how many processes are competing for disk access, as well.
If the OS isn't maintaining readahead for whatever reason, sequential
access could, in theory, degrade to being the same speed as random
access.  It might be interesting to test this, too.

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
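One rough way to test that is sketched below.  It is only an illustration
(nothing like it was posted in this thread), and the file arguments are
placeholders for large files created beforehand.  Time a single sequential
reader, then several running at once; if readahead holds up, N competing
streams should take roughly N times as long, rather than degrading toward
random-read speed.

/*
 * seqmix.c - time N processes, each reading its own file sequentially
 * in 8K blocks, to see whether readahead still helps when several
 * sequential streams compete for the same disk.
 *
 *	cc -o seqmix seqmix.c
 *	./seqmix bigfile1			# one stream
 *	./seqmix bigfile1 bigfile2 bigfile3	# competing streams
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
	struct timeval start, stop;
	char	buf[BLCKSZ];
	int		i;

	if (argc < 2)
	{
		fprintf(stderr, "usage: %s file [file ...]\n", argv[0]);
		return 1;
	}

	gettimeofday(&start, NULL);
	for (i = 1; i < argc; i++)
	{
		if (fork() == 0)
		{
			/* child: read its file start to finish, 8K at a time */
			int		fd = open(argv[i], O_RDONLY);

			if (fd < 0)
			{
				perror(argv[i]);
				_exit(1);
			}
			while (read(fd, buf, BLCKSZ) > 0)
				;
			_exit(0);
		}
	}
	while (wait(NULL) > 0)		/* wait for all the children */
		;
	gettimeofday(&stop, NULL);

	printf("%d stream(s): %.2f seconds elapsed\n", argc - 1,
		   (stop.tv_sec - start.tv_sec) +
		   (stop.tv_usec - start.tv_usec) / 1000000.0);
	return 0;
}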
> OK, I have a better version at:

The script is now broken, I get:

	Collecting sizing information ...
	Running random access timing test ...
	Running sequential access timing test ...
	Running null loop timing test ...

	random test: 14
	sequential test: 16
	null timing test: 14

	random_page_cost = 0.000000
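Assuming the same netting of the null-loop time as in the runs above, the
zero result follows directly from the random and null times coming out
equal:

	random_page_cost = (14 - 14) / (16 - 14) = 0.000000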
I was attempting to measure random page cost a while ago - I used three
programs in this archive:

	http://techdocs.postgresql.org/markir/download/benchtool/

It writes a single big file and seems to give more realistic measurements
(like 6 for a Solaris scsi system and 10 for a Linux ide one...)

Have a look and see if you can cannibalize it for your program.

Cheers

Mark
On Mon, 2002-09-09 at 07:13, Bruce Momjian wrote:
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers.  I have a new version at:
>
>	ftp://candle.pha.pa.us/pub/postgresql/randcost

Latest version:

olly@linda$
random test: 14
sequential test: 11
null timing test: 9
random_page_cost = 2.500000

olly@linda$ for a in 1 2 3 4 5
> do
> ~/randcost
> done
Collecting sizing information ...
random test: 11
sequential test: 11
null timing test: 9
random_page_cost = 1.000000

random test: 11
sequential test: 10
null timing test: 9
random_page_cost = 2.000000

random test: 11
sequential test: 11
null timing test: 9
random_page_cost = 1.000000

random test: 11
sequential test: 10
null timing test: 9
random_page_cost = 2.000000

random test: 10
sequential test: 10
null timing test: 10
Sequential time equals null time.  Increase TESTCYCLES and rerun.

Available memory (512M) exceeds the total database size, so sequential
and random are almost the same for the second and subsequent runs.

Since, in production, I would hope to have all active tables permanently
in RAM, would there be a case for my using a page cost of 1 on the
assumption that no disk reads would be needed?

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
     ========================================
     "Draw near to God and he will draw near to you.  Cleanse your hands,
      you sinners; and purify your hearts, you double minded."  James 4:8
Curt Sampson <cjs@cynic.net> writes:
> Well, for the sequential reads, the readahead should be triggered
> even when reading from a raw device.

That strikes me as an unportable assumption.  Even if true, we can't
provide a test mechanism that requires root access to run it ---
raw-device testing is out of the question just on that basis, never mind
that it is not measuring what we want to measure.

Perhaps it's time to remind people that what we want to measure is the
performance seen by a C program issuing write() and read() commands,
transferring 8K at a time, on a regular Unix filesystem.  A shell script
invoking dd is by definition going to see a very different performance
ratio, even if what dd does under the hood is 8K read() and write()
(another not-well-supported assumption, IMHO).  If you try to "improve"
the results by using a raw device, you're merely moving even further away
from the scenario of interest.

			regards, tom lane
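For what it's worth, a minimal sketch of that kind of C testbed might
look like the following.  This is only an illustration, not randread, not
Mark's benchtool, and not anything Bruce has posted; the block size, read
count, and file name are placeholders, and the file should be several
times larger than physical RAM for the ratio to mean anything.

/*
 * pagecost.c - time NREADS sequential 8K read()s and NREADS random
 * lseek()+read()s against one large file on a regular filesystem,
 * then print the random/sequential ratio.
 *
 *	cc -o pagecost pagecost.c
 *	./pagecost /some/big/file
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ	8192
#define NREADS	20000

static double
elapsed(struct timeval *a, struct timeval *b)
{
	return (b->tv_sec - a->tv_sec) + (b->tv_usec - a->tv_usec) / 1000000.0;
}

int
main(int argc, char **argv)
{
	struct timeval t0, t1, t2;
	struct stat st;
	char	buf[BLCKSZ];
	off_t	nblocks;
	long	i;
	int		fd;

	if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0)
	{
		fprintf(stderr, "usage: %s bigfile\n", argv[0]);
		return 1;
	}
	fstat(fd, &st);
	nblocks = st.st_size / BLCKSZ;

	/* sequential: NREADS consecutive blocks from the start of the file */
	gettimeofday(&t0, NULL);
	for (i = 0; i < NREADS; i++)
		if (read(fd, buf, BLCKSZ) != BLCKSZ)
			break;
	gettimeofday(&t1, NULL);

	/* random: NREADS blocks scattered over the whole file */
	srandom(getpid());
	for (i = 0; i < NREADS; i++)
	{
		off_t	blk = random() % nblocks;

		lseek(fd, blk * (off_t) BLCKSZ, SEEK_SET);
		read(fd, buf, BLCKSZ);
	}
	gettimeofday(&t2, NULL);

	printf("sequential: %.2f s   random: %.2f s   ratio: %.2f\n",
		   elapsed(&t0, &t1), elapsed(&t1, &t2),
		   elapsed(&t1, &t2) / elapsed(&t0, &t1));
	return 0;
}

Even this only approximates what the planner cares about (one file,
single user, and the sequential pass touches only the first 160MB or so),
but it avoids the per-page fork overhead entirely.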
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I will run it some more tomorrow but clearly we are seeing reasonable
> numbers now.

... which still have no provable relationship to the ratio we need to
measure.  See my previous comments to Curt; I don't think you can
possibly get trustworthy results out of a shell script + dd approach,
because we do not implement Postgres using dd.

If you implemented a C testbed and then proved by experiment that the
shell script got comparable numbers, then I'd believe its results.
Without that confirmation, these are just meaningless numbers.

			regards, tom lane
OK, what you are seeing here is that for your platform the TESTCYCLES
size isn't enough; the numbers are too close to measure the difference.
I am going to increase the TESTCYCLES from 5k to 10k.  That should
provide better numbers.

---------------------------------------------------------------------------

Oliver Elphick wrote:
> random test: 10
> sequential test: 10
> null timing test: 10
> Sequential time equals null time.  Increase TESTCYCLES and rerun.
>
> Available memory (512M) exceeds the total database size, so sequential
> and random are almost the same for the second and subsequent runs.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Oliver Elphick wrote:
> Available memory (512M) exceeds the total database size, so sequential
> and random are almost the same for the second and subsequent runs.
>
> Since, in production, I would hope to have all active tables permanently
> in RAM, would there be a case for my using a page cost of 1 on the
> assumption that no disk reads would be needed?

Yes, in your case random_page_cost would be 1 once the data gets into
RAM.  In fact, that is the reason I used only /data/base for testing, so
places where data can load into RAM will see lower random page costs.  I
could just create a random file and test on that, but it isn't the same.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Tue, 10 Sep 2002, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > Well, for the sequential reads, the readahead should be triggered
> > even when reading from a raw device.
>
> That strikes me as an unportable assumption.

Not only unportable: but false. :-)  NetBSD, at least, does read-ahead
only through the buffer cache.  Thinking about it, you *can't* do
read-ahead on a raw device, because you're not buffering.  Doh!

> Perhaps it's time to remind people that what we want to measure
> is the performance seen by a C program issuing write() and read()
> commands, transferring 8K at a time, on a regular Unix filesystem.

Right.  Which is what randread does, if you give it a file rather than a
raw device.  I'm actually just now working on some modifications for it
that will let you work against a bunch of files, rather than just one, so
it will very accurately emulate a postgres random read of blocks from a
table.

There are two other tricky things related to the behaviour, however:

1. The buffer cache.  You really need to be working against your entire
   database, not just a few gigabytes of its data, or sample data.

2. Multiple users.  You really want a mix of simultaneous accesses going
   on, with as many processes as you normally have users querying the
   database.

These can probably both be taken care of with shell scripts, though.

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
Tom Lane wrote:
> Perhaps it's time to remind people that what we want to measure
> is the performance seen by a C program issuing write() and read()
> commands, transferring 8K at a time, on a regular Unix filesystem.

Yes... and at the risk of being accused of marketing ;-), that is exactly
what the 3 programs in my archive do (see previous post for url):

- one called 'write' creates a suitably sized data file (8k at a time -
  configurable), using the write() call
- another called 'read' does sequential reads (8k at a time -
  configurable), using the read() call
- finally one called 'seek' does random reads (8k chunks - configurable),
  using the lseek() and read() calls

I tried to use code as similar as possible to how Postgres does its
io... so the results *should* be meaningful!

Large file support is enabled too (as you need to use a file several
times bigger than your RAM - and everyone seems to have >1G of it these
days...)

I think the code is reasonably readable too....

Its been *tested* on Linux, Freebsd, Solaris, MacosX.

The only downer is that they don't automatically compute random_page_cost
for you.. (I was more interested in the raw sequential read, write and
random read rates at the time).  However it would be a fairly simple
modification to combine all 3 programs into one executable that outputs
random_page_cost...

regards

Mark
On Wed, 11 Sep 2002, Mark Kirkwood wrote:

> Yes... and at the risk of being accused of marketing ;-), that is
> exactly what the 3 programs in my archive do (see previous post for url):

Hm, it appears we've both been working on something similar.  However,
I've just released version 0.2 of randread, which has the following
features:

    Written in C, uses read(2) and write(2), pretty much like postgres.

    Reads or writes random blocks from a specified list of files, treated
    as a contiguous range of blocks, again like postgres.  This allows
    you to do random reads from the actual postgres data files for a
    table, if you like.

    You can specify the block size to use, and the number of reads to do.

    Allows you to specify how many blocks you want to read before you
    start reading again at a new random location.  (The default is 1.)
    This allows you to model various sequential and random read mixes.

If you want to do writes, I suggest you create your own set of files to
write, rather than destroying postgresql data.  This can easily be done
with something like this Bourne shell script:

    for i in 1 2 3 4; do
	dd if=/dev/zero of=file.$i bs=1m count=1024
    done

However, it doesn't calculate the random vs. sequential ratio for you;
you've got to do that for yourself.  E.g.:

    $ ./randread -l 512 -c 256 /u/cjs/z?
    256 reads of 512 x 8.00 KB blocks (4096.00 KB) totalling 131072
    blocks (1024.00 MB) from 524288 blocks (4092.00 MB) in 4 files.
    256 reads in 36.101119 sec. (141019 usec/read, 7 reads/sec,
    29045.53 KB/sec)

    $ ./randread -c 4096 /u/cjs/z?
    4096 reads of 1 x 8.00 KB blocks (8.00 KB) totalling 4096 blocks
    (32.00 MB) from 524288 blocks (4095.99 MB) in 4 files.
    4096 reads in 34.274582 sec. (8367 usec/read, 120 reads/sec,
    956.04 KB/sec)

In this case, across 4 GB in 4 files on my 512 MB, 1.5 GHz Athlon with an
IBM 7200 RPM IDE drive, I read about 30 times faster doing a full
sequential read of the files than I do reading 32 MB randomly from it.
But because of the size of this, there's basically no buffer cache
involved.  If I do this on a single 512 MB file:

    $ ./randread -c 4096 /u/cjs/z1:0-65536
    4096 reads of 1 x 8.00 KB blocks (8.00 KB) totalling 4096 blocks
    (32.00 MB) from 65536 blocks (511.99 MB) in 1 files.
    4096 reads in 28.064573 sec. (6851 usec/read, 146 reads/sec,
    1167.59 KB/sec)

    $ ./randread -l 65535 -c 1 /u/cjs/z1:0-65536
    1 reads of 65535 x 8.00 KB blocks (524280.00 KB) totalling 65535
    blocks (511.99 MB) from 65536 blocks (0.01 MB) in 1 files.
    1 reads in 17.107867 sec. (17107867 usec/read, 0 reads/sec,
    30645.55 KB/sec)

    $ ./randread -c 4096 /u/cjs/z1:0-65536
    4096 reads of 1 x 8.00 KB blocks (8.00 KB) totalling 4096 blocks
    (32.00 MB) from 65536 blocks (511.99 MB) in 1 files.
    4096 reads in 19.413738 sec. (4739 usec/read, 215 reads/sec,
    1687.88 KB/sec)

Well, there you see some of the buffer cache effect from starting with
about half the file in memory.  If you want to see serious buffer cache
action, just use the first 128 MB of my first test file:

    $ ./randread -c 4096 /u/cjs/z1:0-16536
    4096 reads of 1 x 8.00 KB blocks (8.00 KB) totalling 4096 blocks
    (32.00 MB) from 16536 blocks (129.18 MB) in 1 files.
    4096 reads in 20.220791 sec. (4936 usec/read, 204 reads/sec,
    1620.51 KB/sec)

    $ ./randread -l 16535 -c 1 /u/cjs/z1:0-16536
    1 reads of 16535 x 8.00 KB blocks (132280.00 KB) totalling 16535
    blocks (129.18 MB) from 16536 blocks (0.01 MB) in 1 files.
    1 reads in 3.469231 sec. (3469231 usec/read, 0 reads/sec,
    38129.49 KB/sec)

    $ ./randread -l 16535 -c 64 /u/cjs/z1:0-16536
    64 reads of 16535 x 8.00 KB blocks (132280.00 KB) totalling 1058240
    blocks (8267.50 MB) from 16536 blocks (0.01 MB) in 1 files.
    64 reads in 23.643026 sec. (369422 usec/read, 2 reads/sec,
    358072.59 KB/sec)

For those last three, we're basically limited completely by the CPU, as
there's not much disk I/O going on at all.  The many-block one is going
to be slower because it's got to generate a lot more random numbers and
do a lot more lseek operations.

Anyway, looking at the real difference between truly sequential and truly
random reads on a large data file (30:1 or so), it looks to me that
people getting much less than that are getting good work out of their
buffer cache.  You've got to wonder if there's some way to auto-tune for
this sort of thing....

Anyway, feel free to download and play.  If you want to work on the
program, I'm happy to give developer access on sourceforge.

    http://sourceforge.net/project/showfiles.php?group_id=55994

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
AMD Athlon 500
512MB RAM
IBM 120GB IDE

Tested with:

	BLCKSZ=8192
	TESTCYCLES=500000

Result:

	Collecting sizing information ...
	Running random access timing test ...
	Running sequential access timing test ...
	Running null loop timing test ...

	random test: 2541
	sequential test: 2455
	null timing test: 2389

	random_page_cost = 2.303030

Hans

--
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at,
kernel.cybertec.at
Curt Sampson wrote:
> Hm, it appears we've both been working on something similar.  However,
> I've just released version 0.2 of randread, which has the following
> features:

funny how often that happens... (I think it's often worth the effort to
write your own benchmarking / measurement tool in order to gain a good
understanding of what you intend to measure)

> Anyway, feel free to download and play.  If you want to work on the
> program, I'm happy to give developer access on sourceforge.
>
>     http://sourceforge.net/project/showfiles.php?group_id=55994

I'll take a look.

best wishes

Mark
Hi!

On Tue, 10 Sep 2002 14:01:11 +0000 (UTC)
tgl@sss.pgh.pa.us (Tom Lane) wrote:

[...]
> Perhaps it's time to remind people that what we want to measure
> is the performance seen by a C program issuing write() and read()
> commands, transferring 8K at a time, on a regular Unix filesystem.
[...]

I've written something like that.  It is not C but might be useful.  Any
comments are welcome.

	http://www.a-nugget.org/downloads/randread.py

Bye

Guido
Hi all,

As an end result of all this, do we now have a decent utility which
end-user admins can run against the same disk/array that their PostgreSQL
installation is on, and get a reasonably accurate number for random page
cost?

ie:

	$ ./get_calc_cost
	Try using random_page_cost = foo

	$

:-)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit.  He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi
Justin Clift wrote:
> Hi all,
>
> As an end result of all this, do we now have a decent utility which
> end-user admins can run against the same disk/array that their
> PostgreSQL installation is on, and get a reasonably accurate number for
> random page cost?
>
> ie:
>
>	$ ./get_calc_cost
>	Try using random_page_cost = foo
>
>	$
>
> :-)

Right now we only have my script:

	ftp://candle.pha.pa.us/pub/postgresql/randcost

It uses dd, so it forks for every loop, and shows a value for my machine
around 2.5.  I need to code the loop in C to get more accurate numbers.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073