Thread: Prefetch

Prefetch

From

Matt Olson

Date:

15 May 2005, 15:28:06

I wanted to get some opinions about row prefetching.  AFAIK, there is no
prefetching done by PostgreSQL; all prefetching is delegated to the operating
system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data.  Each day has
around 30,000 records (tickers).  A typical operation is to get the 200 day
simple moving average (of price) for each ticker and write the result to a
summary table.  In running this process (Perl/DBI), it is typical to see
70-80% I/O wait time with postgres running a about 8-9%.   If I run the next
day's date, the postgres cache and file cache is now populated with 199 days
of the needed data, postgres runs 80-90% of CPU and total run time is greatly
reduced.  My conclusion is that this is a high cache hit rate in action.

I've done other things that make sense, like using indexes, playing with the
planner constants and turning up the postgres cache buffers.

Even playing with extream hdparm read-ahead numbers (i.e. 64738), there is no
apparent difference in database performance.  The random nature of the I/O
drops disk reads down to about 1MB/sec for the array.  A linear table scan
can easily yield 70-80MB/sec on this system.  Total table size is usually
around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
there been serious consideration of implementing something like a prefetch
subsystem?  Does anyone have any opinions as to why this would be a bad idea
for postgres?

Postges is great for a multiuser environment and OLTP applications.  However,
in this set up, a data warehouse, the observed performance is not what I
would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

Re: Prefetch

From

"Joel Fradkin"

Date:

16 May 2005, 14:41:22

My only comment is what is the layout of your data (just one table with
indexes?).
I found on my date with dozens of joins my view speed was not good for me to
use, so I made a flat file with no joins and it flies.

Joel Fradkin

Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel.  941-753-7111 ext 305

jfradkin@wazagua.com
www.wazagua.com
Powered by Wazagua
Providing you with the latest Web-based technology & advanced tools.
C 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc
 This email message is for the use of the intended recipient(s) and may
contain confidential and privileged information.  Any unauthorized review,
use, disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and delete and destroy
all copies of the original message, including attachments.




-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Matt Olson
Sent: Monday, May 09, 2005 9:10 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Prefetch

I wanted to get some opinions about row prefetching.  AFAIK, there is no
prefetching done by PostgreSQL; all prefetching is delegated to the
operating
system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data.  Each day has
around 30,000 records (tickers).  A typical operation is to get the 200 day
simple moving average (of price) for each ticker and write the result to a
summary table.  In running this process (Perl/DBI), it is typical to see
70-80% I/O wait time with postgres running a about 8-9%.   If I run the next

day's date, the postgres cache and file cache is now populated with 199 days

of the needed data, postgres runs 80-90% of CPU and total run time is
greatly
reduced.  My conclusion is that this is a high cache hit rate in action.

I've done other things that make sense, like using indexes, playing with the

planner constants and turning up the postgres cache buffers.

Even playing with extream hdparm read-ahead numbers (i.e. 64738), there is
no
apparent difference in database performance.  The random nature of the I/O
drops disk reads down to about 1MB/sec for the array.  A linear table scan
can easily yield 70-80MB/sec on this system.  Total table size is usually
around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch.
Has
there been serious consideration of implementing something like a prefetch
subsystem?  Does anyone have any opinions as to why this would be a bad idea

for postgres?

Postges is great for a multiuser environment and OLTP applications.
However,
in this set up, a data warehouse, the observed performance is not what I
would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend