Thread: Prefetch
I wanted to get some opinions about row prefetching.  AFAIK, there is no
prefetching done by PostgreSQL; all prefetching is delegated to the operating
system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data.  Each day has
around 30,000 records (tickers).  A typical operation is to get the 200-day
simple moving average (of price) for each ticker and write the result to a
summary table.  In running this process (Perl/DBI), it is typical to see
70-80% I/O wait time with postgres running at about 8-9%.  If I run the next
day's date, the postgres cache and file cache are now populated with 199 days
of the needed data, postgres runs at 80-90% of CPU and total run time is
greatly reduced.  My conclusion is that this is a high cache hit rate in
action.

I've done other things that make sense, like using indexes, playing with the
planner constants and turning up the postgres cache buffers.

Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no
apparent difference in database performance.  The random nature of the I/O
drops disk reads down to about 1MB/sec for the array.  A linear table scan
can easily yield 70-80MB/sec on this system.  Total table size is usually
around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
there been serious consideration of implementing something like a prefetch
subsystem?  Does anyone have any opinions as to why this would be a bad idea
for postgres?

Postgres is great for a multiuser environment and OLTP applications.
However, in this setup, a data warehouse, the observed performance is not
what I would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/
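(For context, the daily batch described above boils down to roughly the
following per-ticker pass.  This is only a sketch: the names daily_prices,
sma_summary, ticker, trade_date and close_price are illustrative, not Matt's
actual schema, and the date window is a shortcut for "the last 200 trading
days".)

    -- Hypothetical per-ticker pass: one index-driven query per symbol,
    -- repeated ~30,000 times from Perl/DBI.  The repeated index probes
    -- are what produce the random I/O pattern described above.
    INSERT INTO sma_summary (ticker, as_of_date, sma_200)
    SELECT 'FOO', DATE '2005-05-10', avg(close_price)
      FROM daily_prices
     WHERE ticker = 'FOO'
       AND trade_date >  DATE '2005-05-10' - 200
       AND trade_date <= DATE '2005-05-10';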
Matt Olson <molson@oceanconsulting.com> writes:
> Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
> there been serious consideration of implementing something like a prefetch
> subsystem?

No.

> Does anyone have any opinions as to why this would be a bad idea for
> postgres?

We know even less than the OS does about disk layout, and not a lot more than
it about what our next request will be.  (If we're doing a seqscan, then of
course that's not true, but I would expect the OS to be able to figure that
one out and do readahead.)

You haven't shown us your problem queries, but I think that conventional
query tuning would be a more appropriate answer.  In particular I wonder
whether you shouldn't be looking at ways to calculate multiple aggregates
in parallel.

			regards, tom lane
> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.

After you load the new day's data try running CLUSTER on the structure using
a key of (stockID, date) -- probably your primary key.

This should significantly reduce the amount of IO required for your
calculations involving a few stocks over a period of time.

--
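(As a concrete sketch, using the illustrative daily_prices table from the
earlier example and assuming its primary-key index is named
daily_prices_pkey -- the real index name will differ.  The syntax shown is
the 7.4-era form of CLUSTER.)

    -- Rewrite the table in index order so that all rows for one ticker are
    -- physically adjacent; subsequent index scans become mostly sequential I/O.
    CLUSTER daily_prices_pkey ON daily_prices;

    -- CLUSTER takes an exclusive lock and is not maintained automatically,
    -- so it has to be re-run after each daily load (or periodically).
    ANALYZE daily_prices;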
Matt Olson <molson@oceanconsulting.com> writes:

> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.
>
> Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no
> apparent difference in database performance.  The random nature of the I/O
> drops disk reads down to about 1MB/sec for the array.  A linear table scan
> can easily yield 70-80MB/sec on this system.  Total table size is usually
> around 1GB and with indexes should be able to fit completely in main memory.

Actually forcing things to use indexes is the wrong direction to go if you're
trying to process lots of data and want to stream it off disk as rapidly as
possible.  I would think about whether you can structure your data such that
you can use sequential scans.  That might mean partitioning your raw data
into separate tables and then accessing only the partitions that are relevant
to the query.

In your application that might be hard.  It sounds like you would need more
or less one table per stock ticker, which would really be hard to manage.

One thing you might look into is using the CLUSTER command.  But postgres
doesn't maintain the cluster ordering so it would require periodically
rerunning it.

I'm a bit surprised by your 1MB/s rate.  I would expect to see about 10MB/s
even for completely random reads.  Is it possible you're seeing something
else interfering?  Do you have INSERT/UPDATE/DELETE transactions happening
concurrently with this select scan?  If so you should strongly look into
separating the transaction log from the data files.

--
greg
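(For what it's worth, one way to get that effect on this vintage of Postgres,
which has no built-in partitioning, is a table per time period -- splitting
the earlier sketch's single daily_prices table into per-year pieces.  All
names below are illustrative.  The batch job would query the one or two
partition tables covering the trailing window directly; the UNION ALL view
just preserves a single logical table for ad-hoc queries.)

    -- One table per year (or month); the daily load only touches the
    -- current partition, and a 200-day window only needs to scan one or
    -- two of these tables instead of the whole history.
    CREATE TABLE prices_2004 (
        ticker      text    NOT NULL,
        trade_date  date    NOT NULL,
        close_price numeric NOT NULL,
        PRIMARY KEY (ticker, trade_date)
    );

    CREATE TABLE prices_2005 (
        ticker      text    NOT NULL,
        trade_date  date    NOT NULL,
        close_price numeric NOT NULL,
        PRIMARY KEY (ticker, trade_date)
    );

    -- Single logical view over all partitions, for convenience.
    CREATE VIEW daily_prices AS
        SELECT * FROM prices_2004
        UNION ALL
        SELECT * FROM prices_2005;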
My postgres binaries and WAL are on a separate disk from the raid array.  The
table I'm doing the selects from is probably about 4GB in size and 18-20
million records.  No concurrent or dependent inserts or deletes are going on.

Tom's point and your points about optimizing the application are well taken.
I know my approach is suboptimal and prone to getting caught by latency
issues (seek times, cache hit rates, etc.).  However, the question of
prefetch in my mind is all about eliminating latencies, so I thought my
problem would be good for the sake of discussing prefetching.

The two approaches I'm in the process of testing are Rod and Greg's
suggestion of using 'CLUSTER', and, for the sake of not letting a good idea
get away, I'll probably spend time on doing a parallel query approach which
Tom suggested.  I'll report back to the list what I find and maybe do some
_rough_ benchmarking.  This is a production app, so I can't get too much in
the way of the daily batches.

--
Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

On Tuesday 10 May 2005 11:13 am, Greg Stark wrote:
> Matt Olson writes:
> > I've done other things that make sense, like using indexes, playing with
> > the planner constants and turning up the postgres cache buffers.
> >
> > Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields
> > no apparent difference in database performance.  The random nature of the
> > I/O drops disk reads down to about 1MB/sec for the array.  A linear table
> > scan can easily yield 70-80MB/sec on this system.  Total table size is
> > usually around 1GB and with indexes should be able to fit completely in
> > main memory.
>
> Actually forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible.  I would think about whether you can structure your
> data such that you can use sequential scans.  That might mean partitioning
> your raw data into separate tables and then accessing only the partitions
> that are relevant to the query.
>
> In your application that might be hard.  It sounds like you would need more
> or less one table per stock ticker, which would really be hard to manage.
>
> One thing you might look into is using the CLUSTER command.  But postgres
> doesn't maintain the cluster ordering so it would require periodically
> rerunning it.
>
> I'm a bit surprised by your 1MB/s rate.  I would expect to see about 10MB/s
> even for completely random reads.  Is it possible you're seeing something
> else interfering?  Do you have INSERT/UPDATE/DELETE transactions happening
> concurrently with this select scan?  If so you should strongly look into
> separating the transaction log from the data files.
Greg Stark <gsstark@mit.edu> writes:
> Actually forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible.  I would think about whether you can structure your
> data such that you can use sequential scans.

Agreed.

> In your application that might be hard.  It sounds like you would need more
> or less one table per stock ticker which would really be hard to manage.

Actually, in a previous lifetime I used to do pretty much the same stuff Matt
is working on.  The reason I suggested parallelizing is that what you want is
usually not so much the 200-day moving average of FOO, as the 200-day moving
averages of a whole bunch of things.  If your input table contains
time-ordered data for all those things, then a seqscan works out pretty well.

> One thing you might look into is using the CLUSTER command.  But postgres
> doesn't maintain the cluster ordering so it would require periodically
> rerunning it.

If the desired sort order is time-based, it falls out pretty much for free in
this application, because historical data doesn't change -- you are only
interested in appending at the right.

In said previous lifetime, we used Postgres for tracking our actual
transactions, but we built a custom file format for storing the individual
tick data.  That's not stuff you need transactional semantics for; the
historical data is what it is.  Besides, you need to compress it as much as
you can because there's always too much of it.  Machines are faster and disk
space cheaper than they were at the time, but I'd still question the wisdom
of using a Postgres row for each daily bar, let alone finer-grain data.

			regards, tom lane
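(To make that suggestion concrete: instead of one query per ticker as in the
first sketch, a single GROUP BY computes the figure for every ticker in one
pass.  Same illustrative daily_prices/sma_summary names as before; the date
window again stands in for "200 trading days".)

    -- One scan over the trailing window computes every ticker's average at
    -- once, so the planner can use a seqscan instead of ~30,000 index probes.
    INSERT INTO sma_summary (ticker, as_of_date, sma_200)
    SELECT ticker, DATE '2005-05-10', avg(close_price)
      FROM daily_prices
     WHERE trade_date >  DATE '2005-05-10' - 200
       AND trade_date <= DATE '2005-05-10'
     GROUP BY ticker;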
Matt Olson wrote:
> Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
> there been serious consideration of implementing something like a prefetch
> subsystem?  Does anyone have any opinions as to why this would be a bad idea
> for postgres?
>
> Postgres is great for a multiuser environment and OLTP applications.
> However, in this setup, a data warehouse, the observed performance is not
> what I would hope for.

Oracle doesn't pre-fetch data to get its fast results in this case.
Pre-fetching doesn't give you the 100 times speed increases.

Bitmap indexes are very important for data mining.  You might want to see

http://www.it.iitb.ac.in/~rvijay/dbms/proj/

I have no idea how well developed this is, but this is often the biggest win
with Data Warehousing.  If it works, you'll get results back in seconds; if
it doesn't, you'll have plenty of time while your queries run to reflect on
the possibility that commercial databases might actually have important
features that haven't even penetrated the awareness of most free database
developers.

Another trick you can use with large data sets like this when you want
results back in seconds is to have regularly updated tables that aggregate
the data along each column normally aggregated against the main data set.

Of course, Pg doesn't have the nice features that make this just work and
make queries against your data source faster (called "OLAP Query rewrite" in
Oracle), so you'll have to put a lot of work into changing your application
to figure out when to use the summary tables.  As far as I know it doesn't
have materialized views, either, so updating these summary tables is also a
more complex task than just a single REFRESH command.

Maybe some bright person will prove me wrong by posting some working
information about how to get these apparently absent features working.

You might also want to consider ditching RAID 5 and switching to plain
mirroring.  RAID 5 is a helluva performance penalty (by design).  This is why
they said RAID - fast, cheap, reliable - pick any two.  RAID 5 ain't fast.
But that's probably not your main problem.

Sam.

> Regards,
>
> Matt Olson
> Ocean Consulting
> http://www.oceanconsulting.com/
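(In the absence of materialized views, the "regularly updated aggregate
table" amounts to an ordinary table plus a hand-rolled refresh.  Sketch only,
reusing the illustrative daily_prices table; monthly_summary is a made-up
name.)

    -- A hand-maintained "materialized view": per-ticker monthly aggregates
    -- kept in an ordinary table.
    CREATE TABLE monthly_summary (
        ticker    text    NOT NULL,
        month     date    NOT NULL,
        avg_close numeric NOT NULL,
        n_days    integer NOT NULL,
        PRIMARY KEY (ticker, month)
    );

    -- The poor man's REFRESH: wipe and repopulate (or restrict the DELETE
    -- and SELECT to just the months touched by the latest load).
    BEGIN;
    DELETE FROM monthly_summary;
    INSERT INTO monthly_summary (ticker, month, avg_close, n_days)
    SELECT ticker,
           date_trunc('month', trade_date)::date,
           avg(close_price),
           count(*)
      FROM daily_prices
     GROUP BY ticker, date_trunc('month', trade_date);
    COMMIT;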
> Another trick you can use with large data sets like this when you want
> results back in seconds is to have regularly updated tables that aggregate
> the data along each column normally aggregated against the main data set.

> Maybe some bright person will prove me wrong by posting some working
> information about how to get these apparently absent features working.

Most people just use simple triggers to maintain aggregate summary tables...

Chris
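(A minimal sketch of that approach, assuming the illustrative daily_prices
and monthly_summary tables from the earlier examples, with daily_prices as a
plain table and insert-only data; updates and deletes would need their own
handling, and the read-then-insert is not safe under concurrent inserts into
the same month, which is fine for a single-writer nightly load.)

    -- Keep monthly_summary in step with daily_prices as rows are inserted.
    CREATE OR REPLACE FUNCTION maintain_monthly_summary() RETURNS trigger AS '
    BEGIN
        UPDATE monthly_summary
           SET avg_close = (avg_close * n_days + NEW.close_price) / (n_days + 1),
               n_days    = n_days + 1
         WHERE ticker = NEW.ticker
           AND month  = date_trunc(''month'', NEW.trade_date)::date;

        IF NOT FOUND THEN
            INSERT INTO monthly_summary (ticker, month, avg_close, n_days)
            VALUES (NEW.ticker,
                    date_trunc(''month'', NEW.trade_date)::date,
                    NEW.close_price, 1);
        END IF;

        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER daily_prices_summary
        AFTER INSERT ON daily_prices
        FOR EACH ROW EXECUTE PROCEDURE maintain_monthly_summary();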
On Wed, 2005-05-11 at 12:53 +0800, Christopher Kings-Lynne wrote:
> > Another trick you can use with large data sets like this when you want
> > results back in seconds is to have regularly updated tables that aggregate
> > the data along each column normally aggregated against the main data set.
> >
> > Maybe some bright person will prove me wrong by posting some working
> > information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary tables...

Agreed.  I've also got a view which calls a function that will 1) use the
summary table where data exists, or 2) calculate the summary information,
load it into the summary table, and send a copy to the client (a partial
query-results cache).

It's not all nicely abstracted behind user-friendly syntax, but most of those
features can be cobbled together (with effort) in PostgreSQL.

--
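(Roughly that compute-on-miss pattern, as a sketch.  All names are the same
hypothetical ones used earlier; this version caches the 200-day figure per
ticker and date in sma_summary.)

    -- Return the cached 200-day average if we have it; otherwise compute it,
    -- store it, and return it.
    -- e.g.  SELECT get_sma_200('FOO', DATE '2005-05-10');
    CREATE OR REPLACE FUNCTION get_sma_200(text, date) RETURNS numeric AS '
    DECLARE
        v_ticker ALIAS FOR $1;
        v_date   ALIAS FOR $2;
        v_result numeric;
    BEGIN
        SELECT sma_200 INTO v_result
          FROM sma_summary
         WHERE ticker = v_ticker AND as_of_date = v_date;

        IF NOT FOUND THEN
            SELECT avg(close_price) INTO v_result
              FROM daily_prices
             WHERE ticker = v_ticker
               AND trade_date >  v_date - 200
               AND trade_date <= v_date;

            INSERT INTO sma_summary (ticker, as_of_date, sma_200)
            VALUES (v_ticker, v_date, v_result);
        END IF;

        RETURN v_result;
    END;
    ' LANGUAGE plpgsql;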
Christopher Kings-Lynne wrote:
>> Another trick you can use with large data sets like this when you want
>> results back in seconds is to have regularly updated tables that aggregate
>> the data along each column normally aggregated against the main data set.
>
>> Maybe some bright person will prove me wrong by posting some working
>> information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary
> tables...
>
> Chris

However, if (insert) triggers prove to be too much of a performance hit, try
cron'd functions that perform the aggregation for you.  This system works
well for us, using the pk's (sequence) for start and stop points.
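(That batch variant might look roughly like the following.  Everything here
is hypothetical: daily_prices is assumed to have a serial id column, which
the earlier sketches did not show, daily_totals is a made-up per-day rollup,
and agg_progress holds the bookmark.  It also assumes each cron run only ever
sees whole, newly loaded trading days.)

    -- Hypothetical support tables: a per-day rollup and a bookmark row.
    CREATE TABLE daily_totals (
        trade_date date    PRIMARY KEY,
        n_tickers  integer NOT NULL,
        sum_close  numeric NOT NULL
    );
    CREATE TABLE agg_progress (last_id bigint NOT NULL);
    INSERT INTO agg_progress VALUES (0);

    -- Run from cron: fold rows added since the last run into daily_totals,
    -- using the serial primary key as the start/stop bookmark.
    CREATE OR REPLACE FUNCTION roll_up_new_rows() RETURNS bigint AS '
    DECLARE
        v_start bigint;
        v_stop  bigint;
    BEGIN
        SELECT last_id INTO v_start FROM agg_progress;
        SELECT max(id) INTO v_stop  FROM daily_prices;

        IF v_stop IS NULL OR v_stop <= v_start THEN
            RETURN 0;    -- nothing new since the last run
        END IF;

        INSERT INTO daily_totals (trade_date, n_tickers, sum_close)
        SELECT trade_date, count(*), sum(close_price)
          FROM daily_prices
         WHERE id > v_start AND id <= v_stop
         GROUP BY trade_date;

        UPDATE agg_progress SET last_id = v_stop;
        RETURN v_stop - v_start;
    END;
    ' LANGUAGE plpgsql;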
Quoting Christopher Kings-Lynne <chriskl@familyhealth.com.au>:
> > Another trick you can use with large data sets like this when you want
> > results back in seconds is to have regularly updated tables that aggregate
> > the data along each column normally aggregated against the main data set.
> >
> > Maybe some bright person will prove me wrong by posting some working
> > information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary
> tables...

Don't know if this is more appropriate to bizgres, but:

What the first poster is talking about is what OLAP cubes do.

For big aggregating systems (OLAP), triggers perform poorly, compared to
messy hand-rolled code.  You may have dozens of aggregates at various levels.
Consider the effect of having each detail row cascade into twenty updates.

It's particularly silly-looking when data is coming in as batches of
thousands of rows in a single insert, e.g.

    COPY temp_table FROM STDIN;
    UPDATE fact_table ... FROM ... temp_table
    INSERT INTO fact_table ... FROM ... temp_table

(the above pair of operations is so common, Oracle added its "MERGE" operator
for it).

Hence my recent post (request) for using RULES to aggregate -- given no luck
with triggers "FOR EACH STATEMENT".
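(Spelled out against made-up names -- stage and fact_daily here are just
illustrative stand-ins for the temp_table/fact_table above, with hypothetical
columns -- the batch "merge" pattern looks something like this.)

    -- Load the day's batch into a staging table in one shot.
    CREATE TEMP TABLE stage (ticker text, trade_date date, close_price numeric);
    COPY stage FROM STDIN;   -- rows streamed in by the loader

    -- Update rows that already exist in the fact table...
    UPDATE fact_daily
       SET close_price = s.close_price
      FROM stage s
     WHERE fact_daily.ticker = s.ticker
       AND fact_daily.trade_date = s.trade_date;

    -- ...then insert the ones that don't (Oracle's MERGE does both at once).
    INSERT INTO fact_daily (ticker, trade_date, close_price)
    SELECT s.ticker, s.trade_date, s.close_price
      FROM stage s
     WHERE NOT EXISTS (
           SELECT 1 FROM fact_daily f
            WHERE f.ticker = s.ticker
              AND f.trade_date = s.trade_date);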