Re: Prefetch - Mailing list pgsql-performance

From: Matt Olson
Subject: Re: Prefetch
Date: 2005-05-10 11:36
Msg-id: 200505101136.31759.molson@oceanconsulting.com
In response to: Re: Prefetch (Greg Stark <gsstark@mit.edu>)
List: pgsql-performance
My Postgres binaries and WAL are on a separate disk from the RAID array.  The
table I'm doing the selects from is about 4GB and holds 18-20 million records.
No concurrent or dependent inserts or deletes are going on.

Tom's point and your points about optimizing the application are well taken.
I know my approach is suboptimal and prone to getting caught by latency
issues (seek times, cache hit rates, etc.).  However, prefetch in my mind is
all about eliminating latencies, so I thought my problem would make a good
case study for discussing prefetching.

I'm in the process of testing two approaches: Rod and Greg's suggestion of
using CLUSTER, and, for the sake of not letting a good idea get away, the
parallel query approach Tom suggested.  Rough sketches of both are below.
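
Something along these lines -- the table and index names here are invented
for illustration, not my real schema:

    -- Rewrite the table in index order, so reads for a given ticker
    -- become sequential instead of scattered (CLUSTER doesn't maintain
    -- the ordering, so this has to be rerun periodically):
    CLUSTER price_history_ticker_date_idx ON price_history;
    ANALYZE price_history;

    -- Tom's parallel idea, roughly: run slices like these concurrently
    -- from separate connections, each covering a disjoint ticker range:
    SELECT * FROM price_history WHERE ticker >= 'A' AND ticker < 'G';
    SELECT * FROM price_history WHERE ticker >= 'G' AND ticker < 'N';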

I'll report back to the list what I find and maybe do some _rough_
benchmarking.  This is a production app, so I can't get too much in the way
of the daily batches.

--
Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

On Tuesday 10 May 2005 11:13 am, Greg Stark wrote:
> Matt Olson writes:
> > I've done other things that make sense, like using indexes, playing with
> > the planner constants and turning up the postgres cache buffers.
> >
> > Even playing with extreme hdparm read-ahead numbers (e.g. 64738) yields
> > no apparent difference in database performance.  The random nature of the
> > I/O drops disk reads down to about 1MB/sec for the array.  A linear table
> > scan can easily yield 70-80MB/sec on this system.  Total table size is
> > usually around 1GB and with indexes should be able to fit completely in
> > main memory.
>
> Actually, forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible. I would think about whether you can structure your
> data such that you can use sequential scans. That might mean partitioning
> your raw data into separate tables and then accessing only the partitions
> that are relevant to the query.
>
> In your application that might be hard. It sounds like you would need more
> or less one table per stock ticker, which would be painful to manage.
>
> One thing you might look into is using the CLUSTER command. But Postgres
> doesn't maintain the cluster ordering, so you would have to rerun it
> periodically.
>
> I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s
> even for completely random reads. Is it possible you're seeing something
> else interfering? Do you have INSERT/UPDATE/DELETE transactions happening
> concurrently with this select scan? If so, you should strongly consider
> separating the transaction log from the data files.
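
For the archives, the partitioning idea would look roughly like this with
table inheritance (names invented for illustration; Postgres won't route
rows to the right child table, the application has to do that):

    CREATE TABLE prices (
        ticker  text     NOT NULL,
        day     date     NOT NULL,
        close   numeric  NOT NULL
    );

    -- One child table per ticker (or per group of tickers):
    CREATE TABLE prices_ibm  (CHECK (ticker = 'IBM'))  INHERITS (prices);
    CREATE TABLE prices_msft (CHECK (ticker = 'MSFT')) INHERITS (prices);

    -- Querying one child directly is a plain sequential scan:
    SELECT * FROM prices_ibm WHERE day >= '2005-01-01';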


