Re: Reading data in bulk - help? - Mailing list pgsql-performance

From Magnus Naeslund(w)
Subject Re: Reading data in bulk - help?
Date
Msg-id 51478.129.178.88.66.1063194442.squirrel@mail2.fbab.net
In response to Re: Reading data in bulk - help?  (Chris Huston <chuston@bangjafwac.com>)
Responses Re: Reading data in bulk - help?
List pgsql-performance
Chris Huston said:
> Thanks Josh that helped. I had gone looking for some kind of cluster
> option but was looking under create database, create index and
> initlocation - didn't see the CLUSTER index ON table.
>
> I ran the CLUSTER which took about 2 1/2 hours to complete. That
> improved the query performance about 6x - which is great - but is still
> taking 26 minutes to do what a serial read does in about 2 1/2 minutes.
>
> At this point I'm ok because each fetch is taking around 200
> milliseconds from call to the time the data is ready. The processing
> takes 300-600ms per batch. I've got the fetch and the processing
> running in separate threads so even if postgres was running faster it
> wouldn't help this implementation.
>
> However, "iostat" is still reporting average size per transfer of about
> 10kB and total thru-put of about 1MB/s. The transfers per second went
> from >200/s to about 80/s. It still seems like it ought to be faster.
>
> The system is currently running on a single-processor 500MHz G4. We're
> likely to move to a two-processor 2GHz G5 in the next few months. Then
> each block may take only 30-60 milliseconds to complete and there can
> be two concurrent blocks processing at once.
>
> Sometime before then I need to figure out how to cut the fetch times
> from the now 200ms to something like 10ms. There are currently
> 1,628,800 records in the single data table representing 6817 groups.
> Each group has 2 to 284 records - with 79% having the max 284 (max
> grows by 1 every day - although the value may change throughout the
> day). Each record is maybe 1 or 2k so ideally each batch/group should
> require 284-568k - at 10MB/s - that'd be roughly 30-60ms per batch.
>
> RELATED QUESTION: How do I speed up the following query: "select
> distinct group_id from datatable"? It results in a sequential scan
> of the table. Why doesn't it use the group_id index? I only run it once
> per run so it's not as critical as the fetch speed, which is done 6817
> times.
>
> Thanks for the help!
> - Chris
>
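On the DISTINCT question: the planner has no "skip scan" that hops from one distinct value in the index to the next, so "select distinct group_id" falls back to reading the whole table. On newer PostgreSQL releases (8.4 and later) you can emulate a skip scan with a recursive CTE. A sketch, assuming the existing index on group_id (table and column names are taken from the thread):

```sql
-- Each step probes the index once for the next group_id
-- greater than the last one found: ~6817 index probes
-- instead of a sequential scan over 1.6M rows.
WITH RECURSIVE g AS (
    (SELECT group_id FROM datatable ORDER BY group_id LIMIT 1)
  UNION ALL
    SELECT (SELECT d.group_id
            FROM datatable d
            WHERE d.group_id > g.group_id
            ORDER BY d.group_id
            LIMIT 1)
    FROM g
    WHERE g.group_id IS NOT NULL
)
SELECT group_id FROM g WHERE group_id IS NOT NULL;
```

On the version you are running now, the usual workaround is to maintain a small side table of group ids (kept current with a trigger) and select from that instead.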

How are you fetching the data?
If you are using cursors, be sure to fetch a substantial number of rows
at a time so that you're not punished by round-trip latency.
I got a big speedup when I changed my original clueless code to fetch 64
rows at a time instead of only one.
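With the thread's datatable, batched cursor fetches look like this (a sketch; the 64-row batch size and the group value 42 are illustrative, not from the thread):

```sql
BEGIN;
DECLARE grp_cur CURSOR FOR
    SELECT * FROM datatable WHERE group_id = 42;

-- One round trip per 64 rows instead of one per row:
FETCH 64 FROM grp_cur;   -- repeat until it returns fewer than 64 rows
CLOSE grp_cur;
COMMIT;
```

The batch size is a latency/memory trade-off: bigger batches mean fewer round trips but more rows buffered on the client per fetch.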

Magnus


