Re: Are there any options to parallelize queries? - Mailing list pgsql-general

From Pavel Stehule
Subject Re: Are there any options to parallelize queries?
Date
Msg-id CAFj8pRAW89xGntvZh3Rkm=GiFW_HDhf9K=SOHej0Uj5bhj36Qg@mail.gmail.com
In response to Are there any options to parallelize queries?  (Seref Arikan <serefarikan@kurumsalteknoloji.com>)
List pgsql-general
Hello

2012/8/21 Seref Arikan <serefarikan@kurumsalteknoloji.com>:
> Dear all,
> I am designing an electronic health record repository which uses PostgreSQL
> as its RDBMS technology. For those who may find the topic interesting, the
> EHR standard I specialize in is openEHR: http://www.openehr.org/
>

http://stormdb.com/community/stado?destination=node%2F8

Regards

Pavel Stehule


> My design makes use of parallel execution in the layers above the DB, and it
> seems to scale quite well. However, I have a scaling problem at hand. A single
> patient can have up to 1 million different clinical data entries on his/her
> own, after a few decades of usage. Clinicians do love their data, and
> especially in chronic disease management, they demand access to whatever
> data exists. If you have 20 years of data for a diabetic patient, for
> example, they'll want to look for trends in that, or even scroll through all
> of it, maybe with some filtering.
> My requirement is to be able to process those 1 million records as fast as
> possible. In case of population queries, we're talking about billions of
> records. Each clinical record (even with all the optimizations our domain
> has developed in the last 30 or so years) leads to a number of rows, so you
> can see that this is really big data. (Imagine a national diabetes registry
> with lifetime data of a few million patients.)
> I am ready to consider Hadoop or other non-transactional approaches for
> population queries, but clinical care still requires that I process millions
> of records for a single patient.
>
> Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a
> lot, but it would be a lot better if I could feed those frameworks with data
> faster. So, what options do I have to execute queries in parallel, assuming
> a transactional system running on PostgreSQL? For example, I'd like to get
> the last 10 years' records in chunks of 2 years of data, or chunks of 5K
> records, fed to N parallel processing machines. The clinical
> system should keep functioning in the meantime, with new records added, etc.
> PGPool looks like a good option, but I'd appreciate your input. Any proven
> best practices, architectures, products?
>
> Best regards
> Seref
>
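
The chunking described in the question (ten years of records split into 2-year ranges, each handed to a separate worker) can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not a recommendation from this thread: the table `clinical_entry`, its columns `patient_id` and `recorded_at`, and the helpers `year_chunks`, `fetch_in_parallel`, and `fake_fetch` are all hypothetical names. A real worker would open its own database connection and run the query for its range; here a stand-in function takes its place so the example runs without a database.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical schema: a clinical_entry table with patient_id and recorded_at.
CHUNK_QUERY = (
    "SELECT * FROM clinical_entry "
    "WHERE patient_id = %s AND recorded_at >= %s AND recorded_at < %s"
)


def year_chunks(first_year, last_year, years_per_chunk=2):
    """Return half-open [lo, hi) date ranges covering first_year..last_year."""
    end = last_year + 1  # exclusive upper bound, so last_year is fully included
    return [
        (date(y, 1, 1), date(min(y + years_per_chunk, end), 1, 1))
        for y in range(first_year, end, years_per_chunk)
    ]


def fetch_in_parallel(chunks, fetch_chunk, workers=4):
    """Run fetch_chunk(lo, hi) for every chunk on a thread pool, keeping chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: fetch_chunk(*c), chunks))


def fake_fetch(lo, hi):
    """Stand-in for a real worker that would open its own connection and run CHUNK_QUERY."""
    return (lo.year, hi.year)


if __name__ == "__main__":
    chunks = year_chunks(2003, 2012)
    print(fetch_in_parallel(chunks, fake_fetch))
```

Half-open ranges keep a record that falls exactly on a boundary from landing in two chunks. Each worker's query would run in its own transaction, so rows added while the chunks are being read may be visible to some workers and not to others; if a single consistent view matters, PostgreSQL 9.2 and later offer `pg_export_snapshot()` together with `SET TRANSACTION SNAPSHOT` for sharing one snapshot across connections.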

