
Re: Are there any options to parallelize queries? - Mailing list pgsql-general

From	Pavel Stehule
Subject	Re: Are there any options to parallelize queries?
Date	August 21, 2012 09:21:36
Msg-id	CAFj8pRAW89xGntvZh3Rkm=GiFW_HDhf9K=SOHej0Uj5bhj36Qg@mail.gmail.com
In response to	Are there any options to parallelize queries? (Seref Arikan <serefarikan@kurumsalteknoloji.com>)
List	pgsql-general

Hello

2012/8/21 Seref Arikan <serefarikan@kurumsalteknoloji.com>:
> Dear all,
> I am designing an electronic health record repository which uses postgresql
> as its RDBMS technology. For those who may find the topic interesting, the
> EHR standard I specialize in is openEHR: http://www.openehr.org/
>

http://stormdb.com/community/stado?destination=node%2F8

Regards

Pavel Stehule


> My design makes use of parallel execution in the layers above the DB, and it
> seems to scale quite well. However, I have a scale problem at hand. A single
> patient can have up to 1 million different clinical data entries on his/her
> own, after a few decades of usage. Clinicians do love their data, and
> especially in chronic disease management, they demand access to whatever
> data exists. If you have 20 years of data for a diabetic patient, for
> example, they'll want to look for trends in that, or even scroll through all
> of it, maybe with some filtering.
> My requirement is to be able to process those 1 million records as fast as
> possible. In case of population queries, we're talking about billions of
> records. Each clinical record (even with all the optimizations our domain
> has developed in the last 30 or so years) leads to a number of rows, so you
> can see that this is really big data. (Imagine a national diabetes registry
> with lifetime data of a few million patients.)
> I am ready to consider Hadoop or other non-transactional approaches for
> population queries, but clinical care still requires that I process millions
> of records for a single patient.
>
> Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a
> lot, but it would be a lot better if I could feed those frameworks with data
> faster. So, what options do I have to execute queries in parallel, assuming
> a transactional system running on postgresql? For example, I'd like to get
> the last 10 years' records in chunks of 2 years of data, or chunks of 5K
> records, fed to N parallel processing machines. The clinical
> system should keep functioning in the meantime, with new records added etc.
> PGPool looks like a good option, but I'd appreciate your input. Any proven
> best practices, architectures, products?
>
> Best regards
> Seref
>
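Seref's example of splitting the last 10 years into 2-year chunks and handing them to N workers can be sketched as follows. This is a minimal Python sketch under stated assumptions, not a description of any existing tool: the `clinical_entry` table and its columns, and the `fetch` callable, are hypothetical, and the `%s` placeholders assume a DB-API driver such as psycopg2. Each half-open window `[start, end)` becomes an independent range query, and each worker should run its query over its own connection, since a single PostgreSQL connection executes one query at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date
from typing import Callable, Sequence

# Hypothetical schema: clinical_entry(patient_id, entry_id, recorded_at, payload).
# %s placeholders follow the DB-API "format" paramstyle (e.g. psycopg2).
RANGE_SQL = (
    "SELECT entry_id, recorded_at, payload FROM clinical_entry "
    "WHERE patient_id = %s AND recorded_at >= %s AND recorded_at < %s "
    "ORDER BY recorded_at"
)


@dataclass(frozen=True)
class Window:
    start: date  # inclusive lower bound
    end: date    # exclusive upper bound


def add_years(d: date, years: int) -> date:
    """Shift d by whole years, mapping Feb 29 to Mar 1 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return date(d.year + years, 3, 1)


def year_windows(first: date, last: date, years_per_chunk: int) -> list[Window]:
    """Split [first, last) into adjacent, non-overlapping windows."""
    windows = []
    start = first
    while start < last:
        end = min(add_years(start, years_per_chunk), last)
        windows.append(Window(start, end))
        start = end
    return windows


def fetch_in_parallel(
    patient_id: int,
    windows: Sequence[Window],
    fetch: Callable[[str, tuple], list],
    workers: int,
) -> list:
    """Run one range query per window on a pool of worker threads.

    `fetch(sql, params)` is a hypothetical caller-supplied function that
    should open or borrow its own connection, so that the queries actually
    run concurrently on the server. Results come back in window order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fetch, RANGE_SQL, (patient_id, w.start, w.end))
            for w in windows
        ]
        return [f.result() for f in futures]
```

Usage: `year_windows(date(2002, 8, 21), date(2012, 8, 21), 2)` yields five adjacent windows, the last ending on 2012-08-21, and `fetch_in_parallel(patient_id, windows, fetch, workers=5)` returns one result list per window, in window order. Half-open bounds keep windows from overlapping, so no row is fetched twice. For the fixed-size variant (chunks of 5K records), the same pattern could use precomputed boundary keys, such as every 5000th `entry_id`, in place of dates. Because each chunk runs in its own transaction while new records keep arriving, chunks may see slightly different snapshots; PostgreSQL 9.2 added `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT` for sharing one snapshot across sessions.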