Re: Performance of complicated query - Mailing list pgsql-performance

From: Jonathan Morra
Subject: Re: Performance of complicated query
Date:
Msg-id: CAF8LAAUp2FyirH42W=whC7ao7HGrH6UvGWcj-DPLq4Pbn53suw@mail.gmail.com
In response to: Re: Performance of complicated query (Jonathan Morra <jonmorra@gmail.com>)
List: pgsql-performance
I have been working on this query, and I was able to modify it and cut its run time in half.  Here's where it is right now:

SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC) AS serial_number
    FROM (
        -- last reading per device assignment within the reporting window
        SELECT patient_devices.device_id, patient_id,
               MAX(value - issued_value) AS value,
               MAX(read_datetime) AS read_datetime
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        WHERE read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
        GROUP BY patient_devices.device_id, patient_id
    ) AS first
    INNER JOIN devices ON devices.id = device_id
    INNER JOIN patients ON patient_id = patients.id
) AS filtered
INNER JOIN (
    -- lifetime total per patient, over all assignments, with no date filter
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        GROUP BY patient_devices.device_id, patient_id
    ) AS first
    GROUP BY patient_id
) AS lifetime ON filtered.patient_id = lifetime.patient_id

I think the key to cutting it down was moving some of the joins up a level.  Even though this is faster, I'd still like to cut it down a bunch more (as this will be run a lot in my application).  Any more insight would be greatly appreciated.  A summary of explain (analyze, buffers) can be found at http://explain.depesz.com/s/qx7f.

Thanks


On Thu, May 23, 2013 at 5:21 PM, Jonathan Morra <jonmorra@gmail.com> wrote:
Sorry for the messy query; I'm very new to writing these complex queries.  I'll try to make it easier to read by using WITH clauses.  However, just to clarify, the WITH clauses only improve readability and don't change performance in any way, right?
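For instance, something along these lines (an untested sketch of the same query; I'm assuming a GROUP BY per device assignment matches the intent):

WITH windowed AS (
    -- last reading per (patient, device) assignment within the reporting window
    SELECT patient_id, patient_devices.device_id,
           MAX(value - issued_value) AS value,
           MAX(read_datetime) AS read_datetime
    FROM read_reads
    INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
        AND read_datetime >= issuance_datetime
        AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
    WHERE read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
    GROUP BY patient_id, patient_devices.device_id
), lifetime AS (
    -- same per-assignment maximum with no date filter, summed per patient
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        GROUP BY patient_id, patient_devices.device_id
    ) AS per_assignment
    GROUP BY patient_id
), filtered AS (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC) AS serial_number
    FROM windowed
    INNER JOIN devices ON devices.id = windowed.device_id
    INNER JOIN patients ON windowed.patient_id = patients.id
)
SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM filtered
INNER JOIN lifetime ON filtered.patient_id = lifetime.patient_id;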


On Thu, May 23, 2013 at 4:22 PM, james <james@mansionfamily.plus.com> wrote:
On 23/05/2013 22:57, Jonathan Morra wrote:
I'm not sure I understand your proposed solution.  There is also the case to consider where the same patient can be assigned the same device multiple times.  In this case, the value may be reset at each assignment (hence the line value - issued_value AS value from the original query).


Perhaps you could use triggers to help somewhat?  At least for the lifetime part.

For a given assignment of a device to a patient, only the last value is useful, so you can maintain that easily enough (a bit like a materialised view, though before 9.3 you'd have to maintain it by hand, I guess).
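Roughly this sort of thing, perhaps (completely untested, and I'm guessing at your schema: the assignment_latest table and its columns are made up, and I'm assuming issued_value lives on patient_devices):

-- summary table: one row per device assignment, created when the device
-- is issued, holding the most recent net reading for that assignment
CREATE TABLE assignment_latest (
    patient_id        integer   NOT NULL,
    device_id         integer   NOT NULL,
    issuance_datetime timestamp NOT NULL,
    latest_value      bigint,
    latest_read       timestamp,
    PRIMARY KEY (patient_id, device_id, issuance_datetime)
);

CREATE FUNCTION remember_latest_read() RETURNS trigger AS $$
BEGIN
    -- overwrite the stored value only if this reading is newer
    UPDATE assignment_latest al
       SET latest_value = NEW.value - pd.issued_value,
           latest_read  = NEW.read_datetime
      FROM patient_devices pd
     WHERE pd.device_id = NEW.device_id
       AND NEW.read_datetime >= pd.issuance_datetime
       AND NEW.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
       AND al.patient_id = pd.patient_id
       AND al.device_id  = pd.device_id
       AND al.issuance_datetime = pd.issuance_datetime
       AND (al.latest_read IS NULL OR al.latest_read < NEW.read_datetime);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER read_reads_remember_latest
AFTER INSERT ON read_reads
FOR EACH ROW EXECUTE PROCEDURE remember_latest_read();

The lifetime figure is then just SELECT patient_id, SUM(latest_value) FROM assignment_latest GROUP BY patient_id, without touching read_reads at all.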

But, that might fix 'lifetime' but not some arbitrary windowed view.  I can see why an 'as at' end time is useful, but not why a start time is so useful: if a device has readings before the window but not in the window, is that 'no reading' or should the last reading prior to the window apply?

It also seems to me that the solution you have is hard to reason about.  It's like a Haskell program done in one big inline fold rather than a bunch of 'where' clauses, and I find these cause significant brain overload.

Perhaps you could break it out into identifiable chunks that work out the readings that are not superseded (i.e. the last within the date bounds for each device assignment), both for lifetime if not using triggers and for your date range otherwise, and then work with those.  Consider CTEs ('WITH queries') for doing this?
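For example, something like this (untested, and again guessing that issued_value is a column of patient_devices):

WITH last_reads AS (
    -- the one non-superseded reading per device assignment in the window
    SELECT DISTINCT ON (pd.patient_id, pd.device_id, pd.issuance_datetime)
           pd.patient_id, pd.device_id,
           r.value - pd.issued_value AS value,
           r.read_datetime
    FROM read_reads r
    JOIN patient_devices pd ON pd.device_id = r.device_id
        AND r.read_datetime >= pd.issuance_datetime
        AND r.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
    WHERE r.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
    ORDER BY pd.patient_id, pd.device_id, pd.issuance_datetime,
             r.read_datetime DESC
)
SELECT patient_id, SUM(value) AS value, MAX(read_datetime) AS latest_read
FROM last_reads
GROUP BY patient_id;

Once the non-superseded readings are isolated like that, the per-patient sums and the joins back to patients and devices are easy to reason about separately.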

It seems to me that if you can do this, then the problem might be easier to express.

Failing that, I'd be looking at using temporary tables, and forcing a series of reduce steps using them, but then I'm a nasty old Sybase hacker at heart. ;-)
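i.e. something along these lines (sketch only):

CREATE TEMP TABLE tmp_assignment_values AS
SELECT pd.patient_id, pd.device_id,
       MAX(r.value - pd.issued_value) AS value,
       MAX(r.read_datetime) AS read_datetime
FROM read_reads r
JOIN patient_devices pd ON pd.device_id = r.device_id
    AND r.read_datetime >= pd.issuance_datetime
    AND r.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
GROUP BY pd.patient_id, pd.device_id;

ANALYZE tmp_assignment_values;  -- give the planner real statistics for the next step

CREATE TEMP TABLE tmp_lifetime AS
SELECT patient_id, SUM(value) AS lifetime_value
FROM tmp_assignment_values
GROUP BY patient_id;

Each step is small enough to inspect (and time) on its own.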


