Re: Performance of complicated query - Mailing list pgsql-performance

From: Jonathan Morra
Subject: Re: Performance of complicated query
Date:
Msg-id: CAF8LAAUp2FyirH42W=whC7ao7HGrH6UvGWcj-DPLq4Pbn53suw@mail.gmail.com
In response to: Re: Performance of complicated query (Jonathan Morra <jonmorra@gmail.com>)
List: pgsql-performance
I have been working on this query, and I was able to modify it and cut its run time in half.  Here's where it is right now:

SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC) AS serial_number
    FROM (
        -- last reading per device assignment within the reporting window
        SELECT patient_devices.device_id, patient_id,
               MAX(value - issued_value) AS value,
               MAX(read_datetime) AS read_datetime
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        WHERE read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
        GROUP BY patient_devices.device_id, patient_id
    ) AS first
    INNER JOIN devices ON devices.id = device_id
    INNER JOIN patients ON patient_id = patients.id
) AS filtered
INNER JOIN (
    -- lifetime total per patient, over all assignments, with no date filter
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        GROUP BY patient_devices.device_id, patient_id
    ) AS first
    GROUP BY patient_id
) AS lifetime ON filtered.patient_id = lifetime.patient_id

I think the key to cutting it down was moving some of the joins up a level.  Even though this is faster, I'd still like to cut it down a bunch more (as this will be run a lot in my application).  Any more insight would be greatly appreciated.  A summary of explain (analyze, buffers) can be found at http://explain.depesz.com/s/qx7f.

Thanks


On Thu, May 23, 2013 at 5:21 PM, Jonathan Morra <jonmorra@gmail.com> wrote:
Sorry for the messy query; I'm very new to writing these complex queries.  I'll try to make it easier to read by using WITH clauses.  However, just to clarify, the WITH clauses only improve readability and don't change performance in any way, right?
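For instance, something along these lines (an untested sketch of the same query; I'm assuming a GROUP BY per device assignment matches the intent):

WITH windowed AS (
    -- last reading per (patient, device) assignment within the reporting window
    SELECT patient_id, patient_devices.device_id,
           MAX(value - issued_value) AS value,
           MAX(read_datetime) AS read_datetime
    FROM read_reads
    INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
        AND read_datetime >= issuance_datetime
        AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
    WHERE read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
    GROUP BY patient_id, patient_devices.device_id
), lifetime AS (
    -- same per-assignment maximum with no date filter, summed per patient
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        GROUP BY patient_id, patient_devices.device_id
    ) AS per_assignment
    GROUP BY patient_id
), filtered AS (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC) AS serial_number
    FROM windowed
    INNER JOIN devices ON devices.id = windowed.device_id
    INNER JOIN patients ON windowed.patient_id = patients.id
)
SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM filtered
INNER JOIN lifetime ON filtered.patient_id = lifetime.patient_id;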


On Thu, May 23, 2013 at 4:22 PM, james <james@mansionfamily.plus.com> wrote:
On 23/05/2013 22:57, Jonathan Morra wrote:
I'm not sure I understand your proposed solution.  There is also the case to consider where the same patient can be assigned the same device multiple times.  In this case, the value may be reset at each assignment (hence the line value - issued_value AS value from the original query).


Perhaps you could use triggers to help somewhat?  At least for the lifetime part.

For a given assignment of a device to a patient, only the last value is useful, so you can maintain that easily enough (a bit like a materialised view, though before 9.3 you'd have to maintain it by hand, I guess).
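Roughly this sort of thing, perhaps (completely untested, and I'm guessing at your schema: the assignment_latest table and its columns are made up, and I'm assuming issued_value lives on patient_devices):

-- summary table: one row per device assignment, created when the device
-- is issued, holding the most recent net reading for that assignment
CREATE TABLE assignment_latest (
    patient_id        integer   NOT NULL,
    device_id         integer   NOT NULL,
    issuance_datetime timestamp NOT NULL,
    latest_value      bigint,
    latest_read       timestamp,
    PRIMARY KEY (patient_id, device_id, issuance_datetime)
);

CREATE FUNCTION remember_latest_read() RETURNS trigger AS $$
BEGIN
    -- overwrite the stored value only if this reading is newer
    UPDATE assignment_latest al
       SET latest_value = NEW.value - pd.issued_value,
           latest_read  = NEW.read_datetime
      FROM patient_devices pd
     WHERE pd.device_id = NEW.device_id
       AND NEW.read_datetime >= pd.issuance_datetime
       AND NEW.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
       AND al.patient_id = pd.patient_id
       AND al.device_id  = pd.device_id
       AND al.issuance_datetime = pd.issuance_datetime
       AND (al.latest_read IS NULL OR al.latest_read < NEW.read_datetime);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER read_reads_remember_latest
AFTER INSERT ON read_reads
FOR EACH ROW EXECUTE PROCEDURE remember_latest_read();

The lifetime figure is then just SELECT patient_id, SUM(latest_value) FROM assignment_latest GROUP BY patient_id, without touching read_reads at all.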

But, that might fix 'lifetime' but not some arbitrary windowed view.  I can see why an 'as at' end time is useful, but not why a start time is so useful: if a device has readings before the window but not in the window, is that 'no reading' or should the last reading prior to the window apply?

It also seems to me that the solution you have is hard to reason about.  It's like a Haskell program done in one big inline fold rather than a bunch of 'where' clauses, and I find these cause significant brain overload.

Perhaps you could break it out into identifiable chunks that work out the readings that are not superseded (i.e. the last within the date bounds for each device assignment), both for lifetime if not using triggers and for your date range otherwise, and then work with those.  Consider CTEs ('WITH queries') for doing this?
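For example, something like this (untested, and again guessing that issued_value is a column of patient_devices):

WITH last_reads AS (
    -- the one non-superseded reading per device assignment in the window
    SELECT DISTINCT ON (pd.patient_id, pd.device_id, pd.issuance_datetime)
           pd.patient_id, pd.device_id,
           r.value - pd.issued_value AS value,
           r.read_datetime
    FROM read_reads r
    JOIN patient_devices pd ON pd.device_id = r.device_id
        AND r.read_datetime >= pd.issuance_datetime
        AND r.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
    WHERE r.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
    ORDER BY pd.patient_id, pd.device_id, pd.issuance_datetime,
             r.read_datetime DESC
)
SELECT patient_id, SUM(value) AS value, MAX(read_datetime) AS latest_read
FROM last_reads
GROUP BY patient_id;

Once the non-superseded readings are isolated like that, the per-patient sums and the joins back to patients and devices are easy to reason about separately.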

It seems to me that if you can do this, then the problem might be easier to express.

Failing that, I'd be looking at using temporary tables, and forcing a series of reduce steps using them, but then I'm a nasty old Sybase hacker at heart. ;-)
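i.e. something along these lines (sketch only):

CREATE TEMP TABLE tmp_assignment_values AS
SELECT pd.patient_id, pd.device_id,
       MAX(r.value - pd.issued_value) AS value,
       MAX(r.read_datetime) AS read_datetime
FROM read_reads r
JOIN patient_devices pd ON pd.device_id = r.device_id
    AND r.read_datetime >= pd.issuance_datetime
    AND r.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
GROUP BY pd.patient_id, pd.device_id;

ANALYZE tmp_assignment_values;  -- give the planner real statistics for the next step

CREATE TEMP TABLE tmp_lifetime AS
SELECT patient_id, SUM(value) AS lifetime_value
FROM tmp_assignment_values
GROUP BY patient_id;

Each step is small enough to inspect (and time) on its own.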


