Re: Speeding up a query. - Mailing list pgsql-performance

From Hartman, Matthew
Subject Re: Speeding up a query.
Date
Msg-id 366642367C5B354197A1E0D27BC175BD0225971A@KGHMAIL.KGH.ON.CA
In response to Re: Speeding up a query.  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
Thanks for the replies, everyone. I'll try to answer them all in this one email. I will send another email immediately after this with additional details about the query.

> - Frequently the biggest performance gains can be reached by
>   a (painful) redesign. Can ou change the table structure in a way
>   that makes this query less expensive?

I have considered redesigning the algorithm to accommodate this. As I've said, there's one row per five minute time slot. Instead, I could represent an interval of time with a row. For example, a "start_time" of "08:00" with an "end_time" of "12:00", or perhaps an interval "duration" of "4 hours". The difficulty lies in managing separate time requirements (nurse vs unit) for each time slot, and in inserting/updating new rows as pieces of those time slots or intervals are used up. Having a row per five minute interval avoids those complications so far. Still, I'd start with 32 rows and increase the number, never reaching 3,280... :)
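
To make that concrete, here is a rough sketch of what an interval-based table might look like (table and column names are made up for illustration, not the actual schema):

-- Hypothetical interval-based alternative to one row per five minute slot.
create table appointment_block (
    block_id    serial primary key,
    unit_id     integer not null,
    nurse_id    integer,
    start_time  time not null,
    end_time    time not null,
    check (end_time > start_time)
);

-- Booking part of a free block would mean splitting it: carving a 60 minute
-- appointment out of an 08:00-12:00 block leaves a 09:00-12:00 block free.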

> - You have an index on matrix.xxxxx, right?

I have tried indexes on each common join criterion. Usually it's "time,unit", "time,nurse", or "time,unit_scheduled", "time,nurse_scheduled" (the latter two being booleans). In the first two cases it made a difference of less than a second. In the last two, the time actually increases if I add "analyze" statements after updates are made.
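
For reference, the indexes I have been experimenting with look roughly like this (column names are simplified; the real table differs slightly):

create index matrix_time_unit_idx        on matrix (time, unit);
create index matrix_time_nurse_idx       on matrix (time, nurse);
create index matrix_time_unit_sched_idx  on matrix (time, unit_scheduled);
create index matrix_time_nurse_sched_idx on matrix (time, nurse_scheduled);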

> - Can you reduce the row count of the two subqueries by adding
>   additional conditions that weed out rows that can be excluded
>   right away?

I use some additional conditions. I'll paste the meat of the query below.

> - Maybe you can gain a little by changing the "select *" to
>   "select id" in both subqueries and adding an additional join
>   with matrix that adds the relevant columns in the end.
>   I don't know the executor, so I don't know if that will help,
>   but it would be a simple thing to test in an experiment.

I wrote "select *" for simplicity; in reality it returns the primary key for that row.
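
If I were to follow that suggestion, the query shape would become something like this sketch (same "xxxxx" placeholder as in my simplified query quoted below; only the primary key comes out of the subqueries, and the remaining columns are joined back in at the end):

select m.*
from   (
       select m1.id
       from   matrix m1, matrix m2
       where  m1.xxxxx = m2.xxxxx
       ) chair
       join (
       select m1.id
       from   matrix m1, matrix m2
       where  m1.xxxxx = m2.xxxxx
       ) nurse on nurse.id = chair.id
       join matrix m on m.id = chair.id;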

> how far in advance do you schedule?  As far as necessary?

It's done on a per-day basis, each day taking 8-12 seconds or so on my workstation. We typically schedule patients as much as three to six months in advance. The query already pulls data to a temporary table to avoid having to manage a massive number of rows.
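
The staging step is essentially the following (the source table and date column are named here for illustration only):

-- Pull only the day being scheduled into a temporary working table, so the
-- self-joins never see more than one day's worth of slots.
create temporary table matrix as
select *
from   schedule_slots
where  slot_date = date '2009-06-17';

analyze matrix;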

> How many chairs are there?  How many nurses are there?   This is a
> tricky (read: interesting) problem.

In my current template there are 17 chairs and 7 nurses. Chairs are grouped into pods of 2-4 chairs. Nurses cover one to many pods, allowing for a primary nurse per pod as well as floater nurses that cover multiple pods.
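
For a rough picture of the coverage structure, the relationships boil down to something like this (hypothetical table names, not the real schema):

-- Chairs are grouped into pods; a nurse can cover one or more pods.
create table pod       (pod_id   serial primary key);
create table chair     (chair_id serial primary key,
                        pod_id   integer not null references pod);
create table nurse     (nurse_id serial primary key);
create table nurse_pod (nurse_id integer references nurse,
                        pod_id   integer references pod,
                        primary key (nurse_id, pod_id));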




Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294


-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Wednesday, June 17, 2009 9:09 AM
To: Hartman, Matthew
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Speeding up a query.

On Tue, Jun 16, 2009 at 2:35 PM, Hartman, Matthew <Matthew.Hartman@krcc.on.ca> wrote:
> Good afternoon.
>
> I have developed an application to efficiently schedule chemotherapy
> patients at our hospital. The application takes into account several
> resource constraints (available chairs, available nurses, nurse coverage
> assignment to chairs) as well as the chair time and nursing time
> required for a regimen.
>
> The algorithm for packing appointments respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
>
> I think what's slowing it down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change).
>
> To determine the available slots, the algorithm finds the earliest slot
> that has an available chair and a count of the required concurrent
> intervals afterwards. So a 60 minute regimen requires 12 concurrent
> rows. This is accomplished by joining the table on itself. A second
> query is run for the same range, but with respect to the nurse time and
> an available nurse. Finally, those two are joined against each other.
> Effectively, it is:
>
> Select *
> From   (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) chair,
>        (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) nurse
> Where chair.id = nurse.id
>
> With matrix having 3,280 rows. Ugh.
>
> I have tried various indexes and clustering approaches with little
> success. Any ideas?

how far in advance do you schedule?  As far as necessary?

How many chairs are there?  How many nurses are there?   This is a
tricky (read: interesting) problem.

merlin

