Re: Speeding up a query. - Mailing list pgsql-performance
From | Hartman, Matthew |
---|---|
Subject | Re: Speeding up a query. |
Date | |
Msg-id | 366642367C5B354197A1E0D27BC175BD0225971A@KGHMAIL.KGH.ON.CA Whole thread Raw |
In response to | Re: Speeding up a query. (Merlin Moncure <mmoncure@gmail.com>) |
List | pgsql-performance |
Thanks for the replies everyone. I'll try to answer them all in this one email. I will send another email immediately afterthis with additional details about the query. > - Frequently the biggest performance gains can be reached by > a (painful) redesign. Can ou change the table structure in a way > that makes this query less expensive? I have considered redesigning the algorithm to accommodate this. As I've said, there's one row per five minute time slot.Instead, I could represent an interval of time with a row. For example, "start_time" of "08:00" with an "end_time" of"12:00" or perhaps an interval "duration" of "4 hours". The difficulty becomes in managing separate time requirements (nursevs unit) for each time slot, and in inserting/updating new rows as pieces of those time slots or intervals are usedup. Having a row per five minute interval avoids those complications so far. Still, I'd start with 32 rows and increasethe number, never reaching 3,280.. :) > - You have an index on matrix.xxxxx, right? I have tried indexes on each common join criteria. Usually it's "time,unit", "time,nurse", or "time,unit_scheduled", "time,nurse_scheduled"(the later two being Booleans). In the first two cases it's made a difference of less than a second.In the last two, the time actually increases if I add "analyze" statements in after updates are made. > - Can you reduce the row count of the two subqueries by adding > additional conditions that weed out rows that can be excluded > right away? I use some additional conditions. I'll paste the meat of the query below. > - Maybe you can gain a little by changing the "select *" to > "select id" in both subqueries and adding an additional join > with matrix that adds the relevant columns in the end. > I don't know the executor, so I don't know if that will help, > but it would be a simple thing to test in an experiment. I wrote the "select *" as simplified, but really, it returns the primary key for that row. > how far in advance do you schedule? As far as necessary? It's done on a per day basis, each day taking 8-12 seconds or so on my workstation. We typically schedule patients as muchas three to six months in advance. The query already pulls data to a temporary table to avoid having to manage a massivenumber of rows. > How many chairs are there? How many nurses are there? This is a > tricky (read: interesting) problem. In my current template there are 17 chairs and 7 nurses. Chairs are grouped into pods of 2-4 chairs. Nurses cover one tomany pods, allowing for a primary nurse per pod as well as floater nurses that cover multiple pods. Matthew Hartman Programmer/Analyst Information Management, ICP Kingston General Hospital (613) 549-6666 x4294 -----Original Message----- From: Merlin Moncure [mailto:mmoncure@gmail.com] Sent: Wednesday, June 17, 2009 9:09 AM To: Hartman, Matthew Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Speeding up a query. On Tue, Jun 16, 2009 at 2:35 PM, Hartman, Matthew<Matthew.Hartman@krcc.on.ca> wrote: > Good afternoon. > > I have developed an application to efficiently schedule chemotherapy > patients at our hospital. The application takes into account several > resource constraints (available chairs, available nurses, nurse coverage > assignment to chairs) as well as the chair time and nursing time > required for a regimen. > > The algorithm for packing appointments in respects each constraint and > typically schedules a day of treatments (30-60) within 9-10 seconds on > my workstation, down from 27 seconds initially. I would like to get it > below 5 seconds if possible. > > I think what's slowing is down is simply the number of rows and joins. > The algorithm creates a scheduling matrix with one row per 5 minute > timeslot, per unit, per nurse assigned to the unit. That translates to > 3,280 rows for the days I have designed in development (each day can > change). > > To determine the available slots, the algorithm finds the earliest slot > that has an available chair and a count of the required concurrent > intervals afterwards. So a 60 minute regimen requires 12 concurrent > rows. This is accomplished by joining the table on itself. A second > query is ran for the same range, but with respect to the nurse time and > an available nurse. Finally, those two are joined against each other. > Effectively, it is: > > Select * > From ( > Select * > From matrix m1, matrix m2 > Where m1.xxxxx = m2.xxxxx > ) chair, > ( > Select * > From matrix m1, matrix m2 > Where m1.xxxxx = m2.xxxxx > ) nurse > Where chair.id = nurse.id > > With matrix having 3,280 rows. Ugh. > > I have tried various indexes and clustering approachs with little > success. Any ideas? how far in advance do you schedule? As far as necessary? How many chairs are there? How many nurses are there? This is a tricky (read: interesting) problem. merlin
pgsql-performance by date: