Thread: Utilizing multiple cores in a function call.
Good morning.
I have developed a function call that schedules patient appointments within a day based on several resource constraints. The algorithm has been mentioned on here before and I have managed to tweak it down to 6-9 seconds from the original 27 seconds.
Of course, I want it to be faster still. The function throttles one of my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting pretty. Is there any way to use both CPUs?
Thanks,
Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294
Hartman, Matthew wrote:
> Good morning.
>
> I have developed a function call that schedules patient appointments
> within a day based on several resource constraints. The algorithm has
> been mentioned on here before and I have managed to tweak it down to 6-9
> seconds from the original 27 seconds.

To speed up the execution of processes, I heartily recommend the book "Writing Efficient Programs" by Jon Louis Bentley, Prentice-Hall, 1982. There are many important steps. The most important is usually to refine the algorithm itself. I once sped up a program that would have required several weeks on a mainframe running 24/7 to six minutes by improving its basic algorithm. Only then would it have made sense to optimize the actual code. Next, you need to profile the code to see where the hot spots are; there is little point in examining code in other parts of the program.

> Of course, I want it to be faster still. The function throttles one of
> my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one
> sitting pretty. Is there any way to use both CPUs?

You could write your algorithm as a separate process -- a server. Then, in your SQL program, you invoke a trivial function that just hands the arguments off to the server. Your SQL program would normally run on one processor while the time-consuming algorithm runs on the other. If you are not careful, this will not benefit you at all, because your SQL process will simply wait until the server returns its answer. So you would need to modify your SQL program so that it can do other things while the server process does its work.

My guess is that you need a more efficient algorithm before you go to the trouble of optimizing the execution of your current one. As for making it run on multiple processors, that depends critically on the nature of your algorithm. A few can easily be modified to run on multiple processors. Some cannot run on multiple processors at all.

--
Jean-David Beyer          Registered Linux User 85642.
PGP-Key: 9A2FC99A         Registered Machine 241939.
Shrewsbury, New Jersey    http://counter.li.org
10:40:01 up 10 days, 21:29, 3 users, load average: 4.19, 4.22, 4.19
I'm pretty much at that point where I've chewed the fat off of the algorithm, or at least hit my personal limits. Occasionally a new idea pops into my head and yields an improvement, but it's in the order of 100-250ms.

Google came back with "no sir". It seems PostgreSQL is limited to one CPU per query unless I spawn a master/controller like you suggested. Shame..

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294
Hartman, Matthew wrote:
> I'm pretty much at that point where I've chewed the fat off of the
> algorithm, or at least at my personal limits. Occasionally a new idea
> pops into my head and yields an improvement but it's in the order of
> 100-250ms.
>
> Google came back with "no sir". It seems PostgreSQL is limited to one
> CPU per query unless I spawn a master/controller like you suggested.
> Shame..

Although I have never done it myself, you might try using PL/R to perform the algorithm in R, and make use of the snow package to run parallel tasks -- see:
http://cran.r-project.org/web/views/HighPerformanceComputing.html

Joe
On Mon, 29 Jun 2009, Hartman, Matthew wrote:
> The function throttles one of my CPUs to 100% (shown as 50% in Task
> Manager) and leaves the other one sitting pretty. Is there any way to
> use both CPUs?

Not easily. Potential techniques:

-Rewrite the function, or its time-critical portion, in some other language that allows using two processes usefully.

-Write a "worker server" that you prompt to pick up work from a table and write its output to another, so that you can ask it to handle part of the job. You might communicate with the worker using the LISTEN/NOTIFY mechanism in the database.

-Some combination of these two techniques. One popular way to speed up things that are running slowly is to run some part of them in a C UDF, so that you could use "select my_big_computation(x,y,z)" and get faster execution.

If you were hoping for a quick answer: no such thing. I suspect you'd get better help by talking about what your function does and seeing if there's a specific part somebody else is familiar with optimizing. For example, I've seen >10:1 speedups just by rewriting one small portion of a computationally expensive mathematical function in C, keeping the rest of the logic on the database side. You don't necessarily have to rewrite the whole thing.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, Jun 29, 2009 at 10:26 AM, Hartman, Matthew <Matthew.Hartman@krcc.on.ca> wrote:
> Good morning.
>
> I have developed a function call that schedules patient appointments within
> a day based on several resource constraints. The algorithm has been
> mentioned on here before and I have managed to tweak it down to 6-9 seconds
> from the original 27 seconds.
>
> Of course, I want it to be faster still. The function throttles one of my
> CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting
> pretty. Is there any way to use both CPUs?

Your best bet at using multiple cores on a CPU-bound problem is to divide the work logically into separate pools and to attack it with multiple function calls. This is probably what the database would do for you if it had in-query multithreading, only the database could attack it at a much finer-grained level.

In your particular case, I think the answer is to attack the problem in an entirely new direction, although your matrix query is one of the coolest queries I've seen in a while. The first thought that jumped out at me was to treat your nurses and stations as incrementing numbers, so that if you allocate three hours of nurse X's time, you increment some number by three in the nurse's table. This would lay on top of a kind of time-calculation system that converts that number to actual time based on the nurse's schedule, etc. On top of _that_, you would need some kind of resolution system to handle canceled appointments, nurse no-shows, etc.

The stations would operate on a similar principle: you imagine all the available hours for the station stretched to infinity on a number line and keep a fixed allocation point which always moves forward, plus a 'number line time' -> real time converter and a freestore list to pick up unexpectedly freed time.

merlin
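[Editor's note: the "divide the work into pools and attack it with multiple function calls" idea above can be sketched outside the database. A minimal sketch in Python, where `evaluate` is a hypothetical stand-in for the real scheduling cost function; the point is only the shape of splitting a CPU-bound search across cores:]

```python
# Sketch: split a brute-force search over candidate slots across CPU cores.
# evaluate() is a hypothetical stand-in for the real scheduling cost function.
from multiprocessing import Pool

def evaluate(slot):
    """Pretend cost of starting the appointment at minute `slot` (lower is better)."""
    return (slot * 37) % 101, slot

def best_slot(slots, workers=2):
    """Score every candidate slot across `workers` processes; return the cheapest."""
    with Pool(workers) as pool:
        scores = pool.map(evaluate, slots)   # each worker scores a chunk of slots
    return min(scores)[1]

def best_slot_serial(slots):
    """Single-process reference implementation for comparison."""
    return min(evaluate(s) for s in slots)[1]
```

[Inside PostgreSQL itself, the equivalent would be issuing the same function over disjoint slices of the search space from several connections at once and combining the results, since one query stays on one core.]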
On Mon, 2009-06-29 at 14:42 -0400, Greg Smith wrote:
> -Write a "worker server" that you prompt to pick up work from a table and
> write its output to another that you can ask to handle part of the job.
> You might communicate with the worker using the LISTEN/NOTIFY mechanism in
> the database.
>
> -Some combination of these two techniques. One popular way to speed up
> things that are running slowly is to run some part of them in a C UDF, so
> that you could use "select my_big_computation(x,y,z)" and get faster
> execution.

The trouble here is that the backend may not like having threads suddenly introduced into its execution environment. If properly written, I don't really see why a C UDF that used pthreads couldn't spawn two worker threads that _NEVER_ touched _ANY_ PostgreSQL APIs, talked to the SPI, etc., and let them run while blocking the main thread until they complete. Then again, I know relatively little about Pg's guts, and for all I know initing the pthread environment could completely mess up the backend.

Personally, I'd want to do it out-of-process, using a SECURITY DEFINER PL/PgSQL function owned by a role that also owned some otherwise-private queue and result tables for your worker server. As Greg Smith noted, LISTEN/NOTIFY would allow your worker server to avoid polling and instead sleep when there's nothing in the queue, and would also let your waiting clients avoid polling the result table.

> For example, I've seen >10:1 speedups just be rewriting one small portion
> of a computationally expensive mathematical function in C before, keeping
> the rest of the logic on the database side. You don't necessarily have to
> rewrite the whole thing.

A useful dirty trick is to use Psyco in Python. It's a specializing compiler that can get massive performance boosts out of Python code without any code changes, and it seems to work with PL/Python.
Just:

    try:
        import psyco
        psyco.full()
    except:
        # Enabling Psyco failed; don't care
        pass

in your function should get you a pretty serious boost. This will NOT, however, allow your code to use two cores at once; you'll need threading or multiple processes for that.

--
Craig Ringer
I have tried to wrap my brain around different approaches but I'm still stuck with this one so far. Your approach is interesting, but the problem is more complicated than that. Let me break it down a bit more.

The chemotherapy treatment room is divided into groupings of chairs, called pods. Pod 1 could have three chairs, pod 2 could have two, and so forth. Every day can have a unique number of pods, chairs, and groupings of chairs to pods. Furthermore, every day can have a unique number of nurses, and nurses are assigned to one or more pods; a single nurse could be assigned to cover three pods, for example. On top of that, pods have start/end times, as do nurses, and every pod and nurse can have unique start/end times.

Chemotherapy regimens have a required chair time and a required nurse time. The required nurse time represents how long it takes a nurse to start the treatment. To schedule an appointment, both the chair and the nurse have to be available for the required times at the same time, while also respecting the pod/chair and pod/nurse assignments. It's more than incrementing/decrementing the total available time.

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294
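[Editor's note: Merlin's 'number line time' -> real time converter from the earlier message can be sketched as a small function. The shift intervals here are hypothetical; the idea is that allocation is a single incrementing counter of worked minutes, mapped back to wall-clock time only when needed:]

```python
# Sketch of the number-line idea: allocation is one incrementing counter of
# worked minutes; this maps a counter offset back to wall-clock minutes
# using a nurse's (hypothetical) shift intervals, in minutes since midnight.

def to_wall_clock(offset, shifts):
    """Map `offset` allocated minutes onto ordered (start, end) shift intervals."""
    for start, end in shifts:
        length = end - start
        if offset < length:
            return start + offset        # the offset falls inside this shift
        offset -= length                 # consume this shift and keep going
    raise ValueError("offset exceeds the nurse's scheduled time")
```

[For example, with shifts 8:00-12:00 and 13:00-17:00, i.e. `[(480, 720), (780, 1020)]`, an allocation counter of 250 minutes lands 10 minutes into the second shift, at 790 (13:10). Cancellations would then go on a freestore list rather than rewinding the counter, as the message above suggests.]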
On Tue, Jun 30, 2009 at 8:30 AM, Hartman, Matthew <Matthew.Hartman@krcc.on.ca> wrote:
> I have tried to wrap my brain around different approaches but I'm still
> stuck with this one so far. Your approach is interesting but the problem
> is more complicated than that. Let me break it down a bit more.
>
> The chemotherapy treatment room is divided into groupings of chairs,
> called pods. Pod 1 could have three chairs, pod 2 could have two, and so
> forth. Every day can have a unique number of pods, chairs, and groupings
> of chairs to pods. Furthermore, every day can have a unique number of
> nurses, and nurses are assigned to one or more pods. A single nurse
> could be assigned to cover three pods for example. On top of that, pods
> have a start/end time as well as nurses. Every pod and nurse can have
> unique start/end times.
>
> Chemotherapy regimens have a required chair time and a required nurse
> time. The required nurse time represents how long it takes a nurse to
> start the treatment. To schedule an appointment, both the chair and
> nurse have to be available for the required times at the same time,
> while also respecting the pod/chair and pod/nurse assignments. It's more
> than incrementing/decrementing the total available time.

I take it then that the chair time and the nurse time are not the same duration. Does the nurse time always have to be the same portion of the chair time (say, at the beginning?), or is there some more complicated definition of how the nurse time overlays on top of the chair time during the treatment?

merlin
> I take it then that the chair time and the nurse time are not the same
> duration. Does the nurse time always have to be the same portion of
> the chair time (say, at the beginning?), or is there some more
> complicated definition of how the nurse time overlays on top of the
> chair time during the treatment?

The nurse time is equal to or often less than the chair time, and always at the beginning. They've asked to be able to specify nurse time at the end as well, but I've stuck with "no" so far. :)

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294
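[Editor's note: the constraint just described -- nurse time at the start of, and no longer than, the chair time -- reduces to a joint interval-availability test. A minimal sketch with hypothetical free-interval inputs; the real system must additionally respect pod/chair and pod/nurse assignments:]

```python
# Sketch: find the earliest start at which a chair is free for `chair_mins`
# AND a nurse is free for the first `nurse_mins` of that window.
# Intervals are (start, end) pairs in minutes since midnight; inputs are hypothetical.

def covers(free, start, length):
    """True if some free interval fully contains [start, start + length)."""
    return any(s <= start and start + length <= e for s, e in free)

def earliest_slot(chair_free, nurse_free, chair_mins, nurse_mins, step=15):
    """Scan candidate starts on a `step`-minute grid from each interval boundary.
    Nurse time overlays the beginning of chair time, per the thread."""
    for base in sorted(s for s, _ in chair_free + nurse_free):
        for start in range(base, base + 8 * 60, step):
            if covers(chair_free, start, chair_mins) and \
               covers(nurse_free, start, nurse_mins):
                return start
    return None  # no feasible slot today
```

[For example, a chair free 9:00-12:00 and a nurse free 9:30-10:30, with a 120-minute chair requirement and 30-minute nurse requirement, yields an earliest start of 9:30 (570): the chair window allows 9:00-10:00 starts, but the nurse constraint pushes the start to 9:30.]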