Thread: Preventing query from hogging server
I've got a report that is starting to take too long to run. I'm going to create a lookup table that should speed up the results, but first I've got to populate that lookup table. I honestly don't care how long the query takes to run; I just want to run it without causing a major performance impact on other operations.

The query seems to take forever even when I limit the results to just 10, so I don't know if I can get good results by splitting the query into groups of queries (for example, for a year's worth of data do 12 queries, one for each month, or maybe 365 queries, one for each day), or if there is a psql equivalent to "nice." I've tried `nice psql` in the past and I don't think that had much impact, but I haven't tried it on this query.

Here is the query (BTW, there will be a corresponding "max" version of this query as well):

INSERT INTO usage_sessions_min (accountid, atime, sessionid)
select accountid, min(atime) as atime, sessionid
  from usage_access
 group by accountid, sessionid;

atime is a timestamptz, accountid is varchar(12), and sessionid is int.

I've tried to do an explain analyze of this query, but it's been running for hours already and I don't know when it will finish.

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting http://www.elitecms.com/
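For illustration, a rough sketch of what one batch of the "split it by month" idea could look like; the date bounds are placeholders, and a session that spans a month boundary would land in two batches and need a follow-up pass:

-- hypothetical single-month batch; the date range is illustrative only
INSERT INTO usage_sessions_min (accountid, atime, sessionid)
SELECT accountid, min(atime) AS atime, sessionid
  FROM usage_access
 WHERE atime >= '2005-01-01' AND atime < '2005-02-01'
 GROUP BY accountid, sessionid;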
while you weren't looking, Matthew Nuzum wrote:

> select accountid, min(atime) as atime, sessionid from usage_access
> group by accountid,sessionid;

Try something along the lines of:

select ua.accountid
     , (select atime
          from usage_access
         where sessionid = ua.sessionid
           and accountid = ua.accountid
         order by atime asc
         limit 1
       ) as atime
     , ua.sessionid
  from usage_access ua
 group by accountid
     , sessionid

min() and max() currently do table scans, which, on large tables, or even moderately sized tables with large numbers of accounts/sessions, can add up. You'll need to replace asc with desc in the subquery for the max() version. This form cheats a bit and uses the index to find the highest and lowest values, provided you've created the appropriate indices. This is, IIRC, in the FAQ.

/rls

--
:wq
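For concreteness, the kind of index that rewrite relies on might look like the following; the index name and column order are assumptions, not something stated in the thread:

-- assumed composite index so the correlated LIMIT 1 subquery can walk the
-- index instead of scanning the table; name and column order are illustrative
CREATE INDEX usage_access_acct_sess_atime_idx
    ON usage_access (accountid, sessionid, atime);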
"Matthew Nuzum" <matt.followers@gmail.com> writes: > Here is the query (BTW, there will be a corresponding "max" version of this > query as well): > INSERT INTO usage_sessions_min (accountid,atime,sessionid) > select accountid, min(atime) as atime, sessionid from usage_access > group by accountid,sessionid; How many rows in usage_access? How many groups do you expect? (Approximate answers are fine.) What PG version is this, and what's your sort_mem setting? regards, tom lane
> How many rows in usage_access? How many groups do you expect?
> (Approximate answers are fine.) What PG version is this, and
> what's your sort_mem setting?
>
> 			regards, tom lane

I believe there are about 40,000,000 rows, and I expect there to be about 10,000,000 groups. PostgreSQL version is 7.3.2 and the sort_mem is at the default setting. (I know that's an old version. We've been testing with 7.4 now and are nearly ready to upgrade.)

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting http://www.followers.net/portfolio/
> How many rows in usage_access?

Oh, I just got my explain analyze:

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=9499707.90..9856491.74 rows=3567838 width=28) (actual time=11443537.58..12470835.17 rows=1198141 loops=1)
   ->  Aggregate  (cost=9499707.90..9856491.74 rows=3567838 width=28) (actual time=11443537.56..12466550.25 rows=1198141 loops=1)
         ->  Group  (cost=9499707.90..9767295.78 rows=35678384 width=28) (actual time=11443537.10..12408372.26 rows=35678383 loops=1)
               ->  Sort  (cost=9499707.90..9588903.86 rows=35678384 width=28) (actual time=11443537.07..12035366.31 rows=35678383 loops=1)
                     Sort Key: accountid, sessionid
                     ->  Seq Scan on usage_access  (cost=0.00..1018901.84 rows=35678384 width=28) (actual time=8.13..416580.35 rows=35678383 loops=1)
 Total runtime: 12625498.84 msec
(7 rows)

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting http://www.followers.net/portfolio/
"Matthew Nuzum" <matt.followers@gmail.com> writes: > I believe there are about 40,000,000 rows, I expect there to be about > 10,000,000 groups. PostgreSQL version is 7.3.2 and the sort_mem is at the > default setting. Okay. I doubt that the nearby suggestion to convert the min()s to indexscans will help at all, given those numbers --- there aren't enough rows per group to make it a win. I think you've just gotta put up with the sorting required to bring the groups together. LIMIT or subdividing the query will not make it faster, because the sort step is the expensive part. You could probably improve matters by increasing sort_mem as much as you can stand --- maybe something like 10M to 100M (instead of the default 1M). Obviously you don't want to make it a big fraction of your available RAM, or it will hurt the concurrent processing, but on modern machines I would think you could give this a few tens of MB without any problem. (Note that you want to just SET sort_mem in this one session, not increase it globally.) I would strongly suggest doing the min and max calculations together: select groupid, min(col), max(col) from ... because if you do them in two separate queries 90% of the effort will be duplicated. regards, tom lane
> I would strongly suggest doing the min and max calculations together:
>
> 	select groupid, min(col), max(col) from ...
>
> because if you do them in two separate queries 90% of the effort will be
> duplicated.
>
> 			regards, tom lane

Thanks. Other than avoiding using too much sort mem, is there anything else I can do to ensure this query doesn't starve other processes for resources?

Doing the explain analyze only increases my server load by 1 and seems to readily relinquish CPU time, but previously when I had been running a test query my server load rose to unacceptable levels. FWIW, the explain was run from psql running on the db server; the test query the other day was run from one of the webservers. Should I run this on the db server to minimize load?

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting http://www.followers.net/portfolio/
"Matthew Nuzum" <matt.followers@gmail.com> writes: > Thanks. Other than avoiding using too much sort mem, is there anything else > I can do to ensure this query doesn't starve other processes for resources? Not a lot. > Doing the explain analyze only increases my server load by 1 and seems to > readily relinquish CPU time, but previously when I had been running a test > query my server load rose to unacceptable levels. Interesting. EXPLAIN ANALYZE is going to cause a bunch of gettimeofday() calls to be inserted ... maybe your kernel takes those as process preemption points? Seems unlikely, but ... > FWIW, the explain was run from psql running on the db server, the test query > the other day was run from one of the webservers. Should I run this on the > db server to minimize load? Since it's an insert/select, psql isn't participating in the data flow. It's not going to matter where the psql process is. regards, tom lane
On Thu, Mar 24, 2005 at 01:07:39PM -0600, Matthew Nuzum wrote:
> I've tried `nice psql` in the past and I don't think that had much impact,
> but I haven't tried it on this query.

On Linux, nice will only help if the query is CPU-bound. On FreeBSD, nice affects I/O scheduling as well as CPU, so it's a more effective means of limiting the impact of large queries. I don't know how other OS's handle this.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"