Thread: Preventing query from hogging server

Preventing query from hogging server

From: "Matthew Nuzum"
Date: Thu, 24 Mar 2005 13:07:39 -0600
I've got a report that is starting to take too long to run. I'm going to
create a lookup table that should speed up the results, but first I've got
to populate that lookup table, and that's the query that's giving me trouble.

I honestly don't care how long the query takes to run; I just want to run it
without causing a major performance impact on other operations. The query
seems to take forever even when I limit the results to just 10 rows, so I
don't know whether I can get good results by splitting the query into groups
of smaller queries (for example, for a year's worth of data, do 12 queries,
one for each month, or maybe 365 queries, one for each day) or whether there
is a psql equivalent to "nice."

I've tried `nice psql` in the past and I don't think that had much impact,
but I haven't tried it on this query.

Here is the query (BTW, there will be a corresponding "max" version of this
query as well):
INSERT INTO usage_sessions_min (accountid,atime,sessionid)
select accountid, min(atime) as atime, sessionid from usage_access
group by accountid,sessionid;

atime is a timestamptz, accountid is a varchar(12), and sessionid is an int.

I've tried to do an explain analyze of this query, but it's been running for
hours already and I don't know when it will finish.

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting
http://www.elitecms.com/



Re: Preventing query from hogging server

From: Rosser Schwarz
while you weren't looking, Matthew Nuzum wrote:

> select accountid, min(atime) as atime, sessionid from usage_access
> group by accountid,sessionid;

Try something along the lines of:

select ua.accountid
     , (select atime
          from usage_access
         where sessionid = ua.sessionid
           and accountid = ua.accountid
         order by atime asc
         limit 1
       ) as atime
     , ua.sessionid
  from usage_access ua
 group by accountid
     , sessionid;

min() and max() currently do table scans, which, on large tables, or
even moderately sized tables with large numbers of accounts/sessions,
can add up.  You'll need to replace asc with desc in the subquery for
the max() version.

This form cheats a bit and uses the index to find the highest and
lowest values, provided you've created the appropriate indices.
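
For reference, the "appropriate index" for that subquery is a composite one
covering the correlated keys plus atime, something like this (the index name
is made up):

create index usage_access_acct_sess_atime
    on usage_access (accountid, sessionid, atime);

With that in place, the order by atime ... limit 1 can walk the index to the
smallest (or largest) atime per group instead of scanning the table.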

This is, IIRC, in the FAQ.

/rls

--
:wq

Re: Preventing query from hogging server

From: Tom Lane
"Matthew Nuzum" <matt.followers@gmail.com> writes:
> Here is the query (BTW, there will be a corresponding "max" version of this
> query as well):
> INSERT INTO usage_sessions_min (accountid,atime,sessionid)
> select accountid, min(atime) as atime, sessionid from usage_access
> group by accountid,sessionid;

How many rows in usage_access?  How many groups do you expect?
(Approximate answers are fine.)  What PG version is this, and
what's your sort_mem setting?

            regards, tom lane

Re: Preventing query from hogging server

From: "Matthew Nuzum"
> How many rows in usage_access?  How many groups do you expect?
> (Approximate answers are fine.)  What PG version is this, and
> what's your sort_mem setting?
>
>             regards, tom lane

I believe there are about 40,000,000 rows, I expect there to be about
10,000,000 groups. PostgreSQL version is 7.3.2 and the sort_mem is at the
default setting.

(I know that's an old version. We've been testing with 7.4 now and are
nearly ready to upgrade.)

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting
http://www.followers.net/portfolio/


Re: Preventing query from hogging server

From: "Matthew Nuzum"
> How many rows in usage_access?

Oh, I just got my explain analyze:
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=9499707.90..9856491.74 rows=3567838 width=28) (actual time=11443537.58..12470835.17 rows=1198141 loops=1)
   ->  Aggregate  (cost=9499707.90..9856491.74 rows=3567838 width=28) (actual time=11443537.56..12466550.25 rows=1198141 loops=1)
         ->  Group  (cost=9499707.90..9767295.78 rows=35678384 width=28) (actual time=11443537.10..12408372.26 rows=35678383 loops=1)
               ->  Sort  (cost=9499707.90..9588903.86 rows=35678384 width=28) (actual time=11443537.07..12035366.31 rows=35678383 loops=1)
                     Sort Key: accountid, sessionid
                     ->  Seq Scan on usage_access  (cost=0.00..1018901.84 rows=35678384 width=28) (actual time=8.13..416580.35 rows=35678383 loops=1)
 Total runtime: 12625498.84 msec
(7 rows)

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting
http://www.followers.net/portfolio/


Re: Preventing query from hogging server

From: Tom Lane
"Matthew Nuzum" <matt.followers@gmail.com> writes:
> I believe there are about 40,000,000 rows, I expect there to be about
> 10,000,000 groups. PostgreSQL version is 7.3.2 and the sort_mem is at the
> default setting.

Okay.  I doubt that the nearby suggestion to convert the min()s to
indexscans will help at all, given those numbers --- there aren't enough
rows per group to make it a win.

I think you've just gotta put up with the sorting required to bring the
groups together.  LIMIT or subdividing the query will not make it
faster, because the sort step is the expensive part.  You could probably
improve matters by increasing sort_mem as much as you can stand ---
maybe something like 10M to 100M (instead of the default 1M).  Obviously
you don't want to make it a big fraction of your available RAM, or it
will hurt the concurrent processing, but on modern machines I would
think you could give this a few tens of MB without any problem.  (Note
that you want to just SET sort_mem in this one session, not increase it
globally.)
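
A sketch of that session-local setting (in 7.x, sort_mem is measured in
kilobytes, so 65536 means 64MB; the exact value here is only an example):

set sort_mem = 65536;   -- 64MB for this session only; units are KB
INSERT INTO usage_sessions_min (accountid,atime,sessionid)
select accountid, min(atime) as atime, sessionid from usage_access
group by accountid,sessionid;
reset sort_mem;         -- back to the server default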

I would strongly suggest doing the min and max calculations together:

    select groupid, min(col), max(col) from ...

because if you do them in two separate queries 90% of the effort will be
duplicated.
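
Applied to the query from this thread, the combined form might look like this
(the target table usage_sessions_minmax and its min/max column names are
assumptions; the thread only defines the "min" table):

INSERT INTO usage_sessions_minmax (accountid, min_atime, max_atime, sessionid)
select accountid, min(atime) as min_atime, max(atime) as max_atime, sessionid
from usage_access
group by accountid, sessionid;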

            regards, tom lane

Re: Preventing query from hogging server

From: "Matthew Nuzum"
> I would strongly suggest doing the min and max calculations together:
>
>     select groupid, min(col), max(col) from ...
>
> because if you do them in two separate queries 90% of the effort will be
> duplicated.
>
>            regards, tom lane

Thanks. Other than avoiding using too much sort mem, is there anything else
I can do to ensure this query doesn't starve other processes for resources?

Doing the explain analyze only increases my server's load average by 1, and
it seems to readily relinquish CPU time, but previously, when I was running a
test query, my server load rose to unacceptable levels.

FWIW, the explain was run from psql running on the db server; the test query
the other day was run from one of the webservers. Should I run this on the db
server to minimize load?

--
Matthew Nuzum <matt@followers.net>
www.followers.net - Makers of "Elite Content Management System"
View samples of Elite CMS in action by visiting
http://www.followers.net/portfolio/


Re: Preventing query from hogging server

From: Tom Lane
"Matthew Nuzum" <matt.followers@gmail.com> writes:
> Thanks. Other than avoiding using too much sort mem, is there anything else
> I can do to ensure this query doesn't starve other processes for resources?

Not a lot.

> Doing the explain analyze only increases my server load by 1 and seems to
> readily relinquish CPU time, but previously when I had been running a test
> query my server load rose to unacceptable levels.

Interesting.  EXPLAIN ANALYZE is going to cause a bunch of
gettimeofday() calls to be inserted ... maybe your kernel takes those as
process preemption points?  Seems unlikely, but ...

> FWIW, the explain was run from psql running on the db server, the test query
> the other day was run from one of the webservers. Should I run this on the
> db server to minimize load?

Since it's an insert/select, psql isn't participating in the data flow.
It's not going to matter where the psql process is.

            regards, tom lane

Re: Preventing query from hogging server

From: "Jim C. Nasby"
On Thu, Mar 24, 2005 at 01:07:39PM -0600, Matthew Nuzum wrote:
> I've tried `nice psql` in the past and I don't think that had much impact,
> but I haven't tried it on this query.

On Linux, nice will only help if the query is CPU-bound. On FreeBSD, nice
affects I/O scheduling as well as CPU, so it's a more effective means of
limiting the impact of large queries. I don't know how other OSes handle
this.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"