Thread: Confusing performance of specific query

Confusing performance of specific query

From
Adam Endicott
Date:
I'm having an issue with a specific query, and I don't really know
where to start figuring out what's going on. I'm pretty new to
PostgreSQL in specific, and I'm not much of a database/SQL guru in
general. I've got one query that is consistently taking 10X longer to
run on a production machine than on my desktop. I haven't noticed
other queries suffering similar performance issues on the production
machine. Here's the query in question:

----------
SELECT DISTINCT

"movies_movie"."id","movies_movie"."title","movies_movie"."cinemasource_id","movies_movie"."mpaa_rating_id","movies_movie"."advisory","movies_movie"."teaser","movies_movie"."review_id","movies_movie"."runtime","movies_movie"."studio_url","movies_movie"."distributor_id","movies_movie"."synopsis","movies_movie"."stars","movies_movie"."main_image_id","movies_movie"."trailer","movies_movie"."editors_pick"
FROM "movies_movie" INNER JOIN "movies_moviescreening" AS
"movies_movie__moviescreening" ON "movies_movie"."id" =
"movies_movie__moviescreening"."movie_id" WHERE
("movies_movie__moviescreening"."id" IS NOT NULL) ORDER BY
"movies_movie"."title" ASC;
----------

I'm using a web frame work with an ORM (Django), so that's where this
query originates from - so it might not be the best way to do what I'm
trying to accomplish, but right now I'm more interested in the
performance difference between the two machines than I am in changing
this query to something better.

When I run EXPLAIN ANALYZE on this query, it takes something like
1200ms on my desktop (Dual 2GHz G5 Mac - 1.5 GB RAM for reference) and
about 14000ms on the production server (quad processor, 8 GB RAM,
running Ubuntu). There are about 500 rows in the movies_movie table,
and about 16k rows in the movies_moviescreening table. The data is the
same on both machines. My desktop is running PostgreSQL 8.2.3, and the
production server is running 8.1.9.

Since I don't know much about how to diagnose this, I don't know what
other information to give, so let me know if I've left out something
crucial.

Any help would be greatly appreciated.

Thanks,
Adam


Re: Confusing performance of specific query

From
Tom Lane
Date:
Adam Endicott <leftwing17@gmail.com> writes:
> When I run EXPLAIN ANALYZE on this query, it takes something like
> 1200ms on my desktop (Dual 2GHz G5 Mac - 1.5 GB RAM for reference) and
> about 14000ms on the production server (quad processor, 8 GB RAM,
> running Ubuntu). There are about 500 rows in the movies_movie table,
> and about 16k rows in the movies_moviescreening table. The data is the
> same on both machines. My desktop is running PostgreSQL 8.2.3, and the
> production server is running 8.1.9.

> Since I don't know much about how to diagnose this, I don't know what
> other information to give, so let me know if I've left out something
> crucial.

Uh ... the actual outputs from the two EXPLAIN ANALYZEs, perhaps?

The most likely bets are (1) something different about the
postgresql.conf settings on the two machines, or (2) the 8.2 planner
is smarter than the 8.1 planner and is choosing a better plan.

            regards, tom lane

Re: Confusing performance of specific query

From
Adam Endicott
Date:
Here's the output from explain analyze.

My desktop:
-----------------
 Unique  (cost=6732.86..7380.50 rows=504 width=677) (actual
time=844.345..1148.705 rows=65 loops=1)
   ->  Sort  (cost=6732.86..6773.34 rows=16191 width=677) (actual
time=844.341..1099.446 rows=16191 loops=1)
         Sort Key: movies_movie.title, movies_movie.id,
movies_movie.cinemasource_id, movies_movie.mpaa_rating_id,
movies_movie.advisory, movies_movie.teaser, movies_movie.review_id,
movies_movie.runtime, movies_movie.studio_url,
movies_movie.distributor_id, movies_movie.synopsis,
movies_movie.stars, movies_movie.main_image_id, movies_movie.trailer,
movies_movie.editors_pick
         ->  Hash Join  (cost=56.34..700.88 rows=16191 width=677)
(actual time=6.871..53.314 rows=16191 loops=1)
               Hash Cond: (movies_movie__moviescreening.movie_id =
movies_movie.id)
               ->  Seq Scan on movies_moviescreening
movies_movie__moviescreening  (cost=0.00..421.91 rows=16191 width=4)
(actual time=0.098..19.090 rows=16191 loops=1)
                     Filter: (id IS NOT NULL)
               ->  Hash  (cost=50.04..50.04 rows=504 width=677)
(actual time=6.676..6.676 rows=504 loops=1)
                     ->  Seq Scan on movies_movie  (cost=0.00..50.04
rows=504 width=677) (actual time=0.055..2.717 rows=504 loops=1)
 Total runtime: 1270.106 ms
(10 rows)


Production server:
--------------------------
 Unique  (cost=7412.10..8059.74 rows=507 width=679) (actual
time=14465.619..14559.558 rows=65 loops=1)
   ->  Sort  (cost=7412.10..7452.58 rows=16191 width=679) (actual
time=14465.613..14503.436 rows=16191 loops=1)
         Sort Key: movies_movie.title, movies_movie.id,
movies_movie.cinemasource_id, movies_movie.mpaa_rating_id,
movies_movie.advisory, movies_movie.teaser, movies_movie.review_id,
movies_movie.runtime, movies_movie.studio_url,
movies_movie.distributor_id, movies_movie.synopsis,
movies_movie.stars, movies_movie.main_image_id, movies_movie.trailer,
movies_movie.editors_pick
         ->  Hash Join  (cost=53.34..740.11 rows=16191 width=679)
(actual time=3.470..94.064 rows=16191 loops=1)
               Hash Cond: ("outer".movie_id = "inner".id)
               ->  Seq Scan on movies_moviescreening
movies_movie__moviescreening  (cost=0.00..443.91 rows=16191 width=4)
(actual time=0.040..34.362 rows=16191 loops=1)
                     Filter: (id IS NOT NULL)
               ->  Hash  (cost=52.07..52.07 rows=507 width=679)
(actual time=3.366..3.366 rows=507 loops=1)
                     ->  Seq Scan on movies_movie  (cost=0.00..52.07
rows=507 width=679) (actual time=0.009..1.638 rows=507 loops=1)
 Total runtime: 14568.621 ms
(10 rows)


Re: Confusing performance of specific query

From
Tom Lane
Date:
Adam Endicott <leftwing17@gmail.com> writes:
> Here's the output from explain analyze.

Wow, so the differential is all in the sort step.

8.2 does have improved sorting code, but I don't think that explains
the difference, especially not for a mere 16000 rows to be sorted.

Do you have comparable work_mem settings on both machines?  Another
thing to look at, if any of the sort key columns are textual, is whether
the lc_collate settings are the same.

            regards, tom lane

Re: Confusing performance of specific query

From
Adam Endicott
Date:
On Aug 9, 10:47 pm, t...@sss.pgh.pa.us (Tom Lane) wrote:
> Do you have comparable work_mem settings on both machines?  Another
> thing to look at, if any of the sort key columns are textual, is whether
> the lc_collate settings are the same.

work_mem is commented out in both postgresql.conf files:
desktop: #work_mem = 1MB
production: #work_mem = 1024

So I guess they use the defaults?

Ahh, but lc_collate is different (assuming I'm finding it right). On
my machine it's set to "C", and on the production server it's set to
"en_US.UTF-8". I'm going to have to do some research to figure out
what that does.