what's going on here? - Mailing list pgsql-general
From | Ben |
---|---|
Subject | what's going on here? |
Date | |
Msg-id | Pine.LNX.4.10.10103091131040.28803-100000@gilgamesh.eos.SilentMedia.com Whole thread Raw |
Responses |
Re: what's going on here?
Re: what's going on here? |
List | pgsql-general |
We have two postgres-driven web pages, and we've noticed in the last few days that one is loading about 10 times faster than the other. Looking into it, we found out that postgres has very different execution paths for what is essentially the same query, and this makes no sense to me. Sorry for the deludge of info here, but maybe it'll explain why this is happening.... First, here are the explains. Note the only parameter that changes is playlist.stream, and yet for some reason the second query makes use of indices which are unrelated to playlist.stream. music=# explain select users.name as username, playlist.id as id, paths.len as len, paths.path as path, timestamp(playlist.added) as added, timestamp(playlist.played) as played from users, playlist, paths where paths.md5=files.md5 and files.song=playlist.songid and playlist.userid=users.id and playlist.played is null and playlist.stream=1 order by playlist.added, playlist.id; NOTICE: QUERY PLAN: Sort (cost=10845.28..10845.28 rows=1857 width=80) -> Hash Join (cost=10007.93..10744.44 rows=1857 width=80) -> Merge Join (cost=10006.44..10668.17 rows=1857 width=64) -> Sort (cost=3754.03..3754.03 rows=2321 width=36) -> Hash Join (cost=306.62..3624.28 rows=2321 width=36) -> Seq Scan on files (cost=0.00..1072.13 rows=55413 width=16) -> Hash (cost=300.81..300.81 rows=2321 width=20) -> Seq Scan on playlist (cost=0.00..300.81 rows=2321 width=20) -> Sort (cost=6252.41..6252.41 rows=50617 width=28) -> Seq Scan on paths (cost=0.00..1677.17 rows=50617 width=28) -> Hash (cost=1.39..1.39 rows=39 width=16) -> Seq Scan on users (cost=0.00..1.39 rows=39 width=16) EXPLAIN music=# explain select users.name as username, playlist.id as id, paths.len as len, paths.path as path, timestamp(playlist.added) as added, timestamp(playlist.played) as played from users, playlist, paths where paths.md5=files.md5 and files.song=playlist.songid and playlist.userid=users.id and playlist.played is null and playlist.stream=2 order by playlist.added, playlist.id; NOTICE: QUERY PLAN: Sort (cost=1467.44..1467.44 rows=164 width=80) -> Hash Join (cost=1.49..1461.40 rows=164 width=80) -> Nested Loop (cost=0.00..1452.85 rows=164 width=64) -> Nested Loop (cost=0.00..771.76 rows=205 width=36) -> Seq Scan on playlist (cost=0.00..300.81 rows=205 width=20) -> Index Scan using files_song_btree_key on files (cost=0.00..2.28 rows=1 width=16) -> Index Scan using paths_md5_btree_key on paths (cost=0.00..3.31 rows=1 width=28) -> Hash (cost=1.39..1.39 rows=39 width=16) -> Seq Scan on users (cost=0.00..1.39 rows=39 width=16) EXPLAIN Okay, so now the relevant schema: Table "users" Attribute | Type | Modifier -------------+-----------+------------------------------------------------ id | integer | not null default nextval('users_id_seq'::text) name | text | not null Indices: users_name_key, users_pkey Index "users_name_key" Attribute | Type -----------+------ name | text unique btree Index "users_pkey" Attribute | Type -----------+--------- id | integer unique btree (primary key) Table "playlist" Attribute | Type | Modifier -----------+---------+--------------------------------------------------- id | integer | not null default nextval('playlist_id_seq'::text) songid | integer | userid | integer | added | integer | played | integer | stream | integer | Indices: playlist_added_key, playlist_pkey, playlist_played_key, playlist_songid_key Index "playlist_added_key" Attribute | Type -----------+--------- added | integer btree Index "playlist_pkey" Attribute | Type -----------+--------- id | integer unique btree (primary key) Index "playlist_played_key" Attribute | Type -----------+--------- played | integer btree Index "playlist_songid_key" Attribute | Type -----------+--------- songid | integer btree Table "paths" Attribute | Type | Modifier -----------+-----------+---------- md5 | char(32) | not null path | text | not null len | integer | Indices: paths_md5_btree_key, paths_md5_key, paths_path_key Index "paths_md5_btree_key" Attribute | Type -----------+---------- md5 | char(32) btree Index "paths_md5_key" Attribute | Type -----------+---------- md5 | char(32) hash Index "paths_path_key" Attribute | Type -----------+------ path | text btree Table "files" Attribute | Type | Modifier -----------+----------+---------- song | integer | not null md5 | char(32) | not null Indices: files_md5_btree_key, files_md5_key, files_song_btree_key, files_song_key Index "files_md5_btree_key" Attribute | Type -----------+---------- md5 | char(32) btree Index "files_md5_key" Attribute | Type -----------+---------- md5 | char(32) hash Index "files_song_btree_key" Attribute | Type -----------+--------- song | integer btree Index "files_song_key" Attribute | Type -----------+--------- song | integer hash Interestingly, the sequential scan on playlist claims to be returning 2000 results for stream=1 and only 200 for stream=2. I'm not sure which part of the where clause this guess comes from, because the playlist table has equal numbers of entries for both streams.
pgsql-general by date: