Thread: Slow query
The following query has never finished. I have let it run for over 24 hours. This is a one time update that is part of a conversion script from MSSQL data. All of the tables are freshly built and inserted into. I have not run explain analyze because it does not return in a reasonable time. Explain output is posted below. Any suggestions on syntax changes or anything else to improve this would be appreciated. Dual PIII 1Ghz 4 GB RAM 4 spindle IDE RAID 0 on LSI controller. Postgres 7.4.5 Linux version 2.6.3-7mdk-p3-smp-64GB postgresql.cong snip: tcpip_socket = true max_connections = 40 shared_buffers = 1000 sort_mem = 65536 fsync = true source_song_title +-10,500,000 rows source_song +-9,500,000 rows source_system 10 rows source_title +- 5,600,000 Code run right before this query: create index ssa_source_song_id on source_song_artist (source_song_id); analyze source_song_artist; create index sa_artist_id on source_artist (artist_id); analyze source_artist; create index ss_source_song_id on source_song (source_song_id); analyze source_song; create index st_title_id on source_title (title_id); analyze source_title; source_song.source_song_id = int4 source_song_title.source_song_id = int4 source_title.title_id = int4 source_song_title.title_id = int4 update source_song_title set source_song_title_id = nextval('source_song_title_seq') ,licensing_match_order = (select licensing_match_order from source_system where source_system_id = ss.source_system_id) ,affiliation_match_order = (select affiliation_match_order from source_system where source_system_id = ss.source_system_id) ,title = st.title from source_song_title sst join source_song ss on ss.source_song_id = sst.source_song_id join source_title st on st.title_id = sst.title_id where source_song_title.source_song_id = sst.source_song_id; Explain output: "Hash Join (cost=168589.60..16651072.43 rows=6386404 width=335)" " Hash Cond: ("outer".title_id = "inner".title_id)" " -> Merge Join (cost=0.00..1168310.61 rows=6386403 width=311)" " Merge Cond: ("outer".source_song_id = "inner".source_song_id)" " -> Merge Join (cost=0.00..679279.40 rows=6386403 width=16)" " Merge Cond: ("outer".source_song_id = "inner".source_song_id)" " -> Index Scan using source_song_title_pkey on source_song_title sst (cost=0.00..381779.37 rows=10968719 width=8)" " -> Index Scan using ss_source_song_id on source_song ss (cost=0.00..190583.36 rows=6386403 width=8)" " -> Index Scan using source_song_title_pkey on source_song_title (cost=0.00..381779.37 rows=10968719 width=303)" " -> Hash (cost=117112.08..117112.08 rows=5513808 width=32)" " -> Seq Scan on source_title st (cost=0.00..117112.08 rows=5513808 width=32)" " SubPlan" " -> Seq Scan on source_system (cost=0.00..1.14 rows=2 width=4)" " Filter: (source_system_id = $0)" " -> Seq Scan on source_system (cost=0.00..1.14 rows=2 width=2)" " Filter: (source_system_id = $0)"
Roger Ging <rging@paccomsys.com> writes: > update source_song_title set > source_song_title_id = nextval('source_song_title_seq') > ,licensing_match_order = (select licensing_match_order from > source_system where source_system_id = ss.source_system_id) > ,affiliation_match_order = (select affiliation_match_order from > source_system where source_system_id = ss.source_system_id) > ,title = st.title > from source_song_title sst > join source_song ss on ss.source_song_id = sst.source_song_id > join source_title st on st.title_id = sst.title_id > where source_song_title.source_song_id = sst.source_song_id; Why is "source_song_title sst" in there? To the extent that source_song_id is not unique, you are multiply updating rows because of the self-join. regards, tom lane
Any time you run subqueries, it's going to slow down the update process a lot. Each record that is updated in source_song_title runs two additional queries. When I do large updates like this, I usualy Run a transaction that will select all the new data into a new table on a join. For example SELECT a.*, b.licensing_match_order, b.affiliation_match_order, d.title INTO updated_data FROM source_song_title AS a INNER JOIN source_system AS b ON b.id = d.id INNER JOIN source_song AS c ON a.id = c.id INNER JOIN source_title AS d ON a.id = d.id I'm not sure that query does what you want, but you get the idea. Then just drop the old table and rename the updated_data table. This way instead of doing a bunch of updates, you do one select and a rename. -Josh On Fri, 22 Oct 2004 16:37:14 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Roger Ging <rging@paccomsys.com> writes: > > update source_song_title set > > source_song_title_id = nextval('source_song_title_seq') > > ,licensing_match_order = (select licensing_match_order from > > source_system where source_system_id = ss.source_system_id) > > ,affiliation_match_order = (select affiliation_match_order from > > source_system where source_system_id = ss.source_system_id) > > ,title = st.title > > from source_song_title sst > > join source_song ss on ss.source_song_id = sst.source_song_id > > join source_title st on st.title_id = sst.title_id > > where source_song_title.source_song_id = sst.source_song_id; > > Why is "source_song_title sst" in there? To the extent that > source_song_id is not unique, you are multiply updating rows > because of the self-join. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >