Thread: JOIN order, 15K, 15K, 7MM rows
All three tables have the same integer key, and it's indexed.
I parenthesized the joins to do the two small tables first.
I'm running and INSERT INTO ... SELECT query with this join (one record added per record in join), 4 hours down and all I have to show for it is 100 recycled transaction logs. ?
If it ever returns, I can post the Explain output.
Andrew, > I'm running and INSERT INTO ... SELECT query with this join (one record > added per record in join), 4 hours down and all I have to show for it is > 100 recycled transaction logs. ? > > If it ever returns, I can post the Explain output. How about giving us the query and the regular EXPLAIN output right now? I've a feeling that you have an unconstrained join in the query somewhere. -- Josh Berkus Aglio Database Solutions San Francisco
EXPLAIN INSERT INTO public.historical_price ( security_serial_id, [7 fields of proprietary data]) SELECT public.security_series.security_serial_id, [7 fields of data], FROM obsolete.datadb_fix INNER JOIN (obsolete.calcdb INNER JOIN public.security_series ON obsolete.calcdb.serial=public.security_series.legacy_calcdb_id)ON obsolete.datadb_fix.id=public.security_series.legacy_calcdb_id; datadb_fix is about 5.5MM records. The other two tables are about 15K records. Hash Join (cost=1151.63..225863.54 rows=5535794 width=53) Hash Cond: ("outer".id = "inner".serial) -> Seq Scan on datadb_fix (cost=0.00..121867.99 rows=6729299 width=28) -> Hash (cost=1115.54..1115.54 rows=14438 width=25) -> Hash Join (cost=609.96..1115.54 rows=14438 width=25) Hash Cond: ("outer".legacy_calcdb_id = "inner".serial) -> Seq Scan on security_series (cost=0.00..247.40 rows=15540 width=13) -> Hash (cost=572.37..572.37 rows=15037 width=12) -> Seq Scan on calcdb (cost=0.00..572.37 rows=15037 width=12) pim_new-# Table "obsolete.datadb_fix" pim_new-# Column | Type | Modifiers pim_new-# -------------+------------------+----------- pim_new-# serial | integer | pim_new-# id | integer | pim_new-# date | date | [4 fields deleted] pim_new-# Indexes: sb_data_pkey unique btree (id, date), pim_new-# datadb1_id btree (id), pim_new=# \d obsolete.calcdb Table "obsolete.calcdb" Column | Type | Modifiers --------------------+----------------------+------------------------------------ ------------------- serial | integer | not null default nextval('"calcdb_s erial_seq"'::text) [...30 proprietary fields] Indexes: calcdb_serial_key unique btree (serial), [...5 other indexes] pim_new=# \d security_series Table "public.security_series" Column | Type | Modifiers --------------------+--------------+----------- security_serial_id | integer | not null period | character(1) | not null legacy_calcdb_id | integer | Indexes: security_series_pkey primary key btree (security_serial_id, period), secseries_legacy_id_idx1 btree (legacy_calcdb_id) The target table has three indexes on it, so I suppose that accounts for SOME extra time. I ended up cancelling the query,running the select on a faster machine into an unindexed temp table, then using COPY out and in. That process tookabout 2.5 hours total. Machine: Linux, PG 7.3.4, 1.1GHz, 768MB RAM, unfortunately running other stuff. The first trywhich didn't finish in 24 hours was on Mac OS X Jaguar, PG 7.3.3, 1GHz, 256MB (please don't laugh). Yes, hardware upgradesare coming, but I need to estimate how much more I have to squeeze out of the DB and client applications.