Simple join optimized badly? - Mailing list pgsql-performance
From | Craig A. James |
---|---|
Subject | Simple join optimized badly? |
Date | |
Msg-id | 45274A77.7080005@modgraph-usa.com Whole thread Raw |
Responses |
Re: Simple join optimized badly?
|
List | pgsql-performance |
I have two tables, SAMPLE and HITLIST that when joined, generate a monsterous sort. HITLIST_ROWS has about 48,000 rows SAMPLE has about 16 million rows The joined column is indexed in SAMPLE HITLIST_ROWS is a scratch table which is used a few times then discarded. HITLIST_ROWS has no indexes at all There are two plans below. The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it's sortingthe 16 million rows of the SEARCH table. Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent. First question: HITLIST_ROWS so small, I don't understand why the lack of ANALYZE should cause SAMPLE's contents to be sorted. Second question: Even though ANALYZE brings it down from 26 minutes to 47 seconds, a huge improvement, it still seems slowto me. Its going at roughly 1 row per millisecond -- are my expectations too high? This is a small-ish Dell computer(Xeon), 4 GB memory, with a four-disk SATA software RAID0 (bandwidth limited to about 130 MB/sec due to PCI cards). Other joins of a similar size seem much faster. It looks like I'll need to do an ANALYZE every time I modify HITLIST_ROWS, which seems like a waste because HITLIST_ROWSis rarely used more than once or twice before being truncated and rebuilt with new content. (HITLIST_ROWS can'tbe an actual temporary table, though, because it's a web application and each access is from a new connection.) This is Postgres 8.0.3. (We're upgrading soon.) Thanks, Craig explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=4782.35..1063809.82 rows=613226 width=4) (actual time=174.212..1593886.582 rows=176294 loops=1) Merge Cond: ("outer".version_id = "inner".objectid) -> Index Scan using i_sample_version_id on sample t (cost=0.00..1008713.68 rows=16446157 width=8) (actual time=0.111..1571911.208rows=16446157 loops=1) -> Sort (cost=4782.35..4910.39 rows=51216 width=4) (actual time=173.669..389.496 rows=176329 loops=1) Sort Key: ph.objectid -> Seq Scan on hitlist_rows_378593 ph (cost=0.00..776.16 rows=51216 width=4) (actual time=0.015..90.059 rows=48834loops=1) Total runtime: 1594093.725 ms (7 rows) chmoogle2=> analyze HITLIST_ROWS; ANALYZE chmoogle2=> explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=874.43..457976.83 rows=584705 width=4) (actual time=302.792..47796.719 rows=176294 loops=1) Hash Cond: ("outer".version_id = "inner".objectid) -> Seq Scan on sample t (cost=0.00..369024.57 rows=16446157 width=8) (actual time=46.344..26752.343 rows=16446157 loops=1) -> Hash (cost=752.34..752.34 rows=48834 width=4) (actual time=149.548..149.548 rows=0 loops=1) -> Seq Scan on hitlist_rows_378593 ph (cost=0.00..752.34 rows=48834 width=4) (actual time=0.048..80.721 rows=48834loops=1) Total runtime: 47988.572 ms (6 rows)
pgsql-performance by date: