Simple join optimized badly? - Mailing list pgsql-performance

From Craig A. James
Subject Simple join optimized badly?
Date
Msg-id 45274A77.7080005@modgraph-usa.com
Whole thread Raw
Responses Re: Simple join optimized badly?
List pgsql-performance
I have two tables, SAMPLE and HITLIST that when joined, generate a monsterous sort.

  HITLIST_ROWS has about 48,000 rows
  SAMPLE has about 16 million rows

  The joined column is indexed in SAMPLE
  HITLIST_ROWS is a scratch table which is used a few times then discarded.
  HITLIST_ROWS has no indexes at all

There are two plans below.  The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it's
sortingthe 16 million rows of the SEARCH table.  Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent. 

First question: HITLIST_ROWS so small, I don't understand why the lack of ANALYZE should cause SAMPLE's contents to be
sorted.

Second question: Even though ANALYZE brings it down from 26 minutes to 47 seconds, a huge improvement, it still seems
slowto me.  Its going at roughly 1 row per millisecond -- are my expectations too high?  This is a small-ish Dell
computer(Xeon), 4 GB memory, with a four-disk SATA software RAID0 (bandwidth limited to about 130 MB/sec due to PCI
cards). Other joins of a similar size seem much faster. 

It looks like I'll need to do an ANALYZE every time I modify HITLIST_ROWS, which seems like a waste because
HITLIST_ROWSis rarely used more than once or twice before being truncated and rebuilt with new content.  (HITLIST_ROWS
can'tbe an actual temporary table, though, because it's a web application and each access is from a new connection.) 

This is Postgres 8.0.3.  (We're upgrading soon.)

Thanks,
Craig



explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID);
                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=4782.35..1063809.82 rows=613226 width=4) (actual time=174.212..1593886.582 rows=176294 loops=1)
   Merge Cond: ("outer".version_id = "inner".objectid)
   ->  Index Scan using i_sample_version_id on sample t  (cost=0.00..1008713.68 rows=16446157 width=8) (actual
time=0.111..1571911.208rows=16446157 loops=1) 
   ->  Sort  (cost=4782.35..4910.39 rows=51216 width=4) (actual time=173.669..389.496 rows=176329 loops=1)
         Sort Key: ph.objectid
         ->  Seq Scan on hitlist_rows_378593 ph  (cost=0.00..776.16 rows=51216 width=4) (actual time=0.015..90.059
rows=48834loops=1) 
 Total runtime: 1594093.725 ms
(7 rows)

chmoogle2=> analyze HITLIST_ROWS;
ANALYZE
chmoogle2=> explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID;
                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=874.43..457976.83 rows=584705 width=4) (actual time=302.792..47796.719 rows=176294 loops=1)
   Hash Cond: ("outer".version_id = "inner".objectid)
   ->  Seq Scan on sample t  (cost=0.00..369024.57 rows=16446157 width=8) (actual time=46.344..26752.343 rows=16446157
loops=1)
   ->  Hash  (cost=752.34..752.34 rows=48834 width=4) (actual time=149.548..149.548 rows=0 loops=1)
         ->  Seq Scan on hitlist_rows_378593 ph  (cost=0.00..752.34 rows=48834 width=4) (actual time=0.048..80.721
rows=48834loops=1) 
 Total runtime: 47988.572 ms
(6 rows)

pgsql-performance by date:

Previous
From: "Carlo Stonebanks"
Date:
Subject: Re: Performance Optimization for Dummies 2 - the SQL
Next
From: Tom Lane
Date:
Subject: Re: Simple join optimized badly?