Performance problem with 50,000,000 rows - Mailing list pgsql-general

From David Link
Subject Performance problem with 50,000,000 rows
Date
Msg-id 3BC4A589.6BBA0024@soundscan.com
Whole thread Raw
Responses List archive broken
List pgsql-general
I'm new to PG but this just seems wrong.  Can someone take a look:

  .-----------.     .-----------.
  | bk_inv    |     | bk_title  |
  |-----------|     |-----------|
  | isbn      |<--->| isbn      |
  | store     |     | vendor    |
  | qty       |     |           |
  | week      |     `-----------'
  |           |      2,000,000 recs
  `-----------'
   50,000,000 recs

  Actual record numbers:
    bk_inv  : 46,790,877
    bk_title:  2,311,710


VENDOR REPORT

  A list of Inventory items, for any one given vendor (e.q. 01672708)

select   i.isbn,
         t.vendor,
         i.store,
         i.qty
from     bk_inv i,
         bk_title t
where    i.isbn = t.isbn
and      t.vendor = '01672708' ;

This query should be instantaneous.  Granted that's 50 million rows, but
I have created an index on the isbn column for both tables.
After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it
spits out:

ERROR:  Write to hashjoin temp file failed

tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor  ='50000029';
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..11229637.06 rows=2172466 width=72)
  ->  Index Scan using bk_title_isbn_idx on bk_title t
(cost=0.00..390788.08 rows=107331 width=24)
  ->  Index Scan using bk_inv_isbn_idx on bk_inv i
(cost=0.00..10252621.38 rows=46790877 width=48)

BIG COST!


These explain queries show the existance of the indexes and give small
costs:

tiger=# explain select * from bk_title where isbn = '50000029';
NOTICE:  QUERY PLAN:

Index Scan using bk_title_isbn_idx on bk_title  (cost=0.00..4.90 rows=1
width=24)

tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE:  QUERY PLAN:

Index Scan using bk_inv_isbn_idx on bk_inv  (cost=0.00..225.53 rows=55
width=48)


Note.  Same tables, same query returns instantaneously with Oracle 8.1.
What I am hoping to show is that Postgres can do our job too.

Any help on this much obliged.  (Yes I ran vacuum analyze).

David Link
White Plains, NY

pgsql-general by date:

Previous
From: "H. Wade Minter"
Date:
Subject: Where to count
Next
From: Doug McNaught
Date:
Subject: Re: VACUUM, 24/7 availability and 7.2