Re: Performance problem with 50,000,000 rows - Mailing list pgsql-general

From Feite Brekeveld
Subject Re: Performance problem with 50,000,000 rows
Date
Msg-id 3BC4C5BC.3CBCFDE@osiris-it.nl
Whole thread Raw
In response to Performance problem with 50,000,000 rows  (David Link <dlink@soundscan.com>)
Responses Re: Performance problem with 50,000,000 rows  (tony <tony@animaproductions.com>)
List pgsql-general
David Link wrote:

Just, reading it and I always enter these kind of queries like:

select   i.isbn,
         t.vendor,
         i.store,
         i.qty
from     bk_inv i,
         bk_title t
where
       t.vendor = '01672708' and
       i.isbn = t.isbn;

Don't know if it makes a difference but I can imagine that it could because
the t.vendor = '...' limits the possibilities in the first stage.

Regards,

Feite

> I'm new to PG but this just seems wrong.  Can someone take a look:
>
>   .-----------.     .-----------.
>   | bk_inv    |     | bk_title  |
>   |-----------|     |-----------|
>   | isbn      |<--->| isbn      |
>   | store     |     | vendor    |
>   | qty       |     |           |
>   | week      |     `-----------'
>   |           |      2,000,000 recs
>   `-----------'
>    50,000,000 recs
>
>   Actual record numbers:
>     bk_inv  : 46,790,877
>     bk_title:  2,311,710
>
> VENDOR REPORT
>
>   A list of Inventory items, for any one given vendor (e.q. 01672708)
>
> select   i.isbn,
>          t.vendor,
>          i.store,
>          i.qty
> from     bk_inv i,
>          bk_title t
> where    i.isbn = t.isbn
> and      t.vendor = '01672708' ;
>
> This query should be instantaneous.  Granted that's 50 million rows, but
> I have created an index on the isbn column for both tables.
> After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it
> spits out:
>
> ERROR:  Write to hashjoin temp file failed
>
> tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
> and t.vendor  ='50000029';
> NOTICE:  QUERY PLAN:
>
> Merge Join  (cost=0.00..11229637.06 rows=2172466 width=72)
>   ->  Index Scan using bk_title_isbn_idx on bk_title t
> (cost=0.00..390788.08 rows=107331 width=24)
>   ->  Index Scan using bk_inv_isbn_idx on bk_inv i
> (cost=0.00..10252621.38 rows=46790877 width=48)
>
> BIG COST!
>
> These explain queries show the existance of the indexes and give small
> costs:
>
> tiger=# explain select * from bk_title where isbn = '50000029';
> NOTICE:  QUERY PLAN:
>
> Index Scan using bk_title_isbn_idx on bk_title  (cost=0.00..4.90 rows=1
> width=24)
>
> tiger=# explain select * from bk_inv where isbn = '0897474228';
> NOTICE:  QUERY PLAN:
>
> Index Scan using bk_inv_isbn_idx on bk_inv  (cost=0.00..225.53 rows=55
> width=48)
>
> Note.  Same tables, same query returns instantaneously with Oracle 8.1.
> What I am hoping to show is that Postgres can do our job too.
>
> Any help on this much obliged.  (Yes I ran vacuum analyze).
>
> David Link
> White Plains, NY
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Feite Brekeveld
feite.brekeveld@osiris-it.nl
http://www.osiris-it.nl




pgsql-general by date:

Previous
From: "Jeffrey W. Baker"
Date:
Subject: Re: VACUUM, 24/7 availability and 7.2
Next
From: "Mihai Gheorghiu"
Date:
Subject: Re: Session identifier