Thread: Performance problem with 50,000,000 rows

Performance problem with 50,000,000 rows

From
David Link
Date:
I'm new to PG but this just seems wrong.  Can someone take a look:

  .-----------.     .-----------.
  | bk_inv    |     | bk_title  |
  |-----------|     |-----------|
  | isbn      |<--->| isbn      |
  | store     |     | vendor    |
  | qty       |     |           |
  | week      |     `-----------'
  |           |      2,000,000 recs
  `-----------'
   50,000,000 recs

  Actual record numbers:
    bk_inv  : 46,790,877
    bk_title:  2,311,710


VENDOR REPORT

  A list of Inventory items, for any one given vendor (e.g. 01672708)

select   i.isbn,
         t.vendor,
         i.store,
         i.qty
from     bk_inv i,
         bk_title t
where    i.isbn = t.isbn
and      t.vendor = '01672708' ;

This query should be instantaneous.  Granted, that's 50 million rows, but
I have created an index on the isbn column of both tables.
After about 25 minutes (on a 4-processor Dell 6300 with 1 GB of memory) it
spits out:

ERROR:  Write to hashjoin temp file failed

tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor  ='50000029';
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..11229637.06 rows=2172466 width=72)
  ->  Index Scan using bk_title_isbn_idx on bk_title t
(cost=0.00..390788.08 rows=107331 width=24)
  ->  Index Scan using bk_inv_isbn_idx on bk_inv i
(cost=0.00..10252621.38 rows=46790877 width=48)

BIG COST!


These explain queries show the existence of the indexes and give small
costs:

tiger=# explain select * from bk_title where isbn = '50000029';
NOTICE:  QUERY PLAN:

Index Scan using bk_title_isbn_idx on bk_title  (cost=0.00..4.90 rows=1
width=24)

tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE:  QUERY PLAN:

Index Scan using bk_inv_isbn_idx on bk_inv  (cost=0.00..225.53 rows=55
width=48)


Note: the same tables and the same query return instantaneously with Oracle 8.1.
What I am hoping to show is that Postgres can do our job too.

Any help on this is much obliged.  (Yes, I ran VACUUM ANALYZE.)

David Link
White Plains, NY

Re: Performance problem with 50,000,000 rows

From
"Mitch Vincent"
Date:
Have you done a VACUUM ANALYZE?

-Mitch


----- Original Message -----
From: "David Link" <dlink@soundscan.com>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, October 10, 2001 3:46 PM
Subject: [GENERAL] Performance problem with 50,000,000 rows




Re: Performance problem with 50,000,000 rows

From
Feite Brekeveld
Date:
David Link wrote:

Just reading it: I always enter these kinds of queries like this:

select   i.isbn,
         t.vendor,
         i.store,
         i.qty
from     bk_inv i,
         bk_title t
where
       t.vendor = '01672708' and
       i.isbn = t.isbn;

I don't know if it makes a difference, but I can imagine it could, because
the t.vendor = '...' condition limits the possibilities in the first stage.

Regards,

Feite


--
Feite Brekeveld
feite.brekeveld@osiris-it.nl
http://www.osiris-it.nl




Re: Performance problem with 50,000,000 rows

From
Mark kirkwood
Date:
Previously :
>
>Table sizes .....
>
>bk_inv  : 46,790,877
>bk_title:  2,311,710
>
>Query :
>
>select   i.isbn,
>         t.vendor,
>         i.store,
>         i.qty
>from     bk_inv i,
>         bk_title t
>where    i.isbn = t.isbn
>and      t.vendor = '01672708' ;
>

It might be worth putting an index on bk_title.vendor; then the scan of this
table will quickly find the appropriate rows (assuming vendor is
selective). The join to bk_inv can then use the bk_inv.isbn index as
before, but hopefully with a smaller dataset (so it might be able to use a
nested loop from bk_title -> bk_inv instead of a huge merge join).
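A minimal sketch of that suggestion (the index name is illustrative; the table and column names come from the thread):

```sql
-- Index the selective filter column so only the matching titles are read.
CREATE INDEX bk_title_vendor_idx ON bk_title (vendor);

-- Refresh statistics so the planner can cost the new index properly.
VACUUM ANALYZE bk_title;
```

With the index in place, EXPLAIN on the vendor query should show an index scan on bk_title_vendor_idx feeding the join, rather than walking all 2.3 million titles in isbn order.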

It also might be worth playing with sort_mem and shared_buffers (if you
have not done so already), since you have a reasonable amount of memory.
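For example (the value is illustrative, not a recommendation; on 7.x sort_mem is measured in kilobytes and can be set per session, while shared_buffers lives in postgresql.conf and requires a restart):

```sql
-- Give sorts and hash joins more working memory for this session only;
-- larger sort_mem also reduces spilling to hashjoin temp files on disk.
SET sort_mem = 16384;   -- 16 MB, illustrative value
```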

good luck

Mark



Re: Performance problem with 50,000,000 rows

From
David Link
Date:
Solution to the problem:

  I had run vacuum on the two tables in question (which happen to be the
only two user-defined tables in the database):

  # vacuum analyze bk_inv;
  # vacuum analyze bk_title;

When I ran vacuum for all tables (including the system tables, I suppose):

  # vacuum analyze;

I got the response time I expected. (Why this is, perhaps someone can
explain.)
Thanks for the feedback.
Regards, David
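One way to verify the fix (a sketch; the expected plan shape is an assumption based on the statistics being fresh, not output shown in the thread):

```sql
-- Database-wide statistics refresh, as in the fix above:
VACUUM ANALYZE;

-- Re-check the plan: with accurate row estimates the planner should
-- drive the join from the selective vendor filter instead of a merge
-- join over all 46 million inventory rows.
EXPLAIN
SELECT i.isbn, t.vendor, i.store, i.qty
FROM   bk_inv i, bk_title t
WHERE  i.isbn = t.isbn
  AND  t.vendor = '01672708';
```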


Re: Performance problem with 50,000,000 rows

From
tony
Date:
On Thu, 2001-10-11 at 00:03, Feite Brekeveld wrote:

> Just, reading it and I always enter these kind of queries like:
>
> select   i.isbn,
>          t.vendor,
>          i.store,
>          i.qty
> from     bk_inv i,
>          bk_title t
> where
>        t.vendor = '01672708' and
>        i.isbn = t.isbn;
>
> Don't know if it makes a difference but I can imagine that it could because
> the t.vendor = '...' limits the possibilities in the first stage.

I have noticed this behaviour too. Is there some sort of guide to use in
order to get the query right? Or is this just part of SQL basics?

Cheers

Tony Grant


List archive broken

From
"Tille, Andreas"
Date:
Hello,

I tried to search the mailing list archives but got only:

                              An error occured!

PQconnectPoll() -- connect() failed: Connection refused Is the postmaster running (with -i) at 'db.hub.org' and accepting connections on TCP/IP port 5439?


This doesn't speak well about the backend we use ;-)).

Kind regards

        Andreas (hopefully not repeating someone else, because I can't search
                 the archive).