Thread: Performance problem with 50,000,000 rows
I'm new to PG but this just seems wrong. Can someone take a look: .-----------. .-----------. | bk_inv | | bk_title | |-----------| |-----------| | isbn |<--->| isbn | | store | | vendor | | qty | | | | week | `-----------' | | 2,000,000 recs `-----------' 50,000,000 recs Actual record numbers: bk_inv : 46,790,877 bk_title: 2,311,710 VENDOR REPORT A list of Inventory items, for any one given vendor (e.q. 01672708) select i.isbn, t.vendor, i.store, i.qty from bk_inv i, bk_title t where i.isbn = t.isbn and t.vendor = '01672708' ; This query should be instantaneous. Granted that's 50 million rows, but I have created an index on the isbn column for both tables. After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it spits out: ERROR: Write to hashjoin temp file failed tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn and t.vendor ='50000029'; NOTICE: QUERY PLAN: Merge Join (cost=0.00..11229637.06 rows=2172466 width=72) -> Index Scan using bk_title_isbn_idx on bk_title t (cost=0.00..390788.08 rows=107331 width=24) -> Index Scan using bk_inv_isbn_idx on bk_inv i (cost=0.00..10252621.38 rows=46790877 width=48) BIG COST! These explain queries show the existance of the indexes and give small costs: tiger=# explain select * from bk_title where isbn = '50000029'; NOTICE: QUERY PLAN: Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1 width=24) tiger=# explain select * from bk_inv where isbn = '0897474228'; NOTICE: QUERY PLAN: Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55 width=48) Note. Same tables, same query returns instantaneously with Oracle 8.1. What I am hoping to show is that Postgres can do our job too. Any help on this much obliged. (Yes I ran vacuum analyze). David Link White Plains, NY
Have you done a VACUUM ANALYZE ? -Mitch ----- Original Message ----- From: "David Link" <dlink@soundscan.com> To: <pgsql-general@postgresql.org> Sent: Wednesday, October 10, 2001 3:46 PM Subject: [GENERAL] Performance problem with 50,000,000 rows > I'm new to PG but this just seems wrong. Can someone take a look: > > .-----------. .-----------. > | bk_inv | | bk_title | > |-----------| |-----------| > | isbn |<--->| isbn | > | store | | vendor | > | qty | | | > | week | `-----------' > | | 2,000,000 recs > `-----------' > 50,000,000 recs > > Actual record numbers: > bk_inv : 46,790,877 > bk_title: 2,311,710 > > > VENDOR REPORT > > A list of Inventory items, for any one given vendor (e.q. 01672708) > > select i.isbn, > t.vendor, > i.store, > i.qty > from bk_inv i, > bk_title t > where i.isbn = t.isbn > and t.vendor = '01672708' ; > > This query should be instantaneous. Granted that's 50 million rows, but > I have created an index on the isbn column for both tables. > After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it > spits out: > > ERROR: Write to hashjoin temp file failed > > tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn > and t.vendor ='50000029'; > NOTICE: QUERY PLAN: > > Merge Join (cost=0.00..11229637.06 rows=2172466 width=72) > -> Index Scan using bk_title_isbn_idx on bk_title t > (cost=0.00..390788.08 rows=107331 width=24) > -> Index Scan using bk_inv_isbn_idx on bk_inv i > (cost=0.00..10252621.38 rows=46790877 width=48) > > BIG COST! > > > These explain queries show the existance of the indexes and give small > costs: > > tiger=# explain select * from bk_title where isbn = '50000029'; > NOTICE: QUERY PLAN: > > Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1 > width=24) > > tiger=# explain select * from bk_inv where isbn = '0897474228'; > NOTICE: QUERY PLAN: > > Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55 > width=48) > > > Note. Same tables, same query returns instantaneously with Oracle 8.1. > What I am hoping to show is that Postgres can do our job too. > > Any help on this much obliged. (Yes I ran vacuum analyze). > > David Link > White Plains, NY > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
David Link wrote: Just, reading it and I always enter these kind of queries like: select i.isbn, t.vendor, i.store, i.qty from bk_inv i, bk_title t where t.vendor = '01672708' and i.isbn = t.isbn; Don't know if it makes a difference but I can imagine that it could because the t.vendor = '...' limits the possibilities in the first stage. Regards, Feite > I'm new to PG but this just seems wrong. Can someone take a look: > > .-----------. .-----------. > | bk_inv | | bk_title | > |-----------| |-----------| > | isbn |<--->| isbn | > | store | | vendor | > | qty | | | > | week | `-----------' > | | 2,000,000 recs > `-----------' > 50,000,000 recs > > Actual record numbers: > bk_inv : 46,790,877 > bk_title: 2,311,710 > > VENDOR REPORT > > A list of Inventory items, for any one given vendor (e.q. 01672708) > > select i.isbn, > t.vendor, > i.store, > i.qty > from bk_inv i, > bk_title t > where i.isbn = t.isbn > and t.vendor = '01672708' ; > > This query should be instantaneous. Granted that's 50 million rows, but > I have created an index on the isbn column for both tables. > After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it > spits out: > > ERROR: Write to hashjoin temp file failed > > tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn > and t.vendor ='50000029'; > NOTICE: QUERY PLAN: > > Merge Join (cost=0.00..11229637.06 rows=2172466 width=72) > -> Index Scan using bk_title_isbn_idx on bk_title t > (cost=0.00..390788.08 rows=107331 width=24) > -> Index Scan using bk_inv_isbn_idx on bk_inv i > (cost=0.00..10252621.38 rows=46790877 width=48) > > BIG COST! > > These explain queries show the existance of the indexes and give small > costs: > > tiger=# explain select * from bk_title where isbn = '50000029'; > NOTICE: QUERY PLAN: > > Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1 > width=24) > > tiger=# explain select * from bk_inv where isbn = '0897474228'; > NOTICE: QUERY PLAN: > > Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55 > width=48) > > Note. Same tables, same query returns instantaneously with Oracle 8.1. > What I am hoping to show is that Postgres can do our job too. > > Any help on this much obliged. (Yes I ran vacuum analyze). > > David Link > White Plains, NY > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Feite Brekeveld feite.brekeveld@osiris-it.nl http://www.osiris-it.nl
Previously : > >Table sizes ..... > >bk_inv : 46,790,877 >bk_title: 2,311,710 > >Query : > >select i.isbn, > t.vendor, > i.store, > i.qty >from bk_inv i, > bk_title t >where i.isbn = t.isbn >and t.vendor = '01672708' ; > It might be worth putting and index on bk_title.vendor, then the scan of this table will quickly find the appropriate rows( assuming vendor is selective)... then the join to bk_inv can use the bk_inv.isbn index as before, but hopefully with a smaller dataset....(so hopefully it might be able to use a nested loop from bk_title -> bk_inv instead of a huge merge join) It also might be worth playing with sort_mem and shared_buffers (if you have not done so already) - since you have a reasonable amount of memory. good luck Mark
Solution to the problem: I had run vaccum on the two tables in question (which happen to be the only two tables user defined in the database) # vacuum analyze bk_inv; # vacuum analyze bk_title; When I ran vacuum for all tables (including the system tables I suppose) # vacuum analyze; I got the response time I exected. (Why this is perhaps someone can explain). Thanks for the feedbacks. Regards, David David Link wrote: > > I'm new to PG but this just seems wrong. Can someone take a look: > > .-----------. .-----------. > | bk_inv | | bk_title | > |-----------| |-----------| > | isbn |<--->| isbn | > | store | | vendor | > | qty | | | > | week | `-----------' > | | 2,000,000 recs > `-----------' > 50,000,000 recs > > Actual record numbers: > bk_inv : 46,790,877 > bk_title: 2,311,710 > > VENDOR REPORT > > A list of Inventory items, for any one given vendor (e.q. 01672708) > > select i.isbn, > t.vendor, > i.store, > i.qty > from bk_inv i, > bk_title t > where i.isbn = t.isbn > and t.vendor = '01672708' ; > > This query should be instantaneous. Granted that's 50 million rows, but > I have created an index on the isbn column for both tables. > After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it > spits out: > > ERROR: Write to hashjoin temp file failed > > tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn > and t.vendor ='50000029'; > NOTICE: QUERY PLAN: > > Merge Join (cost=0.00..11229637.06 rows=2172466 width=72) > -> Index Scan using bk_title_isbn_idx on bk_title t > (cost=0.00..390788.08 rows=107331 width=24) > -> Index Scan using bk_inv_isbn_idx on bk_inv i > (cost=0.00..10252621.38 rows=46790877 width=48) > > BIG COST! > > These explain queries show the existance of the indexes and give small > costs: > > tiger=# explain select * from bk_title where isbn = '50000029'; > NOTICE: QUERY PLAN: > > Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1 > width=24) > > tiger=# explain select * from bk_inv where isbn = '0897474228'; > NOTICE: QUERY PLAN: > > Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55 > width=48) > > Note. Same tables, same query returns instantaneously with Oracle 8.1. > What I am hoping to show is that Postgres can do our job too. > > Any help on this much obliged. (Yes I ran vacuum analyze). > > David Link > White Plains, NY
On Thu, 2001-10-11 at 00:03, Feite Brekeveld wrote: > Just, reading it and I always enter these kind of queries like: > > select i.isbn, > t.vendor, > i.store, > i.qty > from bk_inv i, > bk_title t > where > t.vendor = '01672708' and > i.isbn = t.isbn; > > Don't know if it makes a difference but I can imagine that it could because > the t.vendor = '...' limits the possibilities in the first stage. I have noticed this behaviour too. Is there some sort of guide to use in order to get the query right? Or is this just part of SQL basics? Cheers Tony Grant
Hello, I tried to search the mailing list archives but got only: An error occured! PQconnectPoll() -- connect() failed: Connection refused Is the postmaster running (with -i) at 'db.hub.org' and acceptingconnections on TCP/IP port 5439? This doesn´t speak well about the backend we use ;-)). Kind regards Andreas (hopefully not repeating someone else because I can´t search the archive).