Thread: optimising data load
Hi all,

I'm (still) working on an application to regularly populate my database with some data provided from a third-party DB.

I'm still having really bad performance problems. There are 300,000 records to be inserted, but I'm only getting 10,000/hour. 30 hours to populate a single table is just not on. There must be something seriously bad going on.

I have to update two different databases: live, and an update log.

I've included explains for both the DBs below.

I have a few questions:

1) Can it really be right that it is going this slow? I'm running Red Hat Linux with kernel 2.4.7, and postgres 7.1.3.

2) Can anyone see a way to optimise these queries further?

3) Is it likely to be quicker if I use a query to obtain ol.line, and then generate a CSV file for use with COPY?

Thanks
JohnT

--- LIVE ---
explain INSERT INTO orderlines (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment)
SELECT oh.theorder,'P',' 0310',coalesce(ol.line+1,1),5,0,.52,0,0,'' FROM orderheader oh
LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O'
ORDER BY ol.line DESC LIMIT 1;
NOTICE:  QUERY PLAN:

Subquery Scan *SELECT*  (cost=47.41..47.41 rows=1 width=12)
  ->  Limit  (cost=47.41..47.41 rows=1 width=12)
        ->  Sort  (cost=47.41..47.41 rows=1 width=12)
              ->  Nested Loop  (cost=0.00..47.40 rows=1 width=12)
                    ->  Index Scan using orderheader_account on orderheader oh  (cost=0.00..21.64 rows=1 width=4)
                    ->  Index Scan using orderlines_pkey on orderlines ol  (cost=0.00..25.54 rows=17 width=8)

EXPLAIN

--- UPDATE LOG ---
explain INSERT INTO orderlinesupdates (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
SELECT oh.theorder,'P',' 0310',coalesce(ol.line,ol2.line+1,1),5,0,.52,0,0,'',128,'+','C' FROM orderheaderupdates oh
LEFT OUTER JOIN orderlinesupdates ol ON oh.theorder = ol.theorder AND ol.stock=' 0310'
LEFT OUTER JOIN orderlinesupdates ol2 ON oh.theorder = ol2.theorder
WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O'
ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC LIMIT 1;
NOTICE:  QUERY PLAN:

Subquery Scan *SELECT*  (cost=81.29..81.29 rows=1 width=36)
  ->  Limit  (cost=81.29..81.29 rows=1 width=36)
        ->  Sort  (cost=81.29..81.29 rows=1 width=36)
              ->  Nested Loop  (cost=0.00..81.28 rows=1 width=36)
                    ->  Nested Loop  (cost=0.00..52.47 rows=1 width=28)
                          ->  Index Scan using orderheaderupdates_account on orderheaderupdates oh  (cost=0.00..23.62 rows=1 width=8)
                          ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol  (cost=0.00..28.60 rows=17 width=20)
                    ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol2  (cost=0.00..28.60 rows=17 width=8)

EXPLAIN
I think something like setting fsync = off in postgresql.conf and dropping indexes should boost performance. I think it did help when we had a similar problem populating our database with data from an old Access database.

--------------------------------------------------------------
Rasmus T. Mohr            Direct  : +45 36 910 122
Application Developer     Mobile  : +45 28 731 827
Netpointers Intl. ApS     Phone   : +45 70 117 117
Vestergade 18 B           Fax     : +45 70 115 115
1456 Copenhagen K         Email   : mailto:rmo@netpointers.com
Denmark                   Website : http://www.netpointers.com

"Remember that there are no bugs, only undocumented features."
--------------------------------------------------------------

> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org
> [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of John Taylor
> Sent: Wednesday, May 22, 2002 3:46 PM
> To: pgsql-novice@postgresql.org
> Subject: [NOVICE] optimising data load
>
> I'm still having really bad performance problems.
> There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
> 30 hours to populate a single table is just not on. There must be
> something seriously bad going on.
> [...]
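A rough sketch of that approach, for reference (the index name and its column list are invented for the example; the real definitions would come from John's schema):

    -- in postgresql.conf: turn fsync off only for the duration of the bulk
    -- load and back on afterwards -- running without fsync risks corruption
    -- if the machine crashes mid-load
    fsync = false

    -- before the load, drop the indexes on the target table
    DROP INDEX orderlines_stock_idx;        -- example name only

    -- ... run the bulk INSERTs / COPY here ...

    -- afterwards, recreate the indexes and refresh the planner statistics
    CREATE INDEX orderlines_stock_idx ON orderlines (stock);   -- columns are a guess
    VACUUM ANALYZE orderlines;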
Dump the records from the other dbase to a text file and then use the COPY command for Pg. I update tables nightly with 400K+ records and it only takes 1-2 mins. You should drop and re-add your indexes and then do a VACUUM ANALYZE.

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-932-0610 office
HatcherPT - AIM
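For reference, that COPY step can take either of two forms, roughly as below (the file path is only a placeholder):

    -- server-side COPY: the file must be readable on the database server,
    -- and this form normally requires superuser privileges
    COPY orderlines FROM '/tmp/orderlines.txt';

    -- client-side alternative from psql, which reads the file on the client
    \copy orderlines from '/tmp/orderlines.txt'

Either way, COPY's default input format is tab-delimited text with \N for NULLs, so the dump from the other database needs to be written to match.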
On Wednesday 22 May 2002 16:29, Patrick Hatcher wrote:
> Dump the records from the other dbase to a text file and then use the COPY
> command for Pg. I update tables nightly with 400K+ records and it only
> takes 1-2 mins. You should drop and re-add your indexes and then do a
> VACUUM ANALYZE.

I'm looking into that at the moment. I'm getting some very variable results.

There are some tables that it is easy to do this for. However, for some tables I don't get the data in the right format, so I need to perform some queries to get the right values to use when populating.

In this situation I'm not sure whether I should drop the indexes to make the insert faster, or keep them to make the selects faster.

Thanks
JohnT
On Wed, 2002-05-22 at 08:45, John Taylor wrote:
> I'm still having really bad performance problems.
> There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
> 30 hours to populate a single table is just not on. There must be
> something seriously bad going on.

From what I see of your INSERT statement, you are joining orderheader to orderlines and then inserting those records into orderlines. Is that correct?

I think I'd make a view from the SELECT portion. Then, nightly:
1. COPY out the view.
2. Drop the index(es) on orderlines
3. COPY into orderlines
4. recreate indexes on orderlines.

Do the same for orderlinesupdates. The thing is that modifying indexes is a _very_ expensive operation, no matter what RDBMS or hardware configuration you have...

> I have a few questions:
>
> 1) Can it really be right that it is going this slow ?
> I'm running Redhat Linux with kernel 2.4.7, and postgres 7.1.3
>
> 2) Can anyone see a way to optimise these queries further ?
>
> 3) Is it likely to be quicker if I use a query to obtain ol.line, and
> then generate a CSV file for use with COPY ?
> [...]

-- 
+---------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net      |
| Jefferson, LA USA       http://ronandheather.dhs.org:81  |
|                                                          |
| "I have created a government of whirled peas..."         |
|   Maharishi Mahesh Yogi, 12-May-2002,                    |
!   CNN, Larry King Live                                   |
+---------------------------------------------------------+
On Wednesday 22 May 2002 18:40, Ron Johnson wrote:
> From what I see of your INSERT statement, you are joining orderheader
> to orderlines and then inserting those records into orderlines.
> Is that correct?

Yes. Unfortunately I get incomplete data, so I need to select some details before inserting.

> I think I'd make a view from the SELECT portion. Then, nightly:
> 1. COPY out the view.
> 2. Drop the index(es) on orderlines
> 3. COPY into orderlines
> 4. recreate indexes on orderlines.

So, how would I merge the data from the view with the data to copy in ?

I was thinking of generating the copy file by performing a select for each row, but selecting each row into a file and then copying in might be more expensive than just inserting it like I do now.

> Do the same for orderlinesupdates. The thing is that modifying indexes
> is a _very_ expensive operation, no matter what RDBMS or hardware
> configuration you have...

Yes, I was thinking of that for my next step. Once I figure out how to get COPY working, I'll also try dropping indexes.

Thanks
JohnT
On Wed, 2002-05-22 at 16:05, John Taylor wrote:
> On Wednesday 22 May 2002 18:40, Ron Johnson wrote:
> > I think I'd make a view from the SELECT portion. Then, nightly:
> > 1. COPY out the view.
> > 2. Drop the index(es) on orderlines
> > 3. COPY into orderlines
> > 4. recreate indexes on orderlines.
>
> So, how would I merge the data from the view with the data to copy in ?

??? Unless I am misunderstanding you:

0. create view v_new_orderlines (theorder, type, stock, line,
       ordercurrent, sellingquant, price, discount, vatrate, comment)
   as SELECT oh.theorder, 'P', '0310', coalesce(ol.line+1,1), 5, 0,
       .52, 0, 0, ''
   FROM orderheader oh
   LEFT OUTER JOIN orderlines ol
     ON oh.theorder = ol.theorder
   WHERE oh.account=' MILN1'
     AND oh.delivery=1
     AND oh.thedate='2002-06-01'
     AND oh.ordertype='O'
   ORDER BY ol.line ;
1. COPY v_new_orderlines to '/tmp/new_orderlines.tsv';
2. DROP each index from orderlines
3. COPY orderlines from '/tmp/new_orderlines.tsv';
4. CREATE each index on orderlines

> I was thinking of generating the copy file by performing a select for each row,
> but selecting each row into a file and then copying in might be more expensive
> than just inserting it like I do now.
>
> Yes, I was thinking of that for my next step.
> Once I figure out how to get COPY working, I'll also try dropping indexes.

-- 
+---------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net      |
| Jefferson, LA USA       http://ronandheather.dhs.org:81  |
|                                                          |
| "I have created a government of whirled peas..."         |
|   Maharishi Mahesh Yogi, 12-May-2002,                    |
!   CNN, Larry King Live                                   |
+---------------------------------------------------------+
On Thursday 23 May 2002 03:40, Ron Johnson wrote:
> On Wed, 2002-05-22 at 16:05, John Taylor wrote:
>
> > So, how would I merge the data from the view with the data to copy in ?
>
> ??? Unless I am misunderstanding you:

I think you are misunderstanding me.
I have 300,000 rows to insert.
They each require a different query to obtain the correct data to insert.

> 0. create view v_new_orderlines (theorder, type, stock, line,
>        ordercurrent, sellingquant, price, discount, vatrate, comment)
>    as SELECT oh.theorder, 'P', '0310', coalesce(ol.line+1,1), 5, 0,
>        .52, 0, 0, ''
>    FROM orderheader oh
>    LEFT OUTER JOIN orderlines ol
>      ON oh.theorder = ol.theorder
>    WHERE oh.account=' MILN1'
>      AND oh.delivery=1
>      AND oh.thedate='2002-06-01'
>      AND oh.ordertype='O'
>    ORDER BY ol.line ;
> 1. COPY v_new_orderlines to '/tmp/new_orderlines.tsv';
> 2. DROP each index from orderlines
> 3. COPY orderlines from '/tmp/new_orderlines.tsv';
> 4. CREATE each index on orderlines

JohnT
On Thu, 2002-05-23 at 02:01, John Taylor wrote:
> On Thursday 23 May 2002 03:40, Ron Johnson wrote:
> > ??? Unless I am misunderstanding you:
>
> I think you are misunderstanding me.
> I have 300,000 rows to insert.
> They each require a different query to obtain the correct data to insert.

Oh, ok. How about creating an intermediary, indexless table that the 300,000 records will be inserted into. Then, after the intermediary table is populated, drop the index(es) on orderlines and
INSERT INTO orderlines SELECT * FROM intermediary;

-- 
+---------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net      |
| Jefferson, LA USA       http://ronandheather.dhs.org:81  |
|                                                          |
| "I have created a government of whirled peas..."         |
|   Maharishi Mahesh Yogi, 12-May-2002,                    |
!   CNN, Larry King Live                                   |
+---------------------------------------------------------+
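A sketch of that intermediary-table approach (the staging table name is invented; the column layout is just copied from orderlines):

    -- staging table with the same columns as orderlines but no indexes or
    -- constraints; CREATE TABLE AS with LIMIT 0 copies only the column layout
    CREATE TABLE orderlines_staging AS SELECT * FROM orderlines LIMIT 0;

    -- load the 300,000 rows into the staging table (individual INSERTs or
    -- COPY); with no indexes to maintain, this part should be fast

    -- then, with the orderlines indexes dropped, move them across in one go
    INSERT INTO orderlines SELECT * FROM orderlines_staging;

    -- recreate the indexes, refresh statistics, and clean up
    VACUUM ANALYZE orderlines;
    DROP TABLE orderlines_staging;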
On Thursday 23 May 2002 08:22, you wrote:
> Oh, ok. How about creating an intermediary, indexless table that the
> 300,000 records will be inserted into. Then, after the intermediary
> table is populated, drop the index(es) on orderlines and
> INSERT INTO orderlines SELECT * FROM intermediary;

That sounds a good idea.
Would it be quicker to copy out from the intermediary, and then copy in to the real one?
I guess I'll have to experiment.

I have a feeling a lot of the problems may be due to the indexes. I've been experimenting with an easier table:

copy in to the live takes 5 minutes for 850,000 records
copy in to the updates takes 2 hours!

It is the same data with 3 extra columns, but there are an additional 2 indexes. I think it must be the indexes making the difference. I'm just checking now.

Thanks for the ideas!

Regards
JohnT
On Thu, 2002-05-23 at 21:10, John Taylor wrote:
> > Oh, ok. How about creating an intermediary, indexless table that the
> > 300,000 records will be inserted into. Then, after the intermediary
> > table is populated, drop the index(es) on orderlines and
> > INSERT INTO orderlines SELECT * FROM intermediary;
>
> That sounds a good idea.
> Would it be quicker to copy out from the intermediary, and then copy in to the real one?
> I guess I'll have to experiment.

It will be quickest to "INSERT INTO orderlinesupdates SELECT * FROM intermediary", like Ron said. No COPY involved, except to get the records into the intermediary table initially. That syntax on the INSERT statement is _really_ useful for this sort of thing.

> I have a feeling a lot of the problems may be due to the indexes.
> I've been experimenting with an easier table:
> copy in to the live takes 5 minutes for 850,000 records
> copy in to the updates takes 2 hours!
> It is the same data with 3 extra columns, but there are an additional 2 indexes.
> I think it must be the indexes making the difference. I'm just checking now.

Also consider that if the number of records in the table is changing a lot, it is well worth doing a VACUUM ANALYZE from time to time, so the planner statistics get updated and the system produces plans appropriate to a 400k-record table (e.g.) rather than a 100k-record table.

Regards,
                    Andrew.
-- 
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694   OFFICE: +64(4)499-2267
Are you enrolled at http://schoolreunions.co.nz/ yet?
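For reference, that statistics refresh is just the following, run after each bulk load (table names as used earlier in the thread):

    -- updates the row-count and distribution statistics the planner relies on,
    -- and reclaims space left behind by updated or deleted rows
    VACUUM ANALYZE orderlines;
    VACUUM ANALYZE orderlinesupdates;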
Hi,

Can someone give me examples of the correct syntax for FULL OUTER JOIN, where each table has some additional conditions? The docs don't give any complete examples :-(

I can only get it to work using subselects:

SELECT stock,o.ordercurrent,s.quantity FROM
(SELECT stock,ordercurrent FROM orderlines o WHERE o.theorder=' 4494' AND (o.type='P' OR o.type='T')) AS o
FULL OUTER JOIN
(SELECT stock,quantity FROM standingorders s WHERE s.account=' 15048' AND s.dayno=2) AS s
USING (stock)

Thanks
JohnT
> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org
> [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of John Taylor
> Sent: Monday, May 27, 2002 6:15 AM
> To: PgSQL Novice ML
> Subject: [NOVICE] Full Outer Joins
>
> Can someone give me examples of the correct syntax for FULL OUTER JOIN,
> where each table has some additional conditions? The docs don't give any
> complete examples :-(
>
> I can only get it to work using subselects:
>
> SELECT stock,o.ordercurrent,s.quantity FROM
> (SELECT stock,ordercurrent FROM orderlines o WHERE o.theorder=' 4494' AND (o.type='P' OR o.type='T')) AS o
> FULL OUTER JOIN
> (SELECT stock,quantity FROM standingorders s WHERE s.account=' 15048' AND s.dayno=2) AS s
> USING (stock)

create table Customers (custid int primary key,
                        custname text unique,
                        zip char(5) not null);

create table Orders (orderid int primary key,
                     custid int not null references Customers,
                     part text not null,
                     orderamt int not null);

insert into Customers values (1,'Joel','20009');
insert into Customers values (2,'Johnny Cheapskate','20009');
insert into Customers values (3,'Santa Claus','10005');

insert into Orders values (1,1,'Pink Widget',100);
insert into Orders values (2,1,'Pink Widget',200);
insert into Orders values (3,3,'Pink Widget',100);

Select customer id, name, # of orders, and total $ of orders, only for those customers in zip 20009 and only for Pink Widgets. However, be sure to show all customers in that zip code; for those that never ordered a Pink Widget, show zero.

SELECT c.custid,
       c.custname,
       count(o.orderid),
       sum(o.orderamt)
FROM Customers AS c
     LEFT OUTER JOIN Orders AS o
       ON (o.custid=c.custid
           AND o.part='Pink Widget')
WHERE c.zip = '20009'
GROUP BY c.custid,
         c.custname;

 custid |     custname      | count | sum
--------+-------------------+-------+-----
      1 | Joel              |     2 | 300
      2 | Johnny Cheapskate |     0 |
(2 rows)

If this doesn't give you the idea, you'll have to be more specific about where you're stuck.

- J.

Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant
On Monday 27 May 2002 15:33, Joel Burton wrote:
> SELECT c.custid,
>        c.custname,
>        count(o.orderid),
>        sum(o.orderamt)
> FROM Customers AS c
>      LEFT OUTER JOIN Orders AS o
>        ON (o.custid=c.custid
>            AND o.part='Pink Widget')
> WHERE c.zip = '20009'
> GROUP BY c.custid,
>          c.custname;

Postgres doesn't seem to like that form with full outer joins ...

SELECT o.stock,o.ordercurrent,o.type,s.stock,s.quantity
FROM orderlines o FULL OUTER JOIN standingorders s
     ON (s.stock=o.stock AND s.account=' 15048' and s.dayno=2)
WHERE o.theorder=' 4494' AND (o.type='P' OR o.type='T')

gives ...

ERROR:  FULL JOIN is only supported with mergejoinable join conditions

If I join with USING(), and move all the conditionals to a WHERE clause at the end, it only returns rows matching the first table.

If I join with USING(), and use subselects to generate each side of the JOIN, then it all works fine.

Regards
JohnT
John Taylor <postgres@jtresponse.co.uk> writes:
> Postgres doesn't seem to like that form with full outer joins ...

> SELECT o.stock,o.ordercurrent,o.type,s.stock,s.quantity
> FROM orderlines o FULL OUTER JOIN standingorders s ON (s.stock=o.stock AND s.account=' 15048' and s.dayno=2)
> WHERE o.theorder=' 4494' AND (o.type='P' OR o.type='T')

> ERROR:  FULL JOIN is only supported with mergejoinable join conditions

While that implementation limitation is annoying (it's partly fixed in development sources, FWIW), I really wonder why you'd want to do the above. With a FULL JOIN, you are going to get a lot of dummy rows out: every s row *not* satisfying account=' 15048' and dayno=2 will still generate a joined row (with nulls for the o columns), and also every o row that doesn't join to an s row with account=' 15048' and dayno=2 will generate a joined row (with nulls for the s columns). It seems unlikely that that's what you wanted. I have a hard time envisioning a use for FULL JOIN with join conditions that restrict only one of the tables; seems like the restrictions ought to be in WHERE, instead.

			regards, tom lane
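The ON-versus-WHERE distinction Tom describes can be seen with a tiny throw-away example (tables and data are invented here; a LEFT JOIN is used so it also runs on 7.1, but the same rule applies to FULL JOIN):

    CREATE TABLE t_orders (ordid int, custid int);
    CREATE TABLE t_lines  (ordid int, qty int);
    INSERT INTO t_orders VALUES (1, 10);
    INSERT INTO t_orders VALUES (2, 20);
    INSERT INTO t_lines  VALUES (1, 5);
    INSERT INTO t_lines  VALUES (2, 0);

    -- Restriction inside ON: every t_orders row survives; where the extra
    -- condition fails, the t_lines columns simply come back as NULL.
    SELECT o.ordid, l.qty
    FROM t_orders o LEFT OUTER JOIN t_lines l
         ON (o.ordid = l.ordid AND l.qty > 0);
    -- returns (1,5) and (2,NULL)

    -- Same restriction in WHERE: it is applied after the join, so the
    -- NULL-extended row is filtered out and order 2 disappears entirely.
    SELECT o.ordid, l.qty
    FROM t_orders o LEFT OUTER JOIN t_lines l ON (o.ordid = l.ordid)
    WHERE l.qty > 0;
    -- returns only (1,5)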
On Monday 27 May 2002 19:10, Tom Lane wrote:
> While that implementation limitation is annoying (it's partly fixed in
> development sources, FWIW), I really wonder why you'd want to do the
> above. [...] I have a hard time envisioning a use for FULL JOIN with
> join conditions that restrict only one of the tables; seems like the
> restrictions ought to be in WHERE, instead.

Yes, I WANT to restrict both tables, but I can't figure out how. Where do I put the conditionals for each table?

I have two tables of orders, temporary and permanent. For each day there are a number of orders to be delivered. Each order may have an entry in the temporary AND/OR the permanent table.

I want to get all orders that are in either table. For each order, I need to know which table it is in (and if it is in both), and also join to the stockitems table to get the description.

I can do it all like this:

SELECT stock,stockitems.description,o.ordercurrent,s.quantity FROM
(SELECT stock,ordercurrent FROM orderlines o WHERE o.theorder=' 4494' AND (o.type='P' OR o.type='T')) AS o
FULL OUTER JOIN
(SELECT stock,quantity FROM standingorders s WHERE s.account=' 15048' AND s.dayno=2) AS s
USING (stock)
JOIN stockitems USING (stock)

How do I do it without the subselects?

Thanks
JohnT
John Taylor <postgres@jtresponse.co.uk> writes:
> I can do it all like this:

> SELECT stock,stockitems.description,o.ordercurrent,s.quantity FROM
> (SELECT stock,ordercurrent FROM orderlines o WHERE o.theorder=' 4494' AND (o.type='P' OR o.type='T')) AS o
> FULL OUTER JOIN
> (SELECT stock,quantity FROM standingorders s WHERE s.account=' 15048' AND s.dayno=2) AS s
> USING (stock)
> JOIN stockitems USING (stock)

If that actually has the behavior you want, then that's the way you write it. Placing restrictions inside an outer join is not semantically the same as placing them outside (at the WHERE clause), so it wasn't clear to me exactly what you wanted.

> How do I do it without the subselects?

Why worry? It looks like it should be an efficient solution, or at least as efficient as you'll get.

			regards, tom lane
On Monday 27 May 2002 20:02, Tom Lane wrote:
> If that actually has the behavior you want, then that's the way you
> write it. Placing restrictions inside an outer join is not semantically
> the same as placing them outside (at the WHERE clause), so it wasn't
> clear to me exactly what you wanted.

Ah, that explains why I couldn't figure it out.

I think I need to move on from my current Postgres Bible; it stops short of full outer joins :-(

Thanks
JohnT