Thread: Index Problem?
I am using postgres 7.4.1 and have a problem with a plpgsql function. When I
run the function on the production server it takes approx 33 minutes to run.
I dumped the DB and copied it to a similarly configured box and ran the
function and it ran in about 10 minutes. Can anyone offer advice on tuning
the function or my database? Here are the lengthy, gory details.

F u n c t i o n
It updates seven columns of a table 1 to 4 times daily. Current data =
42,000 rows, new data = 30,000 rows.

CREATE TYPE employeeType AS (empID INTEGER, updateDate DATE, bDate INTEGER,
    val1 NUMERIC, val2 NUMERIC, val3 NUMERIC, val4 NUMERIC, favNum NUMERIC);

CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF employeeType AS '
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN SELECT empID, updateDate, bDate, val1, val2, val3, val4, favNum FROM newData LOOP
        RETURN NEXT rec;
        UPDATE currentData SET val1=rec.val1, val2=rec.val2, val3=rec.val2,
            val4=rec.val4, favNum=rec.favNum, updateDate=rec.updateDate
        WHERE empID=rec.empID;
    END LOOP;
    RETURN;
END;
' LANGUAGE 'plpgsql';

The emp table has 60 columns, all indexed, about two-thirds are numeric, but
they are not affected by this update. The other 50+ columns are updated in
the middle of the night and the amount of time that update takes isn't a
concern.

Late last night I dumped the table, dropped it and re-created it from the
dump (on the production server - when no one was looking). When I re-ran the
function it took almost 11 minutes, which was pretty much in line with my
results from the dev server.

D e t a i l s
v 7.4.1
Debian stable
1 GB RAM
shared_buffers = 2048
sort_mem = 1024
SHMMAX 360000000 (360,000,000)

VACUUM FULL ANALYZE is run every night, and I also ran it yesterday in
between runs of the function; it made no difference in running time.

top shows the postmaster using minimal cpu (0-40%) and minuscule memory.
vmstat shows a fair amount of IO (bo=1000->4000).

Yesterday on the dev server we upgraded to the 2.6 kernel, but unfortunately
it only improved the update time by about one minute.

So does anyone have any suggestions for me on speeding this up? Is it the
index? The function is run daily during the mid afternoon to early evening
and really drags the performance of the server down (it also hosts a web
site).

Thanks
Ron
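One quick way to check whether the per-row UPDATE inside the loop is actually
using an index on empID is to EXPLAIN ANALYZE a single-row version of it
outside the function. This is only a diagnostic sketch -- the empID value is
a placeholder, and the self-assignment plus ROLLBACK keep it from changing
any data:

    BEGIN;
    EXPLAIN ANALYZE
        UPDATE currentData SET updateDate = updateDate WHERE empID = 12345;
    ROLLBACK;

If the plan shows a sequential scan on currentData rather than an index scan
on empID, each of the ~30,000 per-row updates is scanning the whole table.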
Ron,

> The emp table has 60 columns, all indexed, about two-thirds are numeric,
> but they are not affected by this update. The other 50+ columns are
> updated in the middle of the night and the amount of time that update
> takes isn't a concern.

Well, I'd say that you have an application design problem, but that's not
what you asked for help with ;-)

> Late last night I dumped the table, dropped it and re-created it from
> the dump (on the production server - when no one was looking). When I
> re-ran the function it took almost 11 minutes, which was pretty much in
> line with my results from the dev server.

Sounds like you need to run a REINDEX on the table -- and after that,
dramatically increase your max_fsm_pages, and run lazy VACUUM immediately
after the batch update to clean up.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
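In concrete terms, that suggestion would look something like the following
sketch. Here currentData is assumed to be the bloated table, and the
max_fsm_pages figure is only illustrative -- a real value comes from the
VACUUM VERBOSE output discussed later in the thread:

    -- rebuild the bloated indexes once
    REINDEX TABLE currentData;

    -- in postgresql.conf (needs a postmaster restart on 7.4):
    --   max_fsm_pages = 40000

    -- then, immediately after each batch update, run a lazy (non-FULL) vacuum:
    VACUUM ANALYZE currentData;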
Josh Berkus wrote:

>Ron,
>
>>The emp table has 60 columns, all indexed, about two-thirds are numeric,
>>but they are not affected by this update. The other 50+ columns are
>>updated in the middle of the night and the amount of time that update
>>takes isn't a concern.
>
>Well, I'd say that you have an application design problem, but that's not
>what you asked for help with ;-)
>
Yeah I agree but I'm not allowed to remove those indexes.

>>Late last night I dumped the table, dropped it and re-created it from
>>the dump (on the production server - when no one was looking). When I
>>re-ran the function it took almost 11 minutes, which was pretty much in
>>line with my results from the dev server.
>
>Sounds like you need to run a REINDEX on the table -- and after that,
>dramatically increase your max_fsm_pages, and run lazy VACUUM immediately
>after the batch update to clean up.
>
On my dev server I increased max_fsm_pages from the default of 20000 to
40000, increased checkpoint_segments from 3 to 5, and the function ran
in about 6-7 minutes which is a nice increase. According to the docs
"Annotated postgresql.conf and Global User Configuration (GUC) Guide" on
varlena I'll have to re-start postgres for the changes to take effect
there (correct?). Also, the docs on Varlena show the max_fsm_pages
default to be 10,000 but my default was 20,000; looks like that needs
updating.

Thanks for your help Josh, I'll see after the weekend what impact the
changes have on the production server.

Ron
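For reference, the two settings changed on the dev box would look like this
in postgresql.conf (per the thread, a postgres restart is needed for them to
take effect):

    max_fsm_pages = 40000          # was 20000
    checkpoint_segments = 5        # was 3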
Ron,

> Yeah I agree but I'm not allowed to remove those indexes.

It's not the indexes I'm talking about, it's the table.

> On my dev server I increased max_fsm_pages from the default of 20000 to
> 40000,

A better way to set this would be to run VACUUM VERBOSE ANALYZE right after
doing one of your update batches, and see how many dead pages are being
reclaimed, and then set max_fsm_pages to that # + 50% (or more).

> increased checkpoint_segments from 3 to 5, and the function ran
> in about 6-7 minutes which is a nice increase. According to the docs
> "Annotated postgresql.conf and Global User Configuration (GUC) Guide" on
> varlena I'll have to re-start postgres for the changes to take effect
> there (correct?).

Correct.

> Also the docs on Varlena show the max_fsm_pages
> default to be 10,000 but my default was 20,000, looks like that needs
> updating.

I don't think the default has been changed. Anyone?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
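A sketch of that procedure (the table name is assumed, and the exact wording
of the per-table VERBOSE output varies by version -- the page counts are
what matter):

    -- right after one of the batch updates:
    VACUUM VERBOSE ANALYZE currentData;
    -- note the per-table lines reporting removed row versions and the page
    -- counts involved, then set max_fsm_pages in postgresql.conf to roughly
    -- that number of pages plus 50%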
Josh Berkus <josh@agliodbs.com> writes:
> A better way to set this would be to run VACUUM VERBOSE ANALYZE right after
> doing one of your update batches, and see how many dead pages are being
> reclaimed, and then set max_fsm_pages to that # + 50% (or more).

Actually, since he's running 7.4, there's an even better way. Do a
"VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
ANALYZE or not). At the end of the very voluminous output, you'll see
something like

INFO:  free space map: 240 relations, 490 pages stored; 4080 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
exactly cover the present freespace needs of my system. I concur with
the suggestion to bump that up a good deal, of course, but that gives
you a real number to start from.

The DETAIL part of the message shows my current settings (which are the
defaults) and what the FSM is costing me in shared memory space.

If you have multiple active databases, the best approach to getting these
numbers is to VACUUM in each one, adding VERBOSE when you do the last one.
The FSM report is cluster-wide but you want to be sure the underlying info
is up to date for all databases.

>> Also the docs on Varlena show the max_fsm_pages
>> default to be 10,000 but my default was 20,000, looks like that needs
>> updating.

> I don't think the default has been changed. Anyone?

Yes, I kicked it up for 7.4 because FSM covers indexes too now. Both the
7.3 and 7.4 defaults are pretty arbitrary of course...

			regards, tom lane
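Plugging Tom's example numbers into postgresql.conf with roughly 50% headroom
would work out to something like the following sketch (use the figures from
your own VACUUM VERBOSE summary line instead; both settings need a postmaster
restart on 7.4):

    max_fsm_relations = 1000     # 240 relations needed; the default already covers it
    max_fsm_pages = 6200         # ~4080 total pages needed * 1.5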
Tom Lane wrote:

>Josh Berkus <josh@agliodbs.com> writes:
>
>>A better way to set this would be to run VACUUM VERBOSE ANALYZE right after
>>doing one of your update batches, and see how many dead pages are being
>>reclaimed, and then set max_fsm_pages to that # + 50% (or more).
>
>Actually, since he's running 7.4, there's an even better way. Do a
>"VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
>ANALYZE or not). At the end of the very voluminous output, you'll see
>something like
>
>INFO:  free space map: 240 relations, 490 pages stored; 4080 total pages needed
>DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
>
>Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
>exactly cover the present freespace needs of my system. I concur with
>the suggestion to bump that up a good deal, of course, but that gives
>you a real number to start from.
>
>The DETAIL part of the message shows my current settings (which are the
>defaults) and what the FSM is costing me in shared memory space.
>
Okay, after running the function VACUUM VERBOSE is telling me:

INFO:  free space map: 136 relations, 25014 pages stored; 22608 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

My max_fsm_pages was set to 20,000 and I reset it to 40,000 on the dev server
and the function ran about 20-30% faster, so I'll try the same on the
production server. Thanks for the analysis of the VACUUM info.

Ron
Ron St-Pierre wrote:
> I am using postgres 7.4.1 and have a problem with a plpgsql function.
> When I run the function on the production server it takes approx 33
> minutes to run. I dumped the DB and copied it to a similarly configured
> box and ran the function and it ran in about 10 minutes. Can anyone
> offer advice on tuning the function or my database? Here are the
> lengthy, gory details.
>
> F u n c t i o n
> It updates seven columns of a table 1 to 4 times daily. Current data =
> 42,000 rows, new data = 30,000 rows.
>
> CREATE TYPE employeeType AS (empID INTEGER, updateDate DATE, bDate
> INTEGER, val1 NUMERIC, val2 NUMERIC, val3 NUMERIC, val4 NUMERIC, favNum
> NUMERIC);
>
> CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF
> employeeType AS '
> DECLARE
>     rec RECORD;
> BEGIN
>     FOR rec IN SELECT empID, updateDate, bDate, val1, val2, val3, val4, favNum FROM newData LOOP
>         RETURN NEXT rec;
>         UPDATE currentData SET val1=rec.val1, val2=rec.val2, val3=rec.val2,
>             val4=rec.val4, favNum=rec.favNum, updateDate=rec.updateDate
>         WHERE empID=rec.empID;
>     END LOOP;
>     RETURN;
> END;
> ' LANGUAGE 'plpgsql';

Can't you handle this with a simple update query?

UPDATE currentData SET
    val1 = newData.val1,
    val2 = newData.val2,
    val3 = newData.val3,
    val4 = newData.val4,
    favNum = newData.favNum,
    updateDate = newData.updateDate
FROM newData
WHERE newData.empID = currentData.empID;

Jochem

--
I don't get it
immigrants don't work
and steal our jobs
    - Loesje
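If the SETOF return is still needed by the callers, the same idea can be
folded back into the function: one set-based UPDATE instead of ~30,000
single-row ones, with the loop kept only to stream the rows back. A sketch,
assuming the type and table definitions from the original post:

    CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF employeeType AS '
    DECLARE
        rec RECORD;
    BEGIN
        -- one set-based UPDATE joins newData to currentData on empID
        UPDATE currentData SET
            val1 = newData.val1, val2 = newData.val2, val3 = newData.val3,
            val4 = newData.val4, favNum = newData.favNum,
            updateDate = newData.updateDate
        FROM newData
        WHERE newData.empID = currentData.empID;

        -- the loop now does nothing but return the new rows to the caller
        FOR rec IN SELECT empID, updateDate, bDate, val1, val2, val3, val4, favNum FROM newData LOOP
            RETURN NEXT rec;
        END LOOP;
        RETURN;
    END;
    ' LANGUAGE 'plpgsql';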