Thread: Index Problem?

Index Problem?

From
Ron St-Pierre
Date:
I am using postgres 7.4.1 and have a problem with a plpgsql function.
When I run the function on the production server it takes approx 33
minutes to run. I dumped the DB and copied it to a similarly configured
box and ran the function and it ran in about 10 minutes. Can anyone
offer advice on tuning the function or my database? Here are the
lengthy, gory details.

F u n c t i o n
It updates seven columns of a table 1 to 4 times daily. Current data =
42,000 rows, new data = 30,000 rows.

    CREATE TYPE employeeType AS (empID INTEGER, updateDate DATE,
        bDate INTEGER, val1 NUMERIC, val2 NUMERIC, val3 NUMERIC,
        val4 NUMERIC, favNum NUMERIC);

    CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF employeeType AS '
        DECLARE
            rec     RECORD;
        BEGIN
            FOR rec IN SELECT empID, updateDate, bDate, val1, val2,
                              val3, val4, favNum FROM newData LOOP
                RETURN NEXT rec;
                UPDATE currentData
                SET val1=rec.val1, val2=rec.val2, val3=rec.val3,
                    val4=rec.val4, favNum=rec.favNum,
                    updateDate=rec.updateDate
                WHERE empID=rec.empID;
            END LOOP;
            RETURN;
        END;
    ' LANGUAGE 'plpgsql';

The emp table has 60 columns, all indexed, about two-thirds are numeric,
but they are not affected by this update. The other 50+ columns are
updated in the middle of the night and the amount of time that update
takes isn't a concern.

Late last night I dumped the table, dropped it and re-created it from
the dump (on the production server - when no one was looking). When I
re-ran the function it took almost 11 minutes, which was pretty much in
line with my results from the dev server.

D e t a i l s
v 7.4.1
Debian stable
1 GB ram
shared_buffers = 2048
sort_mem = 1024
SHMMAX 360000000 (360,000,000)
VACUUM FULL ANALYZE is run every night, and I also ran it yesterday
between runs of the function; it made no difference in running time.
top shows the postmaster using minimal cpu (0-40%) and minuscule memory.
vmstat shows a fair amount of IO (bo=1000->4000).

Yesterday on the dev server we upgraded to the 2.6 kernel but
unfortunately only saw a small improvement in update time (about one
minute).
So does anyone have any suggestions for speeding this up? Is it the
index? The function is run daily from mid-afternoon to early evening
and really drags down the performance of the server (it also hosts a
web site).

Thanks
Ron



Re: Index Problem?

From
Josh Berkus
Date:
Ron,

> The emp table has 60 columns, all indexed, about two-thirds are numeric,
> but they are not affected by this update. The other 50+ columns are
> updated in the middle of the night and the amount of time that update
> takes isn't a concern.

Well, I'd say that you have an application design problem, but that's not what
you asked for help with ;-)

> Late last night I dumped the table, dropped it and re-created it from
> the dump (on the production server - when no one was looking). When I
> re-ran the function it took almost 11 minutes, which was pretty much in
> line with my results from the dev server.

Sounds like you need to run a REINDEX on the table -- and after that,
dramatically increase your max_fsm_pages, and run lazy VACUUM immediately
after the batch update to clean up.
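
On 7.4 that sequence would look something like the following (currentData is
the table from Ron's post; this is a sketch, not something I've run against
his schema):

```sql
-- One-off: rebuild the table's indexes to shed accumulated bloat.
REINDEX TABLE currentData;

-- After each batch update: a plain (lazy) VACUUM reclaims the dead row
-- versions without the exclusive lock that VACUUM FULL takes.
VACUUM ANALYZE currentData;
```

max_fsm_pages itself is set in postgresql.conf and, on 7.4, takes effect
only after a restart.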

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Index Problem?

From
Ron St-Pierre
Date:
Josh Berkus wrote:

>Ron,
>
>>The emp table has 60 columns, all indexed, about two-thirds are numeric,
>>but they are not affected by this update. The other 50+ columns are
>>updated in the middle of the night and the amount of time that update
>>takes isn't a concern.
>
>Well, I'd say that you have an application design problem, but that's not what
>you asked for help with ;-)
>
Yeah, I agree, but I'm not allowed to remove those indexes.

>>Late last night I dumped the table, dropped it and re-created it from
>>the dump (on the production server - when no one was looking). When I
>>re-ran the function it took almost 11 minutes, which was pretty much in
>>line with my results from the dev server.
>
>Sounds like you need to run a REINDEX on the table -- and after that,
>dramatically increase your max_fsm_pages, and run lazy VACUUM immediately
>after the batch update to clean up.
>
On my dev server I increased max_fsm_pages from the default of 20000 to
40000 and increased checkpoint_segments from 3 to 5, and the function ran
in about 6-7 minutes, which is a nice increase. According to the docs
"Annotated postgresql.conf and Global User Configuration (GUC) Guide" on
varlena I'll have to restart postgres for the changes to take effect
there (correct?). Also, the docs on Varlena show the max_fsm_pages
default to be 10,000 but my default was 20,000, so it looks like that
needs updating.

Thanks for your help Josh, I'll see after the weekend what the impact
the changes will have on the production server.

Ron


Re: Index Problem?

From
Josh Berkus
Date:
Ron,

> Yeah I agree but I'm not allowed to remove those indexes.

It's not the indexes I'm talking about, it's the table.

> On my dev server I increased max_fsm_pages from the default of 20000 to
> 40000,

A better way to set this would be to run VACUUM VERBOSE ANALYZE right after
doing one of your update batches, and see how many dead pages are being
reclaimed, and then set max_fsm_pages to that # + 50% (or more).

> increased checkpoint_segments from 3 to 5, and the function ran
> in about 6-7 minutes, which is a nice increase. According to the docs
> "Annotated postgresql.conf and Global User Configuration (GUC) Guide" on
> varlena I'll have to restart postgres for the changes to take effect
> there (correct?).

Correct.

> Also the docs on Varlena show the max_fsm_pages
> default to be 10,000 but my default was 20,000, looks like that needs
> updating.

I don't think the default has been changed.   Anyone?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Index Problem?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> A better way to set this would be to run VACUUM VERBOSE ANALYZE right after
> doing one of your update batches, and see how many dead pages are being
> reclaimed, and then set max_fsm_pages to that # + 50% (or more).

Actually, since he's running 7.4, there's an even better way.  Do a
"VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
ANALYZE or not).  At the end of the very voluminous output, you'll see
something like

INFO:  free space map: 240 relations, 490 pages stored; 4080 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
exactly cover the present freespace needs of my system.  I concur with
the suggestion to bump that up a good deal, of course, but that gives
you a real number to start from.

The DETAIL part of the message shows my current settings (which are the
defaults) and what the FSM is costing me in shared memory space.

If you have multiple active databases, the best approach to getting
these numbers is to VACUUM in each one, adding VERBOSE when you do the
last one.  The FSM report is cluster-wide but you want to be sure the
underlying info is up to date for all databases.
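
Put together, the routine Tom describes might look like this (a sketch;
the headroom factor is the 50% Josh suggested, not an official rule):

```sql
-- Run a plain VACUUM in each database of the cluster first, so the
-- free-space numbers are current everywhere ...
VACUUM;

-- ... then, in the last database only, add VERBOSE: the final INFO line
-- reports the cluster-wide FSM demand, e.g.
--   INFO:  free space map: 240 relations, 490 pages stored; 4080 total pages needed
-- From that, size max_fsm_relations >= 240 and
-- max_fsm_pages >= 4080 * 1.5 = 6120 in postgresql.conf (restart needed on 7.4).
VACUUM VERBOSE;
```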

>> Also the docs on Varlena show the max_fsm_pages
>> default to be 10,000 but my default was 20,000, looks like that needs
>> updating.

> I don't think the default has been changed.   Anyone?

Yes, I kicked it up for 7.4 because FSM covers indexes too now.
Both the 7.3 and 7.4 defaults are pretty arbitrary of course...

            regards, tom lane

Re: Index Problem?

From
Ron St-Pierre
Date:
Tom Lane wrote:

>Josh Berkus <josh@agliodbs.com> writes:
>
>>A better way to set this would be to run VACUUM VERBOSE ANALYZE right after
>>doing one of your update batches, and see how many dead pages are being
>>reclaimed, and then set max_fsm_pages to that # + 50% (or more).
>
>Actually, since he's running 7.4, there's an even better way.  Do a
>"VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
>ANALYZE or not).  At the end of the very voluminous output, you'll see
>something like
>
>INFO:  free space map: 240 relations, 490 pages stored; 4080 total pages needed
>DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
>
>Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
>exactly cover the present freespace needs of my system.  I concur with
>the suggestion to bump that up a good deal, of course, but that gives
>you a real number to start from.
>
>The DETAIL part of the message shows my current settings (which are the
>defaults) and what the FSM is costing me in shared memory space.
>
Okay, after running the function, VACUUM VERBOSE is telling me:

INFO:  free space map: 136 relations, 25014 pages stored; 22608 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

My max_fsm_pages was set to 20,000 and I reset it to 40,000 on the dev
server and the function ran about 20-30% faster, so I'll try the same on
the production server. Thanks for the analysis of the VACUUM info.

Ron


Re: Index Problem?

From
Jochem van Dieten
Date:
Ron St-Pierre wrote:
> I am using postgres 7.4.1 and have a problem with a plpgsql function.
> When I run the function on the production server it takes approx 33
> minutes to run. I dumped the DB and copied it to a similarly configured
> box and ran the function and it ran in about 10 minutes. Can anyone
> offer advice on tuning the function or my database? Here are the
> lengthy, gory details.
>
> F u n c t i o n
> It updates seven columns of a table 1 to 4 times daily. Current data =
> 42,000 rows, new data = 30,000 rows.
>
>    CREATE TYPE employeeType AS (empID INTEGER, updateDate DATE, bDate
> INTEGER, val1 NUMERIC, val2 NUMERIC, val3 NUMERIC, val4 NUMERIC, favNum
> NUMERIC);
>
>    CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF
> employeeType AS '
>        DECLARE
>            rec     RECORD;
>        BEGIN
>            FOR rec IN SELECT empID, updateDate, bDate, val1, val2,
>                              val3, val4, favNum FROM newData LOOP
>                RETURN NEXT rec;
>                UPDATE currentData
>                SET val1=rec.val1, val2=rec.val2, val3=rec.val3,
>                    val4=rec.val4, favNum=rec.favNum,
>                    updateDate=rec.updateDate
>                WHERE empID=rec.empID;
>            END LOOP;
>            RETURN;
>        END;
>    ' LANGUAGE 'plpgsql';

Can't you handle this with a simple update query?

UPDATE
    currentData
SET
    val1 = newData.val1,
    val2 = newData.val2,
    val3 = newData.val3,
    val4 = newData.val4,
    favNum = newData.favNum,
    updateDate = newData.updateDate
FROM
    newData
WHERE
    newData.empID = currentData.empID;

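If the application still depends on the SETOF result, the set-based UPDATE
can be folded back into the original function. A sketch (untested on 7.4,
and assuming the returned rows can simply be re-read from newData):

```sql
CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF employeeType AS '
    DECLARE
        rec     RECORD;
    BEGIN
        -- One set-based UPDATE instead of ~30,000 individual UPDATEs.
        UPDATE currentData
        SET val1 = newData.val1, val2 = newData.val2, val3 = newData.val3,
            val4 = newData.val4, favNum = newData.favNum,
            updateDate = newData.updateDate
        FROM newData
        WHERE newData.empID = currentData.empID;
        -- 7.4 has no UPDATE ... RETURNING, so emit the new rows separately.
        FOR rec IN SELECT empID, updateDate, bDate, val1, val2, val3,
                          val4, favNum FROM newData LOOP
            RETURN NEXT rec;
        END LOOP;
        RETURN;
    END;
' LANGUAGE 'plpgsql';
```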
Jochem

--
I don't get it
immigrants don't work
and steal our jobs
     - Loesje