Bug #769: Slow vacuuming due to error in optimization - Mailing list pgsql-bugs

From pgsql-bugs@postgresql.org
Subject Bug #769: Slow vacuuming due to error in optimization
Date
Msg-id 20020916180112.9CA22476567@postgresql.org
Responses Re: Bug #769: Slow vacuuming due to error in optimization
List pgsql-bugs
Steve Marshall (smarshall@wsi.com) reports a bug with a severity of 3
The lower the number, the more severe it is.

Short Description
Slow vacuuming due to error in optimization

Long Description
I have found very long vacuuming times when vacuuming large tables in Postgres 7.2.1, running on i686 hardware
under Red Hat Linux 7.3.

Long vacuum times should have little effect on applications, as the new VACUUM implementation does not exclusively
lock the table for any great length of time. However, I found that near the end of the vacuum operation, the
vacuuming postgres backend began using 100% of one of the system CPUs, and all insert operations on the table being
vacuumed stopped. This problem occurred after all the CPU times were reported for the VACUUM, but before the ANALYZE
step began.

To identify the source of the problem, I inserted some additional log statements into the source file
backend/commands/vacuumlazy.c, and was able to track the problem down to the function that updates the free space
map (i.e. the function lazy_update_fsm). This in turn calls the function MultiRecordFreeSpace() in
storage/freespace/freespace.c.
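
For reference, the extra logging was nothing fancy; roughly, I bracketed the call in lazy_vacuum_rel() with
wall-clock timing along these lines (a sketch, not my exact statements; variable names follow my reading of
vacuumlazy.c):

------
/* Sketch only -- needs <sys/time.h> for gettimeofday(). */
struct timeval t0, t1;

gettimeofday(&t0, NULL);
lazy_update_fsm(onerel, vacrelstats);       /* the step that pegs the CPU */
gettimeofday(&t1, NULL);
elog(NOTICE, "lazy_update_fsm: %.3f sec",
     (t1.tv_sec - t0.tv_sec) +
     (t1.tv_usec - t0.tv_usec) / 1000000.0);
-------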

Looking at the code in MultiRecordFreeSpace(), I found that this function imposes an exclusive lock. This locking
explains why my insert operations stopped.
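
As far as I can tell from freespace.c, the overall shape of the function is roughly this (heavily abbreviated; the
lock name and calls are from my reading of the 7.2 source, so treat them as approximate):

------
LWLockAcquire(FreeSpaceLock, LW_EXCLUSIVE);    /* one lock guards the whole FSM */
/* ... delete any old entries in the minPage..maxPage range ... */
for (i = 0; i < nPages; i++)
{
    /* ... decide whether to record free space for pages[i] ... */
}
LWLockRelease(FreeSpaceLock);
-------

If I read it correctly, a backend looking for a page to insert into goes through GetPageWithFreeSpace(), which
needs the same lock, so while this loop runs every inserter simply queues up behind the VACUUM.
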
Looking further, I found a place where the comment and conditional logic did not seem to say the same thing, and
hence looked suspicious. Here is the code snippet:

------
/*
 * Add new entries, if appropriate.
 *
 * XXX we could probably be smarter about this than doing it
 * completely separately for each one.  FIXME later.
 *
 * One thing we can do is short-circuit the process entirely if a
 * page (a) has too little free space to be recorded, and (b) is
 * within the minPage..maxPage range --- then we deleted any old
 * entry above, and we aren't going to make a new one. This is
 * particularly useful since in most cases, all the passed pages
 * will in fact be in the minPage..maxPage range.
 */
for (i = 0; i < nPages; i++)
{
    BlockNumber page = pages[i];
    Size        avail = spaceAvail[i];
    if (avail >= fsmrel->threshold ||
        page < minPage || page > maxPage)
        fsm_record_free_space(fsmrel, page, avail);
}

-------
The comment indicates that free space is recorded for a page if the available space is above the threshold AND the
page is not within the min-max range that was handled by the logic before this snippet. However, the code records
free space if EITHER of these criteria is true.

Therefore I tried changing the logic to an AND, e.g.:

   if (avail >= fsmrel->threshold &&
       (page < minPage || page > maxPage))
       fsm_record_free_space(fsmrel, page, avail);

This reduced my processing time in lazy_update_fsm() from about 2 minutes to nearly nothing, effectively solving my
performance problem.
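
To make the difference between the two readings concrete, here is a small standalone illustration (not backend
code; the threshold, page range, and sample data are made up) that prints which pages each form of the test would
pass to fsm_record_free_space():

------
#include <stdio.h>

int
main(void)
{
    unsigned int threshold = 100;   /* stand-in for fsmrel->threshold; made up */
    unsigned int minPage = 10;      /* made-up range handled by the earlier deletion */
    unsigned int maxPage = 20;

    unsigned int pages[]      = {5, 12, 15, 25};
    unsigned int spaceAvail[] = {50, 50, 200, 200};
    int         i;

    printf("page  avail  current (OR)  proposed (AND)\n");
    for (i = 0; i < 4; i++)
    {
        unsigned int page = pages[i];
        unsigned int avail = spaceAvail[i];

        /* current code: record if enough space OR the page lies outside minPage..maxPage */
        int     rec_or = (avail >= threshold ||
                          page < minPage || page > maxPage);

        /* my change: record only if enough space AND the page lies outside the range */
        int     rec_and = (avail >= threshold &&
                           (page < minPage || page > maxPage));

        printf("%4u  %5u  %-12s  %-14s\n",
               page, avail,
               rec_or ? "record" : "skip",
               rec_and ? "record" : "skip");
    }
    return 0;
}
-------

The rows where the two forms disagree (a page outside the range with too little space, and a page inside the range
with enough space) are the cases I'd most like a second opinion on, which is part of why I ask about hidden effects
below.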
----
I'm a newbie to the Postgres source code, so I don't know if this is the proper place to submit this information.
If I've submitted incorrectly, please let me know, so I can do it right next time.

I'd also be interested in knowing if this change has some hidden or long term effect I just don't see.

Sample Code


No file was uploaded with this report
