Thread: vacuum process size

vacuum process size

From
Tatsuo Ishii
Date:
Just for testing I made a huge table (>2GB, with 10000000
tuples).  Copying the 10000000 tuples took 23 minutes. This is not so
bad. Vacuum analyze took 11 minutes, not too bad. After this I created
an index on an int4 column. It took 9 minutes. Next I deleted 5000000
tuples to see how long the delete took. I found it was 6
minutes. Good. Then I ran into a problem. After that I did a vacuum
analyze, and it seemed to take forever!  (It actually took 47
minutes.) The biggest problem was the postgres process size. It was
478MB! This is not acceptable for me.  Any ideas?

This is PostgreSQL 6.5.1 running on RH 6.0.
--
Tatsuo Ishii


Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Just for testing I made a huge table (>2GB, with 10000000
> tuples).  Copying the 10000000 tuples took 23 minutes. This is not so
> bad. Vacuum analyze took 11 minutes, not too bad. After this I created
> an index on an int4 column. It took 9 minutes. Next I deleted 5000000
> tuples to see how long the delete took. I found it was 6
> minutes. Good. Then I ran into a problem. After that I did a vacuum
> analyze, and it seemed to take forever!  (It actually took 47
> minutes.) The biggest problem was the postgres process size. It was
> 478MB! This is not acceptable for me.  Any ideas?

Yeah, I've complained about that before --- it seems that vacuum takes
a really unreasonable amount of time to remove dead tuples from an index.
It's been like that at least since 6.3.2, probably longer.
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Tatsuo Ishii
Date:
>Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Just for testing I made a huge table (>2GB, with 10000000
>> tuples).  Copying the 10000000 tuples took 23 minutes. This is not so
>> bad. Vacuum analyze took 11 minutes, not too bad. After this I created
>> an index on an int4 column. It took 9 minutes. Next I deleted 5000000
>> tuples to see how long the delete took. I found it was 6
>> minutes. Good. Then I ran into a problem. After that I did a vacuum
>> analyze, and it seemed to take forever!  (It actually took 47
>> minutes.) The biggest problem was the postgres process size. It was
>> 478MB! This is not acceptable for me.  Any ideas?
>
>Yeah, I've complained about that before --- it seems that vacuum takes
>a really unreasonable amount of time to remove dead tuples from an index.
>It's been like that at least since 6.3.2, probably longer.

Hiroshi came up with a workaround for this (see the included
patch). After applying it, the process size shrank from 478MB to
86MB! (The processing time did not decrease, however.) According to
him, repalloc is not very efficient when called a large number of
times. The patch probably decreases the number of calls to 1/10.
--
Tatsuo Ishii

-------------------------------------------------------------------------
*** vacuum.c.orig    Sat Jul  3 09:32:40 1999
--- vacuum.c    Thu Aug 19 17:34:18 1999
***************
*** 2519,2530 ****
  static void
  vc_vpinsert(VPageList vpl, VPageDescr vpnew)
  {

      /* allocate a VPageDescr entry if needed */
      if (vpl->vpl_num_pages == 0)
!         vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
!     else if (vpl->vpl_num_pages % 100 == 0)
!         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
      vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
      (vpl->vpl_num_pages)++;

--- 2519,2531 ----
  static void
  vc_vpinsert(VPageList vpl, VPageDescr vpnew)
  {
+ #define PG_NPAGEDESC 1000

      /* allocate a VPageDescr entry if needed */
      if (vpl->vpl_num_pages == 0)
!         vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
!     else if (vpl->vpl_num_pages % PG_NPAGEDESC == 0)
!         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + PG_NPAGEDESC) * sizeof(VPageDescr));
      vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
      (vpl->vpl_num_pages)++;
 


RE: [HACKERS] vacuum process size

From
"Hiroshi Inoue"
Date:
Hi all,

I found the following comment in utils/mmgr/aset.c.
The high memory usage of big vacuum is probably caused by this
change.
Calling repalloc() many times with its size parameter increasing
would need large amount of memory.

Should vacuum call realloc() directly ?
Or should AllocSet..() be changed ?

Comments ?
 * NOTE:
 *      This is a new (Feb. 05, 1999) implementation of the allocation set
 *      routines. AllocSet...() does not use OrderedSet...() any more.
 *      Instead it manages allocations in a block pool by itself, combining
 *      many small allocations in a few bigger blocks. AllocSetFree() does
 *      never free() memory really. It just add's the free'd area to some
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 *      list for later reuse by AllocSetAlloc(). All memory blocks are
 *      free()'d

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

>
> >Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> Just for testing I made a huge table (>2GB, with 10000000
> >> tuples).  Copying the 10000000 tuples took 23 minutes. This is not so
> >> bad. Vacuum analyze took 11 minutes, not too bad. After this I created
> >> an index on an int4 column. It took 9 minutes. Next I deleted 5000000
> >> tuples to see how long the delete took. I found it was 6
> >> minutes. Good. Then I ran into a problem. After that I did a vacuum
> >> analyze, and it seemed to take forever!  (It actually took 47
> >> minutes.) The biggest problem was the postgres process size. It was
> >> 478MB! This is not acceptable for me.  Any ideas?
> >
> >Yeah, I've complained about that before --- it seems that vacuum takes
> >a really unreasonable amount of time to remove dead tuples from an index.
> >It's been like that at least since 6.3.2, probably longer.
>
> Hiroshi came up with a workaround for this (see the included
> patch). After applying it, the process size shrank from 478MB to
> 86MB! (The processing time did not decrease, however.) According to
> him, repalloc is not very efficient when called a large number of
> times. The patch probably decreases the number of calls to 1/10.
> --
> Tatsuo Ishii
>
> -------------------------------------------------------------------------
> *** vacuum.c.orig    Sat Jul  3 09:32:40 1999
> --- vacuum.c    Thu Aug 19 17:34:18 1999
> ***************
> *** 2519,2530 ****
>   static void
>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>   {
>
>       /* allocate a VPageDescr entry if needed */
>       if (vpl->vpl_num_pages == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
> !     else if (vpl->vpl_num_pages % 100 == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
>       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>       (vpl->vpl_num_pages)++;
>
> --- 2519,2531 ----
>   static void
>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>   {
> + #define PG_NPAGEDESC 1000
>
>       /* allocate a VPageDescr entry if needed */
>       if (vpl->vpl_num_pages == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
> !     else if (vpl->vpl_num_pages % PG_NPAGEDESC == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + PG_NPAGEDESC) * sizeof(VPageDescr));
>       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>       (vpl->vpl_num_pages)++;
>



Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I found the following comment in utils/mmgr/aset.c.
> The high memory usage of big vacuum is probably caused by this
> change.

AFAIK, there is no "change" there.  free() doesn't give memory
back to the kernel either.

> Calling repalloc() many times with its size parameter increasing
> would need large amount of memory.

Good point, because aset.c doesn't coalesce adjacent free chunks.
And of course, reallocating the block bigger and bigger is exactly
the usual behavior with realloc-using code :-(

I don't think it would be a good idea to add coalescing logic to aset.c
--- that'd defeat the purpose of building a small/simple/fast allocator.

Perhaps for large standalone chunks (those that AllocSetAlloc made an
entire separate block for), AllocSetFree should free() the block instead
of putting the chunk on its own freelist.  Assuming that malloc/free are
smart enough to coalesce adjacent blocks, that would prevent the bad
behavior from recurring once the request size gets past
ALLOC_SMALLCHUNK_LIMIT, and for small requests we don't care.

But it doesn't look like there is any cheap way to detect that a chunk
being freed takes up all of its block.  We'd have to mark it specially
somehow.  A kluge that comes to mind is to set the chunk->size to zero
when it is a standalone allocation.

I believe Jan designed the current aset.c logic.  Jan, any comments?

> Should vacuum call realloc() directly ?

Not unless you like *permanent* memory leaks instead of transient ones.
Consider what will happen at elog().

However, another possible solution is to redesign the data structure
in vacuum() so that it can be made up of multiple allocation blocks,
rather than insisting that all the array entries always be consecutive.
Then it wouldn't depend on repalloc at all.  On the whole I like that
idea better --- even if repalloc can be fixed not to waste memory, it
still implies copying large amounts of data around for no purpose.
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Bruce Momjian
Date:
Tom, you already handled this, right?


> >Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> Just for testing I made a huge table (>2GB, with 10000000
> >> tuples).  Copying the 10000000 tuples took 23 minutes. This is not so
> >> bad. Vacuum analyze took 11 minutes, not too bad. After this I created
> >> an index on an int4 column. It took 9 minutes. Next I deleted 5000000
> >> tuples to see how long the delete took. I found it was 6
> >> minutes. Good. Then I ran into a problem. After that I did a vacuum
> >> analyze, and it seemed to take forever!  (It actually took 47
> >> minutes.) The biggest problem was the postgres process size. It was
> >> 478MB! This is not acceptable for me.  Any ideas?
> >
> >Yeah, I've complained about that before --- it seems that vacuum takes
> >a really unreasonable amount of time to remove dead tuples from an index.
> >It's been like that at least since 6.3.2, probably longer.
>
> Hiroshi came up with a workaround for this (see the included
> patch). After applying it, the process size shrank from 478MB to
> 86MB! (The processing time did not decrease, however.) According to
> him, repalloc is not very efficient when called a large number of
> times. The patch probably decreases the number of calls to 1/10.
> --
> Tatsuo Ishii
>
> -------------------------------------------------------------------------
> *** vacuum.c.orig    Sat Jul  3 09:32:40 1999
> --- vacuum.c    Thu Aug 19 17:34:18 1999
> ***************
> *** 2519,2530 ****
>   static void
>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>   {
>
>       /* allocate a VPageDescr entry if needed */
>       if (vpl->vpl_num_pages == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
> !     else if (vpl->vpl_num_pages % 100 == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
>       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>       (vpl->vpl_num_pages)++;
>
> --- 2519,2531 ----
>   static void
>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>   {
> + #define PG_NPAGEDESC 1000
>
>       /* allocate a VPageDescr entry if needed */
>       if (vpl->vpl_num_pages == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
> !     else if (vpl->vpl_num_pages % PG_NPAGEDESC == 0)
> !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + PG_NPAGEDESC) * sizeof(VPageDescr));
>       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>       (vpl->vpl_num_pages)++;
>
>


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, you already handled this, right?

Someone committed it, not sure if it was me.

I was worried whether vacuum's other expandable lists needed the same
treatment, but Hiroshi and/or Tatsuo seemed to think it wasn't worth the
trouble to change them.  So I guess the item is closed.
        regards, tom lane


>>>> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>>> Just for testing I made a huge table (>2GB, with 10000000
>>>>> tuples).  Copying the 10000000 tuples took 23 minutes. This is not so
>>>>> bad. Vacuum analyze took 11 minutes, not too bad. After this I created
>>>>> an index on an int4 column. It took 9 minutes. Next I deleted 5000000
>>>>> tuples to see how long the delete took. I found it was 6
>>>>> minutes. Good. Then I ran into a problem. After that I did a vacuum
>>>>> analyze, and it seemed to take forever!  (It actually took 47
>>>>> minutes.) The biggest problem was the postgres process size. It was
>>>>> 478MB! This is not acceptable for me.  Any ideas?
>>>> 
>>>> Yeah, I've complained about that before --- it seems that vacuum takes
>>>> a really unreasonable amount of time to remove dead tuples from an index.
>>>> It's been like that at least since 6.3.2, probably longer.
>> 
>> Hiroshi came up with a workaround for this (see the included
>> patch). After applying it, the process size shrank from 478MB to
>> 86MB! (The processing time did not decrease, however.) According to
>> him, repalloc is not very efficient when called a large number of
>> times. The patch probably decreases the number of calls to 1/10.
>> --
>> Tatsuo Ishii
>> 
>> -------------------------------------------------------------------------
>> *** vacuum.c.orig    Sat Jul  3 09:32:40 1999
>> --- vacuum.c    Thu Aug 19 17:34:18 1999
>> ***************
>> *** 2519,2530 ****
>>   static void
>>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>>   {
>> 
>>       /* allocate a VPageDescr entry if needed */
>>       if (vpl->vpl_num_pages == 0)
>> !         vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
>> !     else if (vpl->vpl_num_pages % 100 == 0)
>> !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
>>       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>>       (vpl->vpl_num_pages)++;
>> 
>> --- 2519,2531 ----
>>   static void
>>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>>   {
>> + #define PG_NPAGEDESC 1000
>> 
>>       /* allocate a VPageDescr entry if needed */
>>       if (vpl->vpl_num_pages == 0)
>> !         vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
>> !     else if (vpl->vpl_num_pages % PG_NPAGEDESC == 0)
>> !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + PG_NPAGEDESC) * sizeof(VPageDescr));
>>       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>>       (vpl->vpl_num_pages)++;

Re: [HACKERS] vacuum process size

From
Tatsuo Ishii
Date:
>Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> Tom, you already handled this, right?
>
>Someone committed it, not sure if it was me.

I have committed the changes to both the stable and the current trees.

>I was worried whether vacuum's other expandable lists needed the same
>treatment, but Hiroshi and/or Tatsuo seemed to think it wasn't worth the
>trouble to change them.  So I guess the item is closed.

I think so.
--
Tatsuo Ishii