Thread: RE: [HACKERS] vacuum process size

RE: [HACKERS] vacuum process size

From
Mike Mascari
Date:
At the very least, couldn't vc_vpinsert() double
vpl->vpl_num_pages whenever vpl->vpl_num_pages
needs to be expanded instead of expanding linearly
by PG_NPAGEDESC, or by the original 100?

Mike Mascari
(mascarim@yahoo.com)

--- Hiroshi Inoue <Inoue@tpf.co.jp> wrote:
> Hi all,
> 
> I found the following comment in utils/mmgr/aset.c.
> The high memory usage of big vacuum is probably caused by this change.
> Calling repalloc() many times with its size parameter increasing
> would need a large amount of memory.
> 
> Should vacuum call realloc() directly?
> Or should AllocSet...() be changed?
> 
> Comments?
> 
>  * NOTE:
>  *      This is a new (Feb. 05, 1999) implementation of the allocation set
>  *      routines. AllocSet...() does not use OrderedSet...() any more.
>  *      Instead it manages allocations in a block pool by itself, combining
>  *      many small allocations in a few bigger blocks. AllocSetFree() does
>  *      never free() memory really. It just add's the free'd area to some
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  *      list for later reuse by AllocSetAlloc(). All memory blocks are
>  *      free()'d
> 
> Regards.
> 
> Hiroshi Inoue
> Inoue@tpf.co.jp

> > *** vacuum.c.orig    Sat Jul  3 09:32:40 1999
> > --- vacuum.c    Thu Aug 19 17:34:18 1999
> > ***************
> > *** 2519,2530 ****
> >   static void
> >   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
> >   {
> >
> >       /* allocate a VPageDescr entry if needed */
> >       if (vpl->vpl_num_pages == 0)
> > !         vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
> > !     else if (vpl->vpl_num_pages % 100 == 0)
> > !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
> >       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
> >       (vpl->vpl_num_pages)++;
> >
> > --- 2519,2531 ----
> >   static void
> >   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
> >   {
> > + #define PG_NPAGEDESC 1000
> >
> >       /* allocate a VPageDescr entry if needed */
> >       if (vpl->vpl_num_pages == 0)
> > !         vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
> > !     else if (vpl->vpl_num_pages % PG_NPAGEDESC == 0)
> > !         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + PG_NPAGEDESC) * sizeof(VPageDescr));
> >       vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
> >       (vpl->vpl_num_pages)++;
> >




Re: [HACKERS] vacuum process size

From
Tatsuo Ishii
Date:
Mike, 

> At the very least, couldn't vc_vpinsert() double
> vpl->vpl_num_pages whenever vpl->vpl_num_pages
> needs to be expanded instead of expanding linearly
> by PG_NPAGEDESC, or by the original 100?

I have tested your idea and found even more improved memory usage
(86MB vs. 43MB). Standard vacuum consumes as much as 478MB of memory
when deleting 5,000,000 tuples, which would not be acceptable for most
configurations. I think we should fix this as soon as possible.  If
there's no objection, I will commit the included patches to the stable
tree (it seems Tom has a more aggressive idea, so I'll leave the
current tree as it is).
---
Tatsuo Ishii
-------------------------------------------------------------------
*** vacuum.c.orig    Sat Jul  3 09:32:40 1999
--- vacuum.c    Tue Aug 24 10:08:43 1999
***************
*** 2519,2530 ****
  static void
  vc_vpinsert(VPageList vpl, VPageDescr vpnew)
  {

      /* allocate a VPageDescr entry if needed */
      if (vpl->vpl_num_pages == 0)
!         vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
!     else if (vpl->vpl_num_pages % 100 == 0)
!         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
      vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
      (vpl->vpl_num_pages)++;

--- 2519,2538 ----
  static void
  vc_vpinsert(VPageList vpl, VPageDescr vpnew)
  {
+ #define PG_NPAGEDESC 1024
+ static uint num_pages;

      /* allocate a VPageDescr entry if needed */
      if (vpl->vpl_num_pages == 0)
!     {
!         vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
!         num_pages = PG_NPAGEDESC;
!     }
!     else if (vpl->vpl_num_pages >= num_pages)
!     {
!         num_pages *= 2;
!         vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, num_pages * sizeof(VPageDescr));
!     }
      vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
      (vpl->vpl_num_pages)++;


Re: [HACKERS] vacuum process size

From
Bruce Momjian
Date:
> At the very least, couldn't vc_vpinsert() double
> vpl->vpl_num_pages whenever vpl->vpl_num_pages
> needs to be expanded instead of expanding linearly
> by PG_NPAGEDESC, or by the original 100?

This seems like a good idea.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have tested your idea and found even more improved memory usage
> (86MB vs. 43MB). Standard vacuum consumes as much as 478MB memory with
> deleting 5000000 tuples that would not be acceptable for most
> configurations. I think we should fix this as soon as possible.  If
> there's no objection, I will commit included patches to the stable
> tree (seems Tom has more aggressive idea, so I'll leave the current
> tree as it is).

No, please make the change in current as well.  I was thinking about
tweaking aset.c to be smarter about releasing large chunks, but in any
case having the doubling behavior at the request point will be a big
improvement.

I do not like your patch as given, however.  By using a static variable
you are assuming that there is only one active VPageList at a time.
It looks to me like there are at least two --- and there is no reason
to think they'd be the same size.

You need to add a num_pages field to the VPageList struct, not use
a static.
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
I have been looking some more at the vacuum-process-size issue, and
I am having a hard time understanding why the VPageList data structure
is the critical one.  As far as I can see, there should be at most one
pointer in it for each disk page of the relation.  OK, you were
vacuuming a table with something like a quarter million pages, so
the end size of the VPageList would have been something like a megabyte,
and given the inefficient usage of repalloc() in the original code,
a lot more space than that would have been wasted as the list grew.
So doubling the array size at each step is a good change.

But there are a lot more tuples than pages in most relations.

I see two lists with per-tuple data in vacuum.c, "vtlinks" in
vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
essentially the same technique of repalloc() after every N entries.
I'm not entirely clear on how many tuples get put into each of these
lists, but it sure seems like in ordinary circumstances they'd be much
bigger space hogs than any of the three VPageList lists.

I recommend going to a doubling approach for each of these lists as
well as for VPageList.

There is a fourth usage of repalloc with the same method, for "ioid"
in vc_getindices.  This only gets one entry per index on the current
relation, so it's unlikely to be worth changing on its own merit.
But it might be worth building a single subroutine that expands a
growable list of entries (taking sizeof() each entry as a parameter)
and applying it in all four places.
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Brian E Gallew
Date:
Then <tgl@sss.pgh.pa.us> spoke up and said:
> So doubling the array size at each step is a good change.
> 
> But there are a lot more tuples than pages in most relations.
> 
> I see two lists with per-tuple data in vacuum.c, "vtlinks" in
> vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
> essentially the same technique of repalloc() after every N entries.
> I'm not entirely clear on how many tuples get put into each of these
> lists, but it sure seems like in ordinary circumstances they'd be much
> bigger space hogs than any of the three VPageList lists.
> 
> I recommend going to a doubling approach for each of these lists as
> well as for VPageList.

Question: is there reliable information in pg_statistics (or other
system tables) that could be used to make a reasonable estimate of the
sizes of these structures before the initial allocation?  Certainly the
file size can be obtained from a stat() call (modulo some portability
and sparse-file issues).



Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
>> If there's no objection, I will commit included patches to the stable
>> tree (seems Tom has more aggressive idea, so I'll leave the current
>> tree as it is).

> No, please make the change in current as well.  I was thinking about
> tweaking aset.c to be smarter about releasing large chunks, but in any
> case having the doubling behavior at the request point will be a big
> improvement.

I have just committed changes into current (but not REL6_5) to make
aset.c smarter about giving back memory from large requests.  Basically,
for chunk sizes >= ALLOC_BIGCHUNK_LIMIT, pfree() does an actual free()
and repalloc() does an actual realloc().  There is no change in behavior
for smaller chunk sizes.  This should cap the amount of space that can
be wasted by aset.c while repalloc'ing a chunk larger and larger.

For lack of a better idea I set ALLOC_BIGCHUNK_LIMIT to 64K.  I don't
think it'd pay to make it very small, but I don't really know whether
this is a good choice or not.

It would still be a good idea to fix vacuum.c to double its repalloc
requests at each step, but Tatsuo was already working on that part
so I won't joggle his elbow...
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
Brian E Gallew <geek+@cmu.edu> writes:
> Question: is there reliable information in pg_statistics (or other
> system tables) which can be used to make a reasonable estimate for the
> sizes of these structures before initial allocation?  Certainly the
> file size can be gotten from a stat (some portability issues, sparse
> file issues).

pg_statistics would tell you what was found out by the last vacuum on
the table, if there ever was one.  Dunno how reliable you want to
consider that to be.  stat() would provide up-to-date info, but the
problem with it is that the total file size might be a drastic
overestimate of the number of pages that vacuum needs to put in these
lists.  There's not really much chance of getting a useful estimate from
the last vacuum run, either.  AFAICT what we are interested in is the
number of pages containing dead tuples, and by definition all of those
tuples will have died since the last vacuum...

On the whole, just fixing the memory management seems like the best bet.
We know how to do that, and it may benefit other things besides vacuum.
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Tatsuo Ishii
Date:
>I have just committed changes into current (but not REL6_5) to make

Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
CVS repository. I thought that REL6_5_PATCHES is the tag for the 6.5
stable tree and would eventually become 6.5.2. If so, what is the
REL6_5 tag? Or am I totally missing the point?
--
Tatsuo Ishii



Re: [HACKERS] vacuum process size

From
Bruce Momjian
Date:
> > I have just committed changes into current (but not REL6_5) to make
> 
> Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
> CVS repository. I thought that REL6_5_PATCHES is the tag for the 6.5
> stable tree and would eventually become 6.5.2. If so, what is the
> REL6_5 tag? Or am I totally missing the point?

REL6_5 was a mistake.


RE: [HACKERS] vacuum process size

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, August 25, 1999 1:20 AM
> To: t-ishii@sra.co.jp
> Cc: Mike Mascari; Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] vacuum process size 
> 
> 
> I have been looking some more at the vacuum-process-size issue, and
> I am having a hard time understanding why the VPageList data structure
> is the critical one.  As far as I can see, there should be at most one
> pointer in it for each disk page of the relation.  OK, you were
> vacuuming a table with something like a quarter million pages, so
> the end size of the VPageList would have been something like a megabyte,
> and given the inefficient usage of repalloc() in the original code,
> a lot more space than that would have been wasted as the list grew.
> So doubling the array size at each step is a good change.
> 
> But there are a lot more tuples than pages in most relations.
> 
> I see two lists with per-tuple data in vacuum.c, "vtlinks" in
> vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
> essentially the same technique of repalloc() after every N entries.
> I'm not entirely clear on how many tuples get put into each of these
> lists, but it sure seems like in ordinary circumstances they'd be much
> bigger space hogs than any of the three VPageList lists.
>

AFAIK, both vtlinks and vtmove are NULL if vacuum is executed
without concurrent transactions.
They won't grow large unless long-running concurrent transactions exist.
Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] vacuum process size

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
> CVS repository. I thought that REL6_5_PATCHES is the tag for the 6.5
> stable tree and would eventually become 6.5.2. If so, what is the
> REL6_5 tag? Or am I totally missing the point?

Right, REL6_5_PATCHES is the 6.5.* branch.  REL6_5 is just a tag ---
that is, it's effectively a frozen snapshot of the 6.5 release,
not an evolvable branch.

I am not sure if Marc intends to continue this naming convention
in future, or if it was just a mistake to create REL6_5 as a tag
not a branch.  I don't see a whole lot of use for the frozen tag
myself...
        regards, tom lane


Re: [HACKERS] vacuum process size

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I have tested your idea and found even more improved memory usage
> > (86MB vs. 43MB). Standard vacuum consumes as much as 478MB memory with
> > deleting 5000000 tuples that would not be acceptable for most
> > configurations. I think we should fix this as soon as possible.  If
> > there's no objection, I will commit included patches to the stable
> > tree (seems Tom has more aggressive idea, so I'll leave the current
> > tree as it is).
> 
> No, please make the change in current as well.  I was thinking about
> tweaking aset.c to be smarter about releasing large chunks, but in any
> case having the doubling behavior at the request point will be a big
> improvement.
> 
> I do not like your patch as given, however.  By using a static variable
> you are assuming that there is only one active VPageList at a time.
> It looks to me like there are at least two --- and there is no reason
> to think they'd be the same size.
> 
> You need to add a num_pages field to the VPageList struct, not use
> a static.

Good point. I have committed new patches that do not use static
variables anymore to both REL6_5_PATCHES and current tree.

Modified files: backend/commands/vacuum.c and
include/commands/vacuum.h.
---
Tatsuo Ishii


Re: [HACKERS] vacuum process size

From
The Hermit Hacker
Date:
On Wed, 25 Aug 1999, Tom Lane wrote:

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
> > CVS repository. I thought that REL6_5_PATCHES is the tag for the 6.5
> > stable tree and would eventually become 6.5.2. If so, what is the
> > REL6_5 tag? Or am I totally missing the point?
> 
> Right, REL6_5_PATCHES is the 6.5.* branch.  REL6_5 is just a tag ---
> that is, it's effectively a frozen snapshot of the 6.5 release,
> not an evolvable branch.
> 
> I am not sure if Marc intends to continue this naming convention
> in future, or if it was just a mistake to create REL6_5 as a tag
> not a branch.  I don't see a whole lot of use for the frozen tag
> myself...

I like the frozen tag myself, since, in the future, if we need to create a
quick tarball of what things looked like at that release (i.e. a
v6.5->v6.5.2 patch?), it's easy to generate...

Actually, come to think of it... am going to try that out now... report back
in a bit...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org