Thread: RE: [HACKERS] vacuum process size
At the very least, couldn't vc_vpinsert() double vpl->vpl_num_pages
whenever vpl->vpl_num_pages needs to be expanded, instead of expanding
linearly by PG_NPAGEDESC, or by the original 100?

Mike Mascari (mascarim@yahoo.com)

--- Hiroshi Inoue <Inoue@tpf.co.jp> wrote:
> Hi all,
>
> I found the following comment in utils/mmgr/aset.c. The high memory
> usage of big vacuum is probably caused by this change. Calling
> repalloc() many times with its size parameter increasing would need
> a large amount of memory.
>
> Should vacuum call realloc() directly? Or should AllocSet...() be
> changed?
>
> Comments?
>
>  * NOTE:
>  *	This is a new (Feb. 05, 1999) implementation of the allocation set
>  *	routines. AllocSet...() does not use OrderedSet...() any more.
>  *	Instead it manages allocations in a block pool by itself, combining
>  *	many small allocations in a few bigger blocks. AllocSetFree() does
>  *	never free() memory really. It just add's
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  *	the free'd area to some list for later reuse by AllocSetAlloc().
>  *	All memory blocks are free()'d
>
> Regards.
>
> Hiroshi Inoue
> Inoue@tpf.co.jp
>
> *** vacuum.c.orig	Sat Jul  3 09:32:40 1999
> --- vacuum.c	Thu Aug 19 17:34:18 1999
> ***************
> *** 2519,2530 ****
>   static void
>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>   {
>   
>   	/* allocate a VPageDescr entry if needed */
>   	if (vpl->vpl_num_pages == 0)
> ! 		vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
> ! 	else if (vpl->vpl_num_pages % 100 == 0)
> ! 		vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
>   	vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>   	(vpl->vpl_num_pages)++;
>   
> --- 2519,2531 ----
>   static void
>   vc_vpinsert(VPageList vpl, VPageDescr vpnew)
>   {
> + #define PG_NPAGEDESC 1000
>   
>   	/* allocate a VPageDescr entry if needed */
>   	if (vpl->vpl_num_pages == 0)
> ! 		vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
> ! 	else if (vpl->vpl_num_pages % PG_NPAGEDESC == 0)
> ! 		vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + PG_NPAGEDESC) * sizeof(VPageDescr));
>   	vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
>   	(vpl->vpl_num_pages)++;
Mike,

> At the very least, couldn't vc_vpinsert() double
> vpl->vpl_num_pages whenever vpl->vpl_num_pages
> needs to be expanded instead of expanding linearly
> by PG_NPAGEDESC, or by the original 100?

I have tested your idea and found even more improved memory usage
(86MB vs. 43MB). Standard vacuum consumes as much as 478MB of memory
when deleting 5,000,000 tuples, which would not be acceptable for most
configurations. I think we should fix this as soon as possible. If
there's no objection, I will commit the included patches to the stable
tree (it seems Tom has a more aggressive idea, so I'll leave the
current tree as it is).
---
Tatsuo Ishii
-------------------------------------------------------------------
*** vacuum.c.orig	Sat Jul  3 09:32:40 1999
--- vacuum.c	Tue Aug 24 10:08:43 1999
***************
*** 2519,2530 ****
  static void
  vc_vpinsert(VPageList vpl, VPageDescr vpnew)
  {
  
  	/* allocate a VPageDescr entry if needed */
  	if (vpl->vpl_num_pages == 0)
! 		vpl->vpl_pagedesc = (VPageDescr *) palloc(100 * sizeof(VPageDescr));
! 	else if (vpl->vpl_num_pages % 100 == 0)
! 		vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, (vpl->vpl_num_pages + 100) * sizeof(VPageDescr));
  	vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
  	(vpl->vpl_num_pages)++;
--- 2519,2538 ----
  static void
  vc_vpinsert(VPageList vpl, VPageDescr vpnew)
  {
+ #define PG_NPAGEDESC 1024
+ 	static uint num_pages;
  
  	/* allocate a VPageDescr entry if needed */
  	if (vpl->vpl_num_pages == 0)
! 	{
! 		vpl->vpl_pagedesc = (VPageDescr *) palloc(PG_NPAGEDESC * sizeof(VPageDescr));
! 		num_pages = PG_NPAGEDESC;
! 	}
! 	else if (vpl->vpl_num_pages >= num_pages)
! 	{
! 		num_pages *= 2;
! 		vpl->vpl_pagedesc = (VPageDescr *) repalloc(vpl->vpl_pagedesc, num_pages * sizeof(VPageDescr));
! 	}
  	vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
  	(vpl->vpl_num_pages)++;
> At the very least, couldn't vc_vpinsert() double
> vpl->vpl_num_pages whenever vpl->vpl_num_pages
> needs to be expanded instead of expanding linearly
> by PG_NPAGEDESC, or by the original 100?

This seems like a good idea.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have tested your idea and found even more improved memory usage
> (86MB vs. 43MB). Standard vacuum consumes as much as 478MB of memory
> when deleting 5,000,000 tuples, which would not be acceptable for
> most configurations. I think we should fix this as soon as possible.
> If there's no objection, I will commit the included patches to the
> stable tree (it seems Tom has a more aggressive idea, so I'll leave
> the current tree as it is).

No, please make the change in current as well.  I was thinking about
tweaking aset.c to be smarter about releasing large chunks, but in any
case having the doubling behavior at the request point will be a big
improvement.

I do not like your patch as given, however.  By using a static variable
you are assuming that there is only one active VPageList at a time.
It looks to me like there are at least two --- and there is no reason
to think they'd be the same size.

You need to add a num_pages field to the VPageList struct, not use
a static.

			regards, tom lane
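The fix Tom asks for --- carrying the capacity in the VPageList struct so each list grows independently --- can be sketched like this. This is a simplified stand-in, not the committed code: the type definitions are reduced to stubs, the field name vpl_num_allocated is assumed, and malloc/realloc model palloc/repalloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-ins for the real declarations in
 * include/commands/vacuum.h; malloc/realloc model palloc/repalloc. */
typedef struct VPageDescrData { int vpd_dummy; } VPageDescrData;
typedef VPageDescrData *VPageDescr;

typedef struct VPageListData
{
	int         vpl_num_pages;      /* entries currently in use */
	int         vpl_num_allocated;  /* capacity kept per list, not static */
	VPageDescr *vpl_pagedesc;
} VPageListData;
typedef VPageListData *VPageList;

#define PG_NPAGEDESC 1024

static void
vc_vpinsert(VPageList vpl, VPageDescr vpnew)
{
	/* allocate a VPageDescr entry if needed */
	if (vpl->vpl_num_pages == 0)
	{
		vpl->vpl_num_allocated = PG_NPAGEDESC;
		vpl->vpl_pagedesc = (VPageDescr *)
			malloc(vpl->vpl_num_allocated * sizeof(VPageDescr));
	}
	else if (vpl->vpl_num_pages >= vpl->vpl_num_allocated)
	{
		/* double the array; each list tracks its own capacity */
		vpl->vpl_num_allocated *= 2;
		vpl->vpl_pagedesc = (VPageDescr *)
			realloc(vpl->vpl_pagedesc,
					vpl->vpl_num_allocated * sizeof(VPageDescr));
	}
	vpl->vpl_pagedesc[vpl->vpl_num_pages] = vpnew;
	(vpl->vpl_num_pages)++;
}
```

With the capacity in the struct, two VPageLists of very different sizes (as in Tom's scenario) no longer stomp on each other's bookkeeping.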
I have been looking some more at the vacuum-process-size issue, and
I am having a hard time understanding why the VPageList data structure
is the critical one.  As far as I can see, there should be at most one
pointer in it for each disk page of the relation.  OK, you were
vacuuming a table with something like a quarter million pages, so the
end size of the VPageList would have been something like a megabyte,
and given the inefficient usage of repalloc() in the original code,
a lot more space than that would have been wasted as the list grew.
So doubling the array size at each step is a good change.

But there are a lot more tuples than pages in most relations.

I see two lists with per-tuple data in vacuum.c, "vtlinks" in
vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
essentially the same technique of repalloc() after every N entries.
I'm not entirely clear on how many tuples get put into each of these
lists, but it sure seems like in ordinary circumstances they'd be much
bigger space hogs than any of the three VPageList lists.

I recommend going to a doubling approach for each of these lists as
well as for VPageList.

There is a fourth usage of repalloc with the same method, for "ioid"
in vc_getindices.  This only gets one entry per index on the current
relation, so it's unlikely to be worth changing on its own merit.
But it might be worth building a single subroutine that expands a
growable list of entries (taking sizeof() each entry as a parameter)
and applying it in all four places.

			regards, tom lane
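The single shared subroutine suggested above might look something like the following sketch. The helper name, signature, and initial size are all hypothetical (nothing like this existed in vacuum.c at the time), and malloc/realloc again stand in for palloc/repalloc.

```c
#include <stdlib.h>

/* Hypothetical shared growth helper, parameterized by sizeof() the
 * entry type, usable for vpl_pagedesc, vtlinks, vtmove, and ioid
 * alike.  The caller passes the current capacity by reference. */
static void *
vc_grow_array(void *array, size_t *allocated, size_t used,
			  size_t entry_size, size_t initial)
{
	if (array == NULL)
	{
		/* first use: allocate the initial capacity */
		*allocated = initial;
		return malloc(initial * entry_size);
	}
	if (used >= *allocated)
	{
		/* full: double instead of adding a fixed increment */
		*allocated *= 2;
		return realloc(array, *allocated * entry_size);
	}
	return array;				/* room left; nothing to do */
}
```

Each insertion site then becomes a one-line call before storing the new entry, and all four lists get the doubling behavior from one piece of code.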
Then <tgl@sss.pgh.pa.us> spoke up and said:
> So doubling the array size at each step is a good change.
>
> But there are a lot more tuples than pages in most relations.
>
> I see two lists with per-tuple data in vacuum.c, "vtlinks" in
> vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
> essentially the same technique of repalloc() after every N entries.
> I'm not entirely clear on how many tuples get put into each of these
> lists, but it sure seems like in ordinary circumstances they'd be much
> bigger space hogs than any of the three VPageList lists.
>
> I recommend going to a doubling approach for each of these lists as
> well as for VPageList.

Question: is there reliable information in pg_statistics (or other
system tables) which can be used to make a reasonable estimate for the
sizes of these structures before initial allocation?  Certainly the
file size can be gotten from a stat (some portability issues, sparse
file issues).

-- 
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================
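The stat() idea mentioned in the question could look roughly like this sketch (BLCKSZ hard-coded to PostgreSQL's default 8K page size, and the function name is illustrative). As the question itself notes, this carries portability and sparse-file caveats, and it can only bound the page count from above.

```c
#include <sys/stat.h>

#define BLCKSZ 8192				/* PostgreSQL's default page size */

/* Estimate a relation's page count from its heap file size.
 * Returns -1 if the file cannot be stat()'d.  This is an upper
 * bound on the pages vacuum would need to track, not an estimate
 * of how many actually contain dead tuples. */
static long
estimate_pages(const char *relpath)
{
	struct stat st;

	if (stat(relpath, &st) != 0)
		return -1;
	return (long) ((st.st_size + BLCKSZ - 1) / BLCKSZ);
}
```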
>> If there's no objection, I will commit included patches to the stable
>> tree (seems Tom has more aggressive idea, so I'll leave the current
>> tree as it is).

> No, please make the change in current as well.  I was thinking about
> tweaking aset.c to be smarter about releasing large chunks, but in any
> case having the doubling behavior at the request point will be a big
> improvement.

I have just committed changes into current (but not REL6_5) to make
aset.c smarter about giving back memory from large requests.
Basically, for chunk sizes >= ALLOC_BIGCHUNK_LIMIT, pfree() does an
actual free() and repalloc() does an actual realloc().  There is no
change in behavior for smaller chunk sizes.  This should cap the
amount of space that can be wasted by aset.c while repalloc'ing a
chunk larger and larger.

For lack of a better idea I set ALLOC_BIGCHUNK_LIMIT to 64K.  I don't
think it'd pay to make it very small, but I don't really know whether
this is a good choice or not.

It would still be a good idea to fix vacuum.c to double its repalloc
requests at each step, but Tatsuo was already working on that part
so I won't joggle his elbow...

			regards, tom lane
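The policy described above can be illustrated with a toy model. This is emphatically not the real aset.c code; it only mimics the decision rule: chunks at or above ALLOC_BIGCHUNK_LIMIT map onto a genuine realloc(), while smaller chunks are copied into fresh space and the old space stays on a (notional) freelist, which is what wastes memory when a chunk is repalloc'd larger and larger.

```c
#include <stdlib.h>
#include <string.h>

#define ALLOC_BIGCHUNK_LIMIT (64 * 1024)

/* Bytes a block pool would retain instead of returning to libc.
 * In the model, the old small chunk is deliberately never freed,
 * mirroring the AllocSetFree() behavior quoted earlier in the
 * thread. */
static size_t pool_waste = 0;

static void *
model_repalloc(void *ptr, size_t oldsize, size_t newsize)
{
	if (oldsize >= ALLOC_BIGCHUNK_LIMIT)
		return realloc(ptr, newsize);	/* big chunk: really resize */

	/* small chunk: copy out, old space is kept for later reuse */
	void *newptr = malloc(newsize);
	memcpy(newptr, ptr, oldsize);
	pool_waste += oldsize;
	return newptr;
}
```

Repeatedly growing a big chunk through the realloc() path adds nothing to pool_waste, which is exactly the cap on wasted space the commit aims for.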
Brian E Gallew <geek+@cmu.edu> writes:
> Question: is there reliable information in pg_statistics (or other
> system tables) which can be used to make a reasonable estimate for the
> sizes of these structures before initial allocation?  Certainly the
> file size can be gotten from a stat (some portability issues, sparse
> file issues).

pg_statistics would tell you what was found out by the last vacuum on
the table, if there ever was one.  Dunno how reliable you want to
consider that to be.  stat() would provide up-to-date info, but the
problem with it is that the total file size might be a drastic
overestimate of the number of pages that vacuum needs to put in these
lists.

There's not really much chance of getting a useful estimate from the
last vacuum run, either.  AFAICT what we are interested in is the
number of pages containing dead tuples, and by definition all of those
tuples will have died since the last vacuum...

On the whole, just fixing the memory management seems like the best
bet.  We know how to do that, and it may benefit other things besides
vacuum.

			regards, tom lane
> I have just committed changes into current (but not REL6_5) to make

Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
CVS repository.  I thought that REL6_5_PATCHES is the tag for the 6.5
stable tree and would eventually become 6.5.2.  If so, what is the
REL6_5 tag?  Or am I totally missing the point?
--
Tatsuo Ishii
> > I have just committed changes into current (but not REL6_5) to make
>
> Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
> CVS repository.  I thought that REL6_5_PATCHES is the tag for the 6.5
> stable tree and would eventually become 6.5.2.  If so, what is the
> REL6_5 tag?  Or am I totally missing the point?

REL6_5 was a mistake.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, August 25, 1999 1:20 AM
> To: t-ishii@sra.co.jp
> Cc: Mike Mascari; Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] vacuum process size
>
> I have been looking some more at the vacuum-process-size issue, and
> I am having a hard time understanding why the VPageList data structure
> is the critical one.  As far as I can see, there should be at most one
> pointer in it for each disk page of the relation.  OK, you were
> vacuuming a table with something like a quarter million pages, so
> the end size of the VPageList would have been something like a megabyte,
> and given the inefficient usage of repalloc() in the original code,
> a lot more space than that would have been wasted as the list grew.
> So doubling the array size at each step is a good change.
>
> But there are a lot more tuples than pages in most relations.
>
> I see two lists with per-tuple data in vacuum.c, "vtlinks" in
> vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
> essentially the same technique of repalloc() after every N entries.
> I'm not entirely clear on how many tuples get put into each of these
> lists, but it sure seems like in ordinary circumstances they'd be much
> bigger space hogs than any of the three VPageList lists.

AFAIK, both vtlinks and vtmove are NULL if vacuum is executed without
concurrent transactions.  They won't be so big unless loooong
concurrent transactions exist.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
> CVS repository.  I thought that REL6_5_PATCHES is the tag for the 6.5
> stable tree and would eventually become 6.5.2.  If so, what is the
> REL6_5 tag?  Or am I totally missing the point?

Right, REL6_5_PATCHES is the 6.5.* branch.  REL6_5 is just a tag ---
that is, it's effectively a frozen snapshot of the 6.5 release, not an
evolvable branch.

I am not sure if Marc intends to continue this naming convention in
the future, or if it was just a mistake to create REL6_5 as a tag and
not a branch.  I don't see a whole lot of use for the frozen tag
myself...

			regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I have tested your idea and found even more improved memory usage
> > (86MB vs. 43MB). Standard vacuum consumes as much as 478MB of memory
> > when deleting 5,000,000 tuples, which would not be acceptable for
> > most configurations. I think we should fix this as soon as possible.
> > If there's no objection, I will commit the included patches to the
> > stable tree (it seems Tom has a more aggressive idea, so I'll leave
> > the current tree as it is).
>
> No, please make the change in current as well.  I was thinking about
> tweaking aset.c to be smarter about releasing large chunks, but in any
> case having the doubling behavior at the request point will be a big
> improvement.
>
> I do not like your patch as given, however.  By using a static variable
> you are assuming that there is only one active VPageList at a time.
> It looks to me like there are at least two --- and there is no reason
> to think they'd be the same size.
>
> You need to add a num_pages field to the VPageList struct, not use
> a static.

Good point.  I have committed new patches, which no longer use static
variables, to both REL6_5_PATCHES and the current tree.  Modified
files: backend/commands/vacuum.c and include/commands/vacuum.h.
---
Tatsuo Ishii
On Wed, 25 Aug 1999, Tom Lane wrote:

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Just for confirmation: I see REL6_5_PATCHES and REL6_5 tags in the
> > CVS repository.  I thought that REL6_5_PATCHES is the tag for the 6.5
> > stable tree and would eventually become 6.5.2.  If so, what is the
> > REL6_5 tag?  Or am I totally missing the point?
>
> Right, REL6_5_PATCHES is the 6.5.* branch.  REL6_5 is just a tag ---
> that is, it's effectively a frozen snapshot of the 6.5 release,
> not an evolvable branch.
>
> I am not sure if Marc intends to continue this naming convention
> in future, or if it was just a mistake to create REL6_5 as a tag
> not a branch.  I don't see a whole lot of use for the frozen tag
> myself...

I like the frozen tag myself, since, in the future, if we need to
create a quick tar ball of what things looked like at that release
(i.e. a v6.5->v6.5.2 patch?), it's easy to generate...

Actually, come to think of it... am going to try that out now...
will report back in a bit...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org