Thread: memory usage of pg_upgrade

memory usage of pg_upgrade

From
Jeff Janes
Date:
pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
tablespace name of every object (table, toast table, index) in the
database being upgraded.  This adds up pretty quickly when there is a
very large number of objects.  It could be changed to char* to a
separately allocated name that takes only as much space it needs.  But
maybe it would be better to point into os_info.old_tablespaces or
something like that, as surely there are not going to be one
independent file space per object.


typedef struct
{    ...   char        tablespace[MAXPGPATH];
} RelInfo;

The struct FileNameMap has 4 more .

Since there seems to be some interest in improving the scalability of
pg_upgrade, this is one of the things to consider fixing.  What is the
best way to do it?

Cheers,

Jeff



Re: memory usage of pg_upgrade

From
Andrew Dunstan
Date:
On 09/09/2013 06:20 PM, Jeff Janes wrote:
> pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
> tablespace name of every object (table, toast table, index) in the
> database being upgraded.  This adds up pretty quickly when there is a
> very large number of objects.  It could be changed to char* to a
> separately allocated name that takes only as much space it needs.  But
> maybe it would be better to point into os_info.old_tablespaces or
> something like that, as surely there are not going to be one
> independent file space per object.
>
>
> typedef struct
> {
>       ...
>      char        tablespace[MAXPGPATH];
> } RelInfo;
>
> The struct FileNameMap has 4 more .
>
> Since there seems to be some interest in improving the scalability of
> pg_upgrade, this is one of the things to consider fixing.  What is the
> best way to do it?


Send in a patch :-)

We recently ripped out some uses of statically sized strings in the 
parallel code and replaced them with pointers to palloc'ed strings. So 
there is good precedent for this. See 
<https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179>

In the case of tablespaces, I should have thought you could keep a hash 
table of the names and just store an entry id in the table structure. 
But that's just my speculation without actually looking at the code, so 
don't take my word for it :-)

cheers

andrew




Re: memory usage of pg_upgrade

From
Bruce Momjian
Date:
On Mon, Sep  9, 2013 at 06:39:39PM -0400, Andrew Dunstan wrote:
> 
> On 09/09/2013 06:20 PM, Jeff Janes wrote:
> >pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
> >tablespace name of every object (table, toast table, index) in the
> >database being upgraded.  This adds up pretty quickly when there is a
> >very large number of objects.  It could be changed to char* to a
> >separately allocated name that takes only as much space it needs.  But
> >maybe it would be better to point into os_info.old_tablespaces or
> >something like that, as surely there are not going to be one
> >independent file space per object.
> >
> >
> >typedef struct
> >{
> >      ...
> >     char        tablespace[MAXPGPATH];
> >} RelInfo;
> >
> >The struct FileNameMap has 4 more .
> >
> >Since there seems to be some interest in improving the scalability of
> >pg_upgrade, this is one of the things to consider fixing.  What is the
> >best way to do it?
> 
> 
> Send in a patch :-)
> 
> We recently ripped out some uses of statically sized strings in the
> parallel code and replaced them with pointers to palloc'ed strings.
> So there is good precedent for this. See
<https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179>
> 
> In the case of tablespaces, I should have thought you could keep a
> hash table of the names and just store an entry id in the table
> structure. But that's just my speculation without actually looking
> at the code, so don't take my word for it :-)

Yes, please feel free to improve the code.  I improved pg_upgrade CPU
usage for a lerge number of objects, but never thought to look at memory
usage.  It would be a big win to just palloc/pfree the memory, rather
than allocate tones of memory.  If you don't get to it, I will in a few
weeks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: memory usage of pg_upgrade

From
Bruce Momjian
Date:
On Mon, Sep  9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
> > In the case of tablespaces, I should have thought you could keep a
> > hash table of the names and just store an entry id in the table
> > structure. But that's just my speculation without actually looking
> > at the code, so don't take my word for it :-)
>
> Yes, please feel free to improve the code.  I improved pg_upgrade CPU
> usage for a lerge number of objects, but never thought to look at memory
> usage.  It would be a big win to just palloc/pfree the memory, rather
> than allocate tones of memory.  If you don't get to it, I will in a few
> weeks.

Thanks you for pointing out this problem.  I have researched the cause
and the major problem was that I was allocating the maximum path length
in a struct rather than allocating just the length I needed, and was not
reusing string pointers that I knew were not going to change.

The updated attached patch significantly decreases memory consumption:

    tables        orig      patch        % decrease
    ----
    1         27,168 kB    27,168 kB     0
    1k         46,136 kB    27,920 kB    39
    2k         65,224 kB    28,796 kB    56
    4k        103,276 kB    30,472 kB    70
    8k        179,512 kB    33,900 kB    81
    16k        331,860 kB    40,788 kB    88
    32k        636,544 kB    54,572 kB    91
    64k      1,245,920 kB    81,876 kB    93

As you can see, a database with 64k tables shows a 93% decrease in
memory use.  I plan to apply this for PG 9.4.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: memory usage of pg_upgrade

From
Bruce Momjian
Date:
On Mon, Feb  3, 2014 at 09:14:10PM -0500, Bruce Momjian wrote:
> On Mon, Sep  9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
> > > In the case of tablespaces, I should have thought you could keep a
> > > hash table of the names and just store an entry id in the table
> > > structure. But that's just my speculation without actually looking
> > > at the code, so don't take my word for it :-)
> > 
> > Yes, please feel free to improve the code.  I improved pg_upgrade CPU
> > usage for a lerge number of objects, but never thought to look at memory
> > usage.  It would be a big win to just palloc/pfree the memory, rather
> > than allocate tones of memory.  If you don't get to it, I will in a few
> > weeks.
> 
> Thanks you for pointing out this problem.  I have researched the cause
> and the major problem was that I was allocating the maximum path length
> in a struct rather than allocating just the length I needed, and was not
> reusing string pointers that I knew were not going to change.
> 
> The updated attached patch significantly decreases memory consumption:
> 
>     tables        orig      patch        % decrease
>     ----
>     1         27,168 kB    27,168 kB     0
>     1k         46,136 kB    27,920 kB    39
>     2k         65,224 kB    28,796 kB    56
>     4k        103,276 kB    30,472 kB    70
>     8k        179,512 kB    33,900 kB    81
>     16k        331,860 kB    40,788 kB    88
>     32k        636,544 kB    54,572 kB    91
>     64k      1,245,920 kB    81,876 kB    93
> 
> As you can see, a database with 64k tables shows a 93% decrease in
> memory use.  I plan to apply this for PG 9.4.

Patch applied.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +