Thread: memory usage of pg_upgrade
pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the tablespace name of every object (table, toast table, index) in the database being upgraded. This adds up pretty quickly when there is a very large number of objects. It could be changed to a char * pointing to a separately allocated name that takes only as much space as it needs. But maybe it would be better to point into os_info.old_tablespaces or something like that, as surely there is not going to be one independent file space per object.

typedef struct
{
    ...
    char tablespace[MAXPGPATH];
} RelInfo;

The struct FileNameMap has 4 more.

Since there seems to be some interest in improving the scalability of pg_upgrade, this is one of the things to consider fixing. What is the best way to do it?

Cheers,

Jeff
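To make the arithmetic above concrete, here is a minimal sketch comparing fixed-size path buffers with pointer fields. It is not the pg_upgrade source: `FixedPaths`, `PointerPaths`, `NUM_PATH_FIELDS` and both helper functions are hypothetical names, and `MAXPGPATH` is set to 1024 because that is the value implied by "5 times MAXPGPATH, or 5120 characters".

```c
#include <assert.h>
#include <stddef.h>

#define MAXPGPATH 1024          /* implied by the thread: 5 * MAXPGPATH = 5120 */
#define NUM_PATH_FIELDS 5       /* 1 in RelInfo + 4 in FileNameMap, per the thread */

/* Hypothetical: the five fixed-size buffers gathered into one struct. */
typedef struct
{
    char        paths[NUM_PATH_FIELDS][MAXPGPATH];
} FixedPaths;

/* Hypothetical: the same fields as pointers to strings stored elsewhere. */
typedef struct
{
    const char *paths[NUM_PATH_FIELDS];
} PointerPaths;

/* Bytes reserved for fixed-size path buffers across nrels objects. */
static size_t
fixed_path_bytes(size_t nrels)
{
    assert(nrels <= (size_t) -1 / sizeof(FixedPaths));      /* no overflow */
    return nrels * sizeof(FixedPaths);
}

/* Bytes used by the pointer fields alone; the strings themselves are extra. */
static size_t
pointer_path_bytes(size_t nrels)
{
    assert(nrels <= (size_t) -1 / sizeof(PointerPaths));    /* no overflow */
    return nrels * sizeof(PointerPaths);
}
```

Under these assumptions, `fixed_path_bytes(1000)` is 5,120,000 bytes, while the pointer fields cost `5 * sizeof(char *)` bytes per object plus whatever the shared strings occupy.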
On 09/09/2013 06:20 PM, Jeff Janes wrote:
> pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
> tablespace name of every object (table, toast table, index) in the
> database being upgraded. This adds up pretty quickly when there is a
> very large number of objects. It could be changed to a char * pointing
> to a separately allocated name that takes only as much space as it
> needs. But maybe it would be better to point into
> os_info.old_tablespaces or something like that, as surely there is not
> going to be one independent file space per object.
>
> typedef struct
> {
>     ...
>     char tablespace[MAXPGPATH];
> } RelInfo;
>
> The struct FileNameMap has 4 more.
>
> Since there seems to be some interest in improving the scalability of
> pg_upgrade, this is one of the things to consider fixing. What is the
> best way to do it?

Send in a patch :-)

We recently ripped out some uses of statically sized strings in the parallel code and replaced them with pointers to palloc'ed strings. So there is good precedent for this. See <https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179>

In the case of tablespaces, I should have thought you could keep a hash table of the names and just store an entry id in the table structure. But that's just my speculation without actually looking at the code, so don't take my word for it :-)

cheers

andrew
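The "store an entry id" idea can be sketched roughly as follows. This is an illustration only, not code from pg_upgrade: `NameTable` and `intern_name` are hypothetical names, plain `malloc`/`realloc` stand in for palloc, and a linear search stands in for the hash lookup to keep the example short (reasonable if the number of distinct tablespaces is small).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table holding one heap copy of each distinct name. */
typedef struct
{
    char      **names;
    int         count;
    int         capacity;
} NameTable;

/*
 * Return the id of name, adding it if it is not present yet.
 * Linear search is used here instead of a real hash table.
 * Returns -1 if memory allocation fails.
 */
static int
intern_name(NameTable *t, const char *name)
{
    size_t      len;
    char       *copy;
    int         i;

    assert(t != NULL && name != NULL);

    for (i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return i;           /* already stored: reuse the existing entry */

    if (t->count == t->capacity)
    {
        int         newcap = t->capacity ? t->capacity * 2 : 8;
        char      **grown = realloc(t->names, newcap * sizeof(char *));

        if (grown == NULL)
            return -1;
        t->names = grown;
        t->capacity = newcap;
    }

    len = strlen(name) + 1;     /* exactly as much space as the name needs */
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);

    t->names[t->count] = copy;
    return t->count++;
}
```

Each per-object struct would then hold only the small integer id (or the shared pointer `t->names[id]`) rather than its own copy of the path.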
On Mon, Sep 9, 2013 at 06:39:39PM -0400, Andrew Dunstan wrote:
>
> On 09/09/2013 06:20 PM, Jeff Janes wrote:
> > pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
> > tablespace name of every object (table, toast table, index) in the
> > database being upgraded. This adds up pretty quickly when there is a
> > very large number of objects. It could be changed to a char * pointing
> > to a separately allocated name that takes only as much space as it
> > needs. But maybe it would be better to point into
> > os_info.old_tablespaces or something like that, as surely there is not
> > going to be one independent file space per object.
> >
> > typedef struct
> > {
> >     ...
> >     char tablespace[MAXPGPATH];
> > } RelInfo;
> >
> > The struct FileNameMap has 4 more.
> >
> > Since there seems to be some interest in improving the scalability of
> > pg_upgrade, this is one of the things to consider fixing. What is the
> > best way to do it?
>
> Send in a patch :-)
>
> We recently ripped out some uses of statically sized strings in the
> parallel code and replaced them with pointers to palloc'ed strings.
> So there is good precedent for this. See <https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179>
>
> In the case of tablespaces, I should have thought you could keep a
> hash table of the names and just store an entry id in the table
> structure. But that's just my speculation without actually looking
> at the code, so don't take my word for it :-)

Yes, please feel free to improve the code. I improved pg_upgrade CPU usage for a large number of objects, but never thought to look at memory usage. It would be a big win to just palloc/pfree the memory, rather than allocate tons of memory. If you don't get to it, I will in a few weeks.

--
  Bruce Momjian <bruce@momjian.us> http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
> > In the case of tablespaces, I should have thought you could keep a
> > hash table of the names and just store an entry id in the table
> > structure. But that's just my speculation without actually looking
> > at the code, so don't take my word for it :-)
>
> Yes, please feel free to improve the code. I improved pg_upgrade CPU
> usage for a large number of objects, but never thought to look at memory
> usage. It would be a big win to just palloc/pfree the memory, rather
> than allocate tons of memory. If you don't get to it, I will in a few
> weeks.

Thank you for pointing out this problem. I have researched the cause and the major problem was that I was allocating the maximum path length in a struct rather than allocating just the length I needed, and was not reusing string pointers that I knew were not going to change.

The updated attached patch significantly decreases memory consumption:

    tables         orig         patch    % decrease
    -----------------------------------------------
        1     27,168 kB     27,168 kB         0
       1k     46,136 kB     27,920 kB        39
       2k     65,224 kB     28,796 kB        56
       4k    103,276 kB     30,472 kB        70
       8k    179,512 kB     33,900 kB        81
      16k    331,860 kB     40,788 kB        88
      32k    636,544 kB     54,572 kB        91
      64k  1,245,920 kB     81,876 kB        93

As you can see, a database with 64k tables shows a 93% decrease in memory use. I plan to apply this for PG 9.4.

--
  Bruce Momjian <bruce@momjian.us> http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
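The "% decrease" column in the table above can be reproduced from the two kB columns. The sketch below is not part of the patch: `percent_decrease` is a hypothetical helper, and rounding to the nearest whole percent is an assumption that happens to match every row of the table.

```c
#include <assert.h>

/*
 * Percent reduction from orig_kb to patched_kb, rounded to the nearest
 * whole percent using integer arithmetic only.
 */
static long long
percent_decrease(long long orig_kb, long long patched_kb)
{
    assert(orig_kb > 0 && patched_kb >= 0 && patched_kb <= orig_kb);
    return ((orig_kb - patched_kb) * 100 + orig_kb / 2) / orig_kb;
}
```

For the 64k-table row, `percent_decrease(1245920, 81876)` gives 93, the figure quoted in the message.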
On Mon, Feb 3, 2014 at 09:14:10PM -0500, Bruce Momjian wrote:
> On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
> > > In the case of tablespaces, I should have thought you could keep a
> > > hash table of the names and just store an entry id in the table
> > > structure. But that's just my speculation without actually looking
> > > at the code, so don't take my word for it :-)
> >
> > Yes, please feel free to improve the code. I improved pg_upgrade CPU
> > usage for a large number of objects, but never thought to look at memory
> > usage. It would be a big win to just palloc/pfree the memory, rather
> > than allocate tons of memory. If you don't get to it, I will in a few
> > weeks.
>
> Thank you for pointing out this problem. I have researched the cause
> and the major problem was that I was allocating the maximum path length
> in a struct rather than allocating just the length I needed, and was not
> reusing string pointers that I knew were not going to change.
>
> The updated attached patch significantly decreases memory consumption:
>
>     tables         orig         patch    % decrease
>     -----------------------------------------------
>         1     27,168 kB     27,168 kB         0
>        1k     46,136 kB     27,920 kB        39
>        2k     65,224 kB     28,796 kB        56
>        4k    103,276 kB     30,472 kB        70
>        8k    179,512 kB     33,900 kB        81
>       16k    331,860 kB     40,788 kB        88
>       32k    636,544 kB     54,572 kB        91
>       64k  1,245,920 kB     81,876 kB        93
>
> As you can see, a database with 64k tables shows a 93% decrease in
> memory use. I plan to apply this for PG 9.4.

Patch applied.

--
  Bruce Momjian <bruce@momjian.us> http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +