Thread: Dept of ugly hacks: eliminating padding space in system indexes

Dept of ugly hacks: eliminating padding space in system indexes

From
Tom Lane
Date:
I was thinking a bit about how we pad columns of type NAME to
fixed-width, even though they're semantically equivalent to C strings.
The reason for wasting that space is that it makes it possible to
overlay a C struct onto the leading columns of most system catalogs.
I don't wish to propose changing that (at least not today), but it
struck me that there is no reason to overlay a C struct onto index
entries, and that getting rid of the padding space would be even more
useful in an index than in the catalog itself.  It turns out to be
dead easy to implement this: effectively, we just decree that the
index column storage type for NAME is always CSTRING.  Because the
two types are effectively binary-compatible as long as you don't
look at the padding, the attached ugly-but-impressively-short patch
seems to accomplish this.  It passes the regression tests anyway.
Here are some numbers about the space savings in a virgin database:

                            CVS HEAD    w/patch    savings

pg_database_size('postgres')                4439752    4071112    8.3%
pg_relation_size('pg_class_relname_nsp_index')        57344    40960    28%
pg_relation_size('pg_proc_proname_args_nsp_index')  319488    204800    35%

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

            regards, tom lane


Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.300
diff -c -r1.300 index.c
*** src/backend/catalog/index.c    19 Jun 2008 00:46:04 -0000    1.300
--- src/backend/catalog/index.c    23 Jun 2008 19:34:54 -0000
***************
*** 262,267 ****
--- 262,278 ----

              ReleaseSysCache(tuple);
          }
+
+         /*
+          * For an index on NAME, force the index storage to be CSTRING,
+          * rather than padded to fixed length.
+          */
+         if (to->atttypid == NAMEOID)
+         {
+             to->atttypid = CSTRINGOID;
+             to->attlen = -2;
+             to->attalign = 'c';
+         }
      }

      return indexTupDesc;

Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Bruce Momjian
Date:
I would mention in the C comment that we are doing this for space
savings, but other than that, it seems fine.

---------------------------------------------------------------------------

Tom Lane wrote:
> I was thinking a bit about how we pad columns of type NAME to
> fixed-width, even though they're semantically equivalent to C strings.
> The reason for wasting that space is that it makes it possible to
> overlay a C struct onto the leading columns of most system catalogs.
> I don't wish to propose changing that (at least not today), but it
> struck me that there is no reason to overlay a C struct onto index
> entries, and that getting rid of the padding space would be even more
> useful in an index than in the catalog itself.  It turns out to be
> dead easy to implement this: effectively, we just decree that the
> index column storage type for NAME is always CSTRING.  Because the
> two types are effectively binary-compatible as long as you don't
> look at the padding, the attached ugly-but-impressively-short patch
> seems to accomplish this.  It passes the regression tests anyway.
> Here are some numbers about the space savings in a virgin database:
> 
>                             CVS HEAD    w/patch    savings
> 
> pg_database_size('postgres')                4439752    4071112    8.3%
> pg_relation_size('pg_class_relname_nsp_index')        57344    40960    28%
> pg_relation_size('pg_proc_proname_args_nsp_index')  319488    204800    35%
> 
> Cutting a third off the size of a system index has got to be worth
> something, but is it worth a hack as ugly as this one?
> 
>             regards, tom lane
> 
> 

Content-Description: index-name-as-cstring.patch

> Index: src/backend/catalog/index.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
> retrieving revision 1.300
> diff -c -r1.300 index.c
> *** src/backend/catalog/index.c    19 Jun 2008 00:46:04 -0000    1.300
> --- src/backend/catalog/index.c    23 Jun 2008 19:34:54 -0000
> ***************
> *** 262,267 ****
> --- 262,278 ----
>   
>               ReleaseSysCache(tuple);
>           }
> + 
> +         /*
> +          * For an index on NAME, force the index storage to be CSTRING,
> +          * rather than padded to fixed length.
> +          */
> +         if (to->atttypid == NAMEOID)
> +         {
> +             to->atttypid = CSTRINGOID;
> +             to->attlen = -2;
> +             to->attalign = 'c';
> +         }
>       }
>   
>       return indexTupDesc;

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Cutting a third off the size of a system index has got to be worth
> something, but is it worth a hack as ugly as this one?
>   
>   

I think so.

cheers

andrew


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Simon Riggs
Date:
On Mon, 2008-06-23 at 15:52 -0400, Tom Lane wrote:
>                             CVS HEAD    w/patch    savings
> 
> pg_database_size('postgres')                4439752    4071112    8.3%
> pg_relation_size('pg_class_relname_nsp_index')        57344    40960    28%
> pg_relation_size('pg_proc_proname_args_nsp_index')  319488    204800    35%
> 
> Cutting a third off the size of a system index has got to be worth
> something, but is it worth a hack as ugly as this one?

Not doing it would be more ugly, unless there is some negative
side-effect?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Mark Mielke
Date:
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Cutting a third off the size of a system index has got to be worth
>> something, but is it worth a hack as ugly as this one?
>>     
>
> I think so.

Were you able to time any speedup? Is this something that would benefit 
installations with a lot of metadata? I presume most of this information 
normally easily fits in cache most of the time?

I am trying to understand what exactly it is worth... :-)

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Tom Lane
Date:
Mark Mielke <mark@mark.mielke.cc> writes:
>> Tom Lane wrote:
>>> Cutting a third off the size of a system index has got to be worth
>>> something, but is it worth a hack as ugly as this one?

> Were you able to time any speedup?

I didn't try; can you suggest any suitable benchmark?

The performance impact is probably going to be limited by our extensive
use of catalog caches --- once a desired row is in a backend's catcache,
it doesn't take a btree search to fetch it again.  Still, the system
indexes are probably "hot" enough to stay in shared buffers most of the
time, and the smaller they are the more space will be left for other
stuff, so I think there should be a distributed benefit.
        regards, tom lane


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2008-06-23 at 15:52 -0400, Tom Lane wrote:
>> Cutting a third off the size of a system index has got to be worth
>> something, but is it worth a hack as ugly as this one?

> Not doing it would be more ugly, unless there is some negative
> side-effect?

I thought some more about why this seems ugly to me, and realized that a
lot of it has to do with the change in typalign.  Currently, a compiler
is entitled to assume that a pointer to Name is 4-byte aligned; thus
for instance it could generate word-wide instructions for copying a Name
from one place to another.  A "Name" that is stored as just CSTRING
might break that.  We are already at risk of this, really, because of
all the places where we gaily pass plain old C strings to syscache and
index searches on Name columns.  I think the only reason we've not been
burnt is that it's hard to optimize strcmp() into word-wide operations.

However the solution to that seems fairly obvious: let's downgrade Name
to typalign 1 instead of 4.
        regards, tom lane


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Mark Mielke
Date:
Tom Lane wrote: <blockquote cite="mid:4440.1214262551@sss.pgh.pa.us" type="cite"><blockquote type="cite"><pre
wrap="">Wereyou able to time any speedup?   </pre></blockquote><pre wrap="">
 
I didn't try; can you suggest any suitable benchmark?
 </pre></blockquote><br /> Unfortunately - no. I kind of think it won't benefit any of my databases in any noticeable
way.My numbers are similar to yours:<br /><br /><blockquote type="cite">pccyber=# select
pg_database_size('postgres');<br/>           4468332<br /><br /> pccyber=# select
pg_relation_size('pg_class_relname_nsp_index');<br/>             90112<br /><br /> pccyber=# select
pg_relation_size('pg_proc_proname_args_nsp_index');<br/>            294912<br /></blockquote><br /> Not that I disagree
withyour change, but < 5 Mbytes in 4 Gbytes of RAM for my main PostgreSQL system that I manage seems like a drop in
thebucket. Even if 40% of pg_class_relname and pg_proc_proname indices was saved - we're talking about 154 Kbytes saved
onboth those indices combined. Minor? Major? I bet I wouldn't notice unless my database requirements used up all RAM,
andeven then I'm suspecting it wouldn't matter except for border line cases (like all pages required for everything
elsehappened to equal 4 Gbytes near exactly).<br /><br /><blockquote cite="mid:4440.1214262551@sss.pgh.pa.us"
type="cite"><prewrap="">The performance impact is probably going to be limited by our extensive
 
use of catalog caches --- once a desired row is in a backend's catcache,
it doesn't take a btree search to fetch it again.  Still, the system
indexes are probably "hot" enough to stay in shared buffers most of the
time, and the smaller they are the more space will be left for other
stuff, so I think there should be a distributed benefit. </pre></blockquote><br /> In my opinion it is 'do the right
thing',rather than a performance question. It seems to me that an index keeping tracking of space characters at the end
ofa name, char, varchar, or text does not make sense, and the right thing may be to do a more generic version of your
patch?In the few cases that space at the end matters, couldn't that be determined by re-checking the table row after
queryingit?<br /><br /> Cheers,<br /> mark<br /><br /><pre class="moz-signature" cols="72">-- 
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>

Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Teodor Sigaev
Date:
> dead easy to implement this: effectively, we just decree that the
> index column storage type for NAME is always CSTRING.  Because the

Isn't it a reason to add STORAGE option of CREATE OPERATOR CLASS to BTree? as 
it's done for GiST and GIN indexes.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Shane Ambler
Date:
Mark Mielke wrote:

> Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of RAM 
> for my main PostgreSQL system that I manage seems like a drop in the 
> bucket. Even if 40% of pg_class_relname and pg_proc_proname indices was 
> saved - we're talking about 154 Kbytes saved on both those indices 
> combined. Minor? Major? I bet I wouldn't notice unless my database 
> requirements used up all RAM, and even then I'm suspecting it wouldn't 
> matter except for border line cases (like all pages required for 
> everything else happened to equal 4 Gbytes near exactly).

Guess the mileage will vary depending on the complexity of the db 
structure. Shorter names will also benefit more than longer ones.

>> The performance impact is probably going to be limited by our extensive
>> use of catalog caches --- once a desired row is in a backend's catcache,
>> it doesn't take a btree search to fetch it again.  Still, the system
>> indexes are probably "hot" enough to stay in shared buffers most of the
>> time, and the smaller they are the more space will be left for other
>> stuff, so I think there should be a distributed benefit.
>>   

My question is whether this is limited to system catalogs? or will this 
benefit char() index used on any table? The second would make it more 
worthwhile.



-- 

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
"Heikki Linnakangas"
Date:
Shane Ambler wrote:
> My question is whether this is limited to system catalogs? or will this 
> benefit char() index used on any table? The second would make it more 
> worthwhile.

char(n) fields are already stored as variable-length on disk. This isn't 
applicable to them.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> dead easy to implement this: effectively, we just decree that the
>> index column storage type for NAME is always CSTRING.  Because the

> Isn't it a reason to add STORAGE option of CREATE OPERATOR CLASS to BTree? as
> it's done for GiST and GIN indexes.

Hmm ... I don't see a point in exposing that as a user-level facility,
unless you can point to other use-cases besides NAME.  But it would be
cute to implement the hack by changing the initial contents of
pg_opclass instead of inserting code in the backend.  I'll give that
a try.
        regards, tom lane


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Josh Berkus
Date:
Shane Ambler wrote:
> Mark Mielke wrote:
> 
>> Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of 
>> RAM for my main PostgreSQL system that I manage seems like a drop in 
>> the bucket. Even if 40% of pg_class_relname and pg_proc_proname 
>> indices was saved - we're talking about 154 Kbytes saved on both those 
>> indices combined. Minor? Major? I bet I wouldn't notice unless my 
>> database requirements used up all RAM, and even then I'm suspecting it 
>> wouldn't matter except for border line cases (like all pages required 
>> for everything else happened to equal 4 Gbytes near exactly).
> 
> Guess the mileage will vary depending on the complexity of the db 
> structure. Shorter names will also benefit more than longer ones.

There are PostgreSQL users out there with more than 100,000 tables per 
server instance.  This will make more of a difference to them.

--Josh



Re: Dept of ugly hacks: eliminating padding space in system indexes

From
"Joshua D. Drake"
Date:
Josh Berkus wrote:
> Shane Ambler wrote:
>> Mark Mielke wrote:
>>
>>> Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of 
>>> RAM for my main PostgreSQL system that I manage seems like a drop in 
>>> the bucket. Even if 40% of pg_class_relname and pg_proc_proname 
>>> indices was saved - we're talking about 154 Kbytes saved on both 
>>> those indices combined. Minor? Major? I bet I wouldn't notice unless 
>>> my database requirements used up all RAM, and even then I'm 
>>> suspecting it wouldn't matter except for border line cases (like all 
>>> pages required for everything else happened to equal 4 Gbytes near 
>>> exactly).
>>
>> Guess the mileage will vary depending on the complexity of the db 
>> structure. Shorter names will also benefit more than longer ones.
> 
> There are PostgreSQL users out there with more than 100,000 tables per 
> server instance.  This will make more of a difference to them.

More than I think people realize.

Joshua D. Drake

> 
> --Josh
> 
> 



Re: Dept of ugly hacks: eliminating padding space in system indexes

From
"Stephen R. van den Berg"
Date:
Mark Mielke wrote:
>saved - we're talking about 154 Kbytes saved on both those indices 
>combined. Minor? Major? I bet I wouldn't notice unless my database 
>requirements used up all RAM, and even then I'm suspecting it wouldn't 
>matter except for border line cases (like all pages required for 
>everything else happened to equal 4 Gbytes near exactly).

There is always only so much of 1st level and 2nd level cache; for those
the savings might well make a difference, even on multigigabyte
databases.
-- 
Sincerely,          Stephen R. van den Berg.

Life is that brief interlude between nothingness and eternity.


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Josh Berkus
Date:
Mark,

> Not that I disagree with your change, but < 5 Mbytes in 4 Gbytes of RAM 
> for my main PostgreSQL system that I manage seems like a drop in the 
> bucket. Even if 40% of pg_class_relname and pg_proc_proname indices was 
> saved - we're talking about 154 Kbytes saved on both those indices 
> combined. Minor? Major? I bet I wouldn't notice unless my database 
> requirements used up all RAM, and even then I'm suspecting it wouldn't 
> matter except for border line cases (like all pages required for 
> everything else happened to equal 4 Gbytes near exactly).

Again, I think the best way to test this would be to create an 
installation with more than 100,000 tables & views. That's not 
hypothetical; I've encountered it already twice in production users.

--Josh Berkus


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Zdenek Kotala
Date:
Tom Lane napsal(a):

> Cutting a third off the size of a system index has got to be worth
> something, but is it worth a hack as ugly as this one?
> 

The problem what I see there is how to fit with in-place-upgrade. Catalog should 
be generate from scratch, but if somebody uses name in regular table it invokes 
request for reindex.
    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Tom Lane napsal(a):
>> Cutting a third off the size of a system index has got to be worth
>> something, but is it worth a hack as ugly as this one?

> The problem what I see there is how to fit with in-place-upgrade. Catalog should 
> be generate from scratch, but if somebody uses name in regular table it invokes 
> request for reindex.

Actually, an existing index stored as "name" would continue to work
fine, so I think this could be worked around.   But in any case name is
deprecated for user use, so anyone who suffers a reindex has only
themselves to blame.
        regards, tom lane


Re: Dept of ugly hacks: eliminating padding space in system indexes

From
Zdenek Kotala
Date:
Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Tom Lane napsal(a):
>>> Cutting a third off the size of a system index has got to be worth
>>> something, but is it worth a hack as ugly as this one?
> 
>> The problem what I see there is how to fit with in-place-upgrade. Catalog should 
>> be generate from scratch, but if somebody uses name in regular table it invokes 
>> request for reindex.
> 
> Actually, an existing index stored as "name" would continue to work
> fine, so I think this could be worked around.   But in any case name is
> deprecated for user use, so anyone who suffers a reindex has only
> themselves to blame.

Yes, it is deprecated by you know a user. Give him a loaded shot-gun and he 
start play a golf :-).

However, reindex is acceptable penalty for user who uses deprecated things.
    Zdenek