Thread: space-effective varchar(255)-like arrangement

space-effective varchar(255)-like arrangement

From
Jan Vicherek
Date:
 Hello,

   I'm creating a list of all my files anywhere in a 60' diameter on any
possible media around me.

   For such purpose I'm planning to use PostgreSQL.

  The main table might look like :
create table file (
        name    varchar(255),   -- file : name; tape : name / #; tapedata : #
        am      int,            -- method of accessing contained file(s)/data
                                -- one of tape,gz,tar,Z,zip,spio,afio,cab,crypt
                                -- pgp,AUTO
        size    int8,           -- file : file size up to 2 ^ 64
                                -- tape : sum of datablock sizes
        md5     char(32),       -- MD5 sum of the data of size
        created datetime,       -- date of creation timestamp
        modified datetime,      -- date of last modification timestamp
        type    int,            -- directory entry type :
                                --   file,dir,link,device,pipe
        mode    int2,           -- 16 mode bits
        ownernm char(8),        -- owner name
        ownerid int2,           -- owner id
        groupnm char(8),        -- group name
        groupid int2,           -- group id
        links   int,            -- number of hard links
        mjmn    int2            -- if device, major / minor #
);


  However, the first line "name    varchar(255)" will waste a lot of
space, since only about %0.02 files will be over 200 chars, %0.1 over 100,
%0.2 over 64 and %1.6 over 32.

  I'm looking for alternate ways to store a full 255 char filename. I
would prefer to be able to get all filenames through a single SELECT that
looks the same for both short filenames (<32) and long filenames
(>32,<256).
  It is not terribly important that select or insert access is fast.
Though having faster insert might be good. Almost no updates will be
issued. But the ability to index on the filename is of some importance (at
least on the first about 32 chars).

  I was wondering whether "name    text," would save the space or not.

  Perhaps there would be smarter or more elegant suggestions than "name
text" for the field type, so I'm writing to the list to get some ideas.


    TIA,

      All the best for PostgreSQL team in 1999,

         Jan.

 -- Gospel of Jesus is the saving power of God for all who believe --
                ## To some, nothing is impossible. ##
                   http://Vicherek.Waterloo.on.ca/



Re: [GENERAL] space-effective varchar(255)-like arrangement

From
Bruce Momjian
Date:
>
>  Hello,
>
>    I'm creating a list of all my files anywhere in a 60' diameter on any
> possible media around me.
>
>    For such purpose I'm planning to use PostgreSQL.
>
>   The main table might look like :
> create table file (
>         name    varchar(255),   -- file : name; tape : name / #; tapedata : #
>         am      int,            -- method of accessing contained file(s)/data
>                                 -- one of tape,gz,tar,Z,zip,spio,afio,cab,crypt
>                                 -- pgp,AUTO
>         size    int8,           -- file : file size up to 2 ^ 64
>                                 -- tape : sum of datablock sizes
>         md5     char(32),       -- MD5 sum of the data of size
>         created datetime,       -- date of creation timestamp
>         modified datetime,      -- date of last modification timestamp
>         type    int,            -- directory entry type :
>                                 --   file,dir,link,device,pipe
>         mode    int2,           -- 16 mode bits
>         ownernm char(8),        -- owner name
>         ownerid int2,           -- owner id
>         groupnm char(8),        -- group name
>         groupid int2,           -- group id
>         links   int,            -- number of hard links
>         mjmn    int2            -- if device, major / minor #
> );
>
>
>   However, the first line "name    varchar(255)" will waste a lot of
> space, since only about %0.02 files will be over 200 chars, %0.1 over 100,
> %0.2 over 64 and %1.6 over 32.


See the FAQ under character types.  varcahr() does not store the max
space, only used space.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] space-effective varchar(255)-like arrangement

From
Jan Vicherek
Date:

  Hmm,

On Sat, 26 Dec 1998, Bruce Momjian wrote:

> >   The main table might look like :
> > create table file (
> >         name    varchar(255),   -- file : name; tape : name / #; tapedata : #
 ...
> >         mjmn    int2            -- if device, major / minor #
> > );
> >
> >   However, the first line "name    varchar(255)" will waste a lot of
> > space, since only about %0.02 files will be over 200 chars, %0.1 over 100,
> > %0.2 over 64 and %1.6 over 32.
>
> See the FAQ under character types.  varcahr() does not store the max
> space, only used space.

   hmm, thus if one block has 8192 bytes, and if there is 9-byte overhead
for each row, then I can store 390 rows in one block if, on average,
"names" are 10 chars in length ? echo $[ 8192 / ( 9 + 10 + 2 ) ] # = 390

   If each record has variable length, pg has no way of calculating where
is start of next record, other than going through the block and summing up
lengths of varchars to determine where does next record begin, right ?
(above I use word "record" and "row" interchangeably).

    Thanx a bunch,

        Jan



 -- Gospel of Jesus is the saving power of God for all who believe --
                ## To some, nothing is impossible. ##
                   http://Vicherek.Waterloo.on.ca/


Re: [GENERAL] space-effective varchar(255)-like arrangement

From
Jan Vicherek
Date:

 Hmm, let me guess ...

  one of the overhead elements for each row header is pointer to the next
row, right ? That way the summing-up of varlena types is avoided, although
the postgres engine still has to go from row header to row header to find
Nth row in a page (block), right ?

   Thx,

        Jan



On Sat, 26 Dec 1998, Jan Vicherek wrote:

> On Sat, 26 Dec 1998, Bruce Momjian wrote:
> >
> > See the FAQ under character types.  varcahr() does not store the max
> > space, only used space.
>
>    hmm, thus if one block has 8192 bytes, and if there is 9-byte overhead
> for each row, then I can store 390 rows in one block if, on average,
> "names" are 10 chars in length ? echo $[ 8192 / ( 9 + 10 + 2 ) ] # = 390
>
>    If each record has variable length, pg has no way of calculating where
> is start of next record, other than going through the block and summing up
> lengths of varchars to determine where does next record begin, right ?
> (above I use word "record" and "row" interchangeably).
>
>     Thanx a bunch,
>
>         Jan
>
>
>
>  -- Gospel of Jesus is the saving power of God for all who believe --
>                 ## To some, nothing is impossible. ##
>                    http://Vicherek.Waterloo.on.ca/
>
>

 -- Gospel of Jesus is the saving power of God for all who believe --
                ## To some, nothing is impossible. ##
                   http://Vicherek.Waterloo.on.ca/


Re: [GENERAL] space-effective varchar(255)-like arrangement

From
Bruce Momjian
Date:
>    hmm, thus if one block has 8192 bytes, and if there is 9-byte overhead
> for each row, then I can store 390 rows in one block if, on average,
> "names" are 10 chars in length ? echo $[ 8192 / ( 9 + 10 + 2 ) ] # = 390

Yes.  See FAQ for row overhead calculations.

>    If each record has variable length, pg has no way of calculating where
> is start of next record, other than going through the block and summing up
> lengths of varchars to determine where does next record begin, right ?
> (above I use word "record" and "row" interchangeably).

It is not that bad.  It knows where each record starts.  It is finding
a certain field in the record that is a problem for any field AFTER the
first variable length field.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] space-effective varchar(255)-like arrangement

From
Bruce Momjian
Date:
>
>
>  Hmm, let me guess ...
>
>   one of the overhead elements for each row header is pointer to the next
> row, right ? That way the summing-up of varlena types is avoided, although
> the postgres engine still has to go from row header to row header to find
> Nth row in a page (block), right ?

Sort of.  At the start of each row, there is control information.  On of
the pieces of information stored is the length of the entire row.  Using
that, you can quickly skip to the end of the row.

Because rows can't grow/shrink in place, and only vacuum removes
superseded rows, the next row is right after the end of the current row.

This is getting in to hackers territory.  I have set the reply-to
accordingly.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026