Thread: space-effective varchar(255)-like arrangement
Hello, I'm creating a list of all my files anywhere in a 60' diameter on any possible media around me. For such purpose I'm planning to use PostgreSQL. The main table might look like : create table file ( name varchar(255), -- file : name; tape : name / #; tapedata : # am int, -- method of accessing contained file(s)/data -- one of tape,gz,tar,Z,zip,spio,afio,cab,crypt -- pgp,AUTO size int8, -- file : file size up to 2 ^ 64 -- tape : sum of datablock sizes md5 char(32), -- MD5 sum of the data of size created datetime, -- date of creation timestamp modified datetime, -- date of last modification timestamp type int, -- directory entry type : -- file,dir,link,device,pipe mode int2, -- 16 mode bits ownernm char(8), -- owner name ownerid int2, -- owner id groupnm char(8), -- group name groupid int2, -- group id links int, -- number of hard links mjmn int2 -- if device, major / minor # ); However, the first line "name varchar(255)" will waste a lot of space, since only about %0.02 files will be over 200 chars, %0.1 over 100, %0.2 over 64 and %1.6 over 32. I'm looking for alternate ways to store a full 255 char filename. I would prefer to be able to get all filenames through a single SELECT that looks the same for both short filenames (<32) and long filenames (>32,<256). It is not terribly important that select or insert access is fast. Though having faster insert might be good. Almost no updates will be issued. But the ability to index on the filename is of some importance (at least on the first about 32 chars). I was wondering whether "name text," would save the space or not. Perhaps there would be smarter or more elegant suggestions than "name text" for the field type, so I'm writing to the list to get some ideas. TIA, All the best for PostgreSQL team in 1999, Jan. -- Gospel of Jesus is the saving power of God for all who believe -- ## To some, nothing is impossible. ## http://Vicherek.Waterloo.on.ca/
> > Hello, > > I'm creating a list of all my files anywhere in a 60' diameter on any > possible media around me. > > For such purpose I'm planning to use PostgreSQL. > > The main table might look like : > create table file ( > name varchar(255), -- file : name; tape : name / #; tapedata : # > am int, -- method of accessing contained file(s)/data > -- one of tape,gz,tar,Z,zip,spio,afio,cab,crypt > -- pgp,AUTO > size int8, -- file : file size up to 2 ^ 64 > -- tape : sum of datablock sizes > md5 char(32), -- MD5 sum of the data of size > created datetime, -- date of creation timestamp > modified datetime, -- date of last modification timestamp > type int, -- directory entry type : > -- file,dir,link,device,pipe > mode int2, -- 16 mode bits > ownernm char(8), -- owner name > ownerid int2, -- owner id > groupnm char(8), -- group name > groupid int2, -- group id > links int, -- number of hard links > mjmn int2 -- if device, major / minor # > ); > > > However, the first line "name varchar(255)" will waste a lot of > space, since only about %0.02 files will be over 200 chars, %0.1 over 100, > %0.2 over 64 and %1.6 over 32. See the FAQ under character types. varcahr() does not store the max space, only used space. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hmm, On Sat, 26 Dec 1998, Bruce Momjian wrote: > > The main table might look like : > > create table file ( > > name varchar(255), -- file : name; tape : name / #; tapedata : # ... > > mjmn int2 -- if device, major / minor # > > ); > > > > However, the first line "name varchar(255)" will waste a lot of > > space, since only about %0.02 files will be over 200 chars, %0.1 over 100, > > %0.2 over 64 and %1.6 over 32. > > See the FAQ under character types. varcahr() does not store the max > space, only used space. hmm, thus if one block has 8192 bytes, and if there is 9-byte overhead for each row, then I can store 390 rows in one block if, on average, "names" are 10 chars in length ? echo $[ 8192 / ( 9 + 10 + 2 ) ] # = 390 If each record has variable length, pg has no way of calculating where is start of next record, other than going through the block and summing up lengths of varchars to determine where does next record begin, right ? (above I use word "record" and "row" interchangeably). Thanx a bunch, Jan -- Gospel of Jesus is the saving power of God for all who believe -- ## To some, nothing is impossible. ## http://Vicherek.Waterloo.on.ca/
Hmm, let me guess ... one of the overhead elements for each row header is pointer to the next row, right ? That way the summing-up of varlena types is avoided, although the postgres engine still has to go from row header to row header to find Nth row in a page (block), right ? Thx, Jan On Sat, 26 Dec 1998, Jan Vicherek wrote: > On Sat, 26 Dec 1998, Bruce Momjian wrote: > > > > See the FAQ under character types. varcahr() does not store the max > > space, only used space. > > hmm, thus if one block has 8192 bytes, and if there is 9-byte overhead > for each row, then I can store 390 rows in one block if, on average, > "names" are 10 chars in length ? echo $[ 8192 / ( 9 + 10 + 2 ) ] # = 390 > > If each record has variable length, pg has no way of calculating where > is start of next record, other than going through the block and summing up > lengths of varchars to determine where does next record begin, right ? > (above I use word "record" and "row" interchangeably). > > Thanx a bunch, > > Jan > > > > -- Gospel of Jesus is the saving power of God for all who believe -- > ## To some, nothing is impossible. ## > http://Vicherek.Waterloo.on.ca/ > > -- Gospel of Jesus is the saving power of God for all who believe -- ## To some, nothing is impossible. ## http://Vicherek.Waterloo.on.ca/
> hmm, thus if one block has 8192 bytes, and if there is 9-byte overhead > for each row, then I can store 390 rows in one block if, on average, > "names" are 10 chars in length ? echo $[ 8192 / ( 9 + 10 + 2 ) ] # = 390 Yes. See FAQ for row overhead calculations. > If each record has variable length, pg has no way of calculating where > is start of next record, other than going through the block and summing up > lengths of varchars to determine where does next record begin, right ? > (above I use word "record" and "row" interchangeably). It is not that bad. It knows where each record starts. It is finding a certain field in the record that is a problem for any field AFTER the first variable length field. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > > Hmm, let me guess ... > > one of the overhead elements for each row header is pointer to the next > row, right ? That way the summing-up of varlena types is avoided, although > the postgres engine still has to go from row header to row header to find > Nth row in a page (block), right ? Sort of. At the start of each row, there is control information. On of the pieces of information stored is the length of the entire row. Using that, you can quickly skip to the end of the row. Because rows can't grow/shrink in place, and only vacuum removes superseded rows, the next row is right after the end of the current row. This is getting in to hackers territory. I have set the reply-to accordingly. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026