Thread: int2 vs int4 in Postgres
Is there an performance benefit to using int2 (instead of int4) in cases where i know i will be well within its numeric range? I want to conserve storage space and gain speed anywhere i can, but i know some apps simply end up casting 2byte data to 4byte (like Java int/short). These int2 values will be used in primary and foreign key fields and I know that i must explicitly use tick marks (ex: where int2_column = '12') in order to make use of indexes, but my question is IS IT WORTH IT? IS THERE ANY REAL GAIN FOR DOING THIS? An simple scenario would be: Songs ------- song_id serial pkey genre int2 fkey title varchar ... Genres ------- genreid int2 pkey name varchar description varchar I KNOW that I am not going to have anywhere near 32,000+ different genres in my genre table so why use int4? Would that squeeze a few more milliseconds of performance out of a LARGE song table query with a genre lookup? Thanks, -Aaron
On Mon, Sep 26, 2005 at 12:54:05PM -0500, Announce wrote: > Is there an performance benefit to using int2 (instead of int4) in cases > where i know i will be well within its numeric range? I want to conserve > storage space and gain speed anywhere i can, but i know some apps simply end > up casting 2byte data to 4byte (like Java int/short). > > These int2 values will be used in primary and foreign key fields and I know > that i must explicitly use tick marks (ex: where int2_column = '12') in > order to make use of indexes, but my question is IS IT WORTH IT? IS THERE > ANY REAL GAIN FOR DOING THIS? > An simple scenario would be: > > Songs > ------- > song_id serial pkey > genre int2 fkey > title varchar Not in this case, because the varchar column that follows the int2 column needs 4-byte alignment, so after the int2 column there must be 2 bytes of padding. If you had two consecutive int2 fields you would save some the space. Or int2/bool/bool (bool has 1-byte alignment), etc. This assumes you are in a tipical x86 environment ... in other environments the situation may be different. -- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 17.7", W 73º 14' 26.8" Voy a acabar con todos los humanos / con los humanos yo acabaré voy a acabar con todos / con todos los humanos acabaré (Bender)
On Mon, 2005-26-09 at 12:54 -0500, Announce wrote: > Is there an performance benefit to using int2 (instead of int4) in cases > where i know i will be well within its numeric range? int2 uses slightly less storage space (2 bytes rather than 4). Depending on alignment and padding requirements, as well as the other columns in the table, that may translate into requiring fewer disk pages and therefore slightly better performance and lower storage requirements. -Neil
truthhurts@insightbb.com ("Announce") writes: > I KNOW that I am not going to have anywhere near 32,000+ different > genres in my genre table so why use int4? Would that squeeze a few > more milliseconds of performance out of a LARGE song table query > with a genre lookup? By the way, I see a lot of queries on tables NOT optimized in this fashion that run in less than a millisecond, so it would seem remarkable to me if there were milliseconds to be squeezed out in the first place... -- output = reverse("moc.enworbbc" "@" "enworbbc") http://www.ntlug.org/~cbbrowne/sap.html Why do we drive on parkways and park on driveways?
truthhurts@insightbb.com ("Announce") writes: > I KNOW that I am not going to have anywhere near 32,000+ different > genres in my genre table so why use int4? Would that squeeze a few > more milliseconds of performance out of a LARGE song table query > with a genre lookup? If the field is immaterial in terms of the size of the table, then it won't help materially. If you were going to index on it, however, THAT would make it significant for indices involving the "genre" column. Fitting more tuples into each page is a big help, and this would help. I doubt it'll be material, but I'd think it a good thing to apply what restrictions to your data types that you can, a priori, so I'd be inclined to use "int2" for this... -- let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];; http://cbbrowne.com/info/nonrdbms.html Rules of the Evil Overlord #136. "If I build a bomb, I will simply remember which wire to cut if it has to be deactivated and make every wire red." <http://www.eviloverlord.com/>
Chris Browne <cbbrowne@acm.org> writes: > If the field is immaterial in terms of the size of the table, then it > won't help materially. > If you were going to index on it, however, THAT would make it > significant for indices involving the "genre" column. Fitting more > tuples into each page is a big help, and this would help. For a multicolumn index it might help to replace int4 by int2. For a single-column index, alignment constraints on the index entries will prevent you from saving anything :-( regards, tom lane