Thread: int2 vs int4 in Postgres

int2 vs int4 in Postgres

From
"Announce"
Date:
Is there an performance benefit to using int2 (instead of int4) in cases
where i know i will be well within its numeric range? I want to conserve
storage space and gain speed anywhere i can, but i know some apps simply end
up casting 2byte data to 4byte (like Java int/short).

These int2 values will be used in primary and foreign key fields and I know
that i must explicitly use tick marks (ex: where int2_column = '12') in
order to make use of indexes, but my question is IS IT WORTH IT?  IS THERE
ANY REAL GAIN FOR DOING THIS?

An simple scenario would be:

Songs
-------
song_id serial pkey
genre   int2   fkey
title    varchar
...


Genres
-------
genreid int2   pkey
name      varchar
description varchar



I KNOW that I am not going to have anywhere near 32,000+ different genres in
my genre table so why use int4?  Would that squeeze a few more milliseconds
of performance out of a LARGE song table query with a genre lookup?


Thanks,

-Aaron


Re: int2 vs int4 in Postgres

From
Alvaro Herrera
Date:
On Mon, Sep 26, 2005 at 12:54:05PM -0500, Announce wrote:
> Is there an performance benefit to using int2 (instead of int4) in cases
> where i know i will be well within its numeric range? I want to conserve
> storage space and gain speed anywhere i can, but i know some apps simply end
> up casting 2byte data to 4byte (like Java int/short).
>
> These int2 values will be used in primary and foreign key fields and I know
> that i must explicitly use tick marks (ex: where int2_column = '12') in
> order to make use of indexes, but my question is IS IT WORTH IT?  IS THERE
> ANY REAL GAIN FOR DOING THIS?

> An simple scenario would be:
>
> Songs
> -------
> song_id serial pkey
> genre   int2   fkey
> title    varchar

Not in this case, because the varchar column that follows the int2
column needs 4-byte alignment, so after the int2 column there must be 2
bytes of padding.

If you had two consecutive int2 fields you would save some the space.
Or int2/bool/bool (bool has 1-byte alignment), etc.

This assumes you are in a tipical x86 environment ... in other
environments the situation may be different.

--
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos / con todos los humanos acabaré (Bender)

Re: int2 vs int4 in Postgres

From
Neil Conway
Date:
On Mon, 2005-26-09 at 12:54 -0500, Announce wrote:
> Is there an performance benefit to using int2 (instead of int4) in cases
> where i know i will be well within its numeric range?

int2 uses slightly less storage space (2 bytes rather than 4). Depending
on alignment and padding requirements, as well as the other columns in
the table, that may translate into requiring fewer disk pages and
therefore slightly better performance and lower storage requirements.

-Neil



Re: int2 vs int4 in Postgres

From
Chris Browne
Date:
truthhurts@insightbb.com ("Announce") writes:
> I KNOW that I am not going to have anywhere near 32,000+ different
> genres in my genre table so why use int4?  Would that squeeze a few
> more milliseconds of performance out of a LARGE song table query
> with a genre lookup?

By the way, I see a lot of queries on tables NOT optimized in this
fashion that run in less than a millisecond, so it would seem
remarkable to me if there were milliseconds to be squeezed out in the
first place...
--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/sap.html
Why do we drive on parkways and park on driveways?

Re: int2 vs int4 in Postgres

From
Chris Browne
Date:
truthhurts@insightbb.com ("Announce") writes:
> I KNOW that I am not going to have anywhere near 32,000+ different
> genres in my genre table so why use int4?  Would that squeeze a few
> more milliseconds of performance out of a LARGE song table query
> with a genre lookup?

If the field is immaterial in terms of the size of the table, then it
won't help materially.

If you were going to index on it, however, THAT would make it
significant for indices involving the "genre" column.  Fitting more
tuples into each page is a big help, and this would help.

I doubt it'll be material, but I'd think it a good thing to apply what
restrictions to your data types that you can, a priori, so I'd be
inclined to use "int2" for this...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/nonrdbms.html
Rules of  the Evil Overlord  #136. "If I  build a bomb, I  will simply
remember which wire to cut if  it has to be deactivated and make every
wire red." <http://www.eviloverlord.com/>

Re: int2 vs int4 in Postgres

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> If the field is immaterial in terms of the size of the table, then it
> won't help materially.
> If you were going to index on it, however, THAT would make it
> significant for indices involving the "genre" column.  Fitting more
> tuples into each page is a big help, and this would help.

For a multicolumn index it might help to replace int4 by int2.  For a
single-column index, alignment constraints on the index entries will
prevent you from saving anything :-(

            regards, tom lane