Thread: int4 vs varchar to store ip addr

int4 vs varchar to store ip addr

From
Pomarede Nicolas
Date:

Hello,

I have an authorization table that associates 1 customer IP to a service
IP to determine a TTL (used by a radius server).

table auth
  client varchar(15);
  service varchar(15);
  ttl int4;


client and service are both ip addr.

The number of distinct clients can be rather large (say around 4 million)
and the number of distinct service around 1000.

table auth can contain between 10 M and 20 M lines.

there's a double index on ( client , service ).

Since I would like to maximize the chance to have the whole table cached
by the OS (linux), I'd like to reduce the size of the table by replacing
the varchar by another data type more suited to store ip addr.

I could use PG internal inet/cidr type to store the ip addrs, which would
take 12 bytes per IP, thus gaining a few bytes per row.

Apart from gaining some bytes, would the btree index scan be faster with
this data type compared to plain varchar ?


Also, in my case, I don't need the mask provided by inet/cidr ; is there a
way to store an IPV4 addr directly into an INT4 but using the same syntax
as varchar or inet/cidr (that is I could use '192.12.18.1' for example),
or should I create my own data type and develop the corresponding function
to convert from a text input to an int4 storage ?

This would really reduce the size of the table, since it would need 3 int4
for client/service/ttl and I guess index scan would be faster with int4
data that with varchar(15) ?

Thanks for any input.


Nicolas

Re: int4 vs varchar to store ip addr

From
Florian Weimer
Date:
* Pomarede Nicolas:

> I could use PG internal inet/cidr type to store the ip addrs, which
> would take 12 bytes per IP, thus gaining a few bytes per row.

I thought it's down to 8 bytes in PostgreSQL 8.2, but I could be
mistaken.

> Apart from gaining some bytes, would the btree index scan be faster
> with this data type compared to plain varchar ?

It will be faster because less I/O is involved.

For purposes like yours, there is a special ip4 type in a contributed
package which brings down the byte count to 4.  I'm not sure if it's
been ported to PostgreSQL 8.2 yet.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: int4 vs varchar to store ip addr

From
Cosimo Streppone
Date:
Nicolas wrote:

> I have an authorization table that associates 1 customer IP to a service
> IP to determine a TTL (used by a radius server).
>
> table auth
>  client varchar(15);
>  service varchar(15);
>  ttl int4;
> client and service are both ip addr.
>
> The number of distinct clients can be rather large (say around 4
> million) and the number of distinct service around 1000.
>
> there's a double index on ( client , service ).

It comes to mind another solution... I don't know if it is better or worse,
but you could give it a try.
Store IP addresses as 4 distinct columns, like the following:

     CREATE TABLE auth (
         client_ip1 shortint,
         client_ip2 shortint,
         client_ip3 shortint,
         client_ip4 shortint,
         service    varchar(15),
         ttl        int4,
     );

And then index by client_ip4/3/2/1, then service.

     CREATE INDEX auth_i1 ON auth (client_ip4, client_ip3, client_ip2, client_ip1);

or:

     CREATE INDEX auth_i1 ON auth (client_ip4, client_ip3, client_ip2, client_ip1, service);

I'm curious to know from pg internals experts if this could be a
valid idea or is totally non-sense.

Probably the builtin ip4 type is better suited for these tasks?

--
Cosimo

Re: int4 vs varchar to store ip addr

From
Pomarede Nicolas
Date:
On Mon, 29 Jan 2007, Florian Weimer wrote:

> * Pomarede Nicolas:
>
>> I could use PG internal inet/cidr type to store the ip addrs, which
>> would take 12 bytes per IP, thus gaining a few bytes per row.
>
> I thought it's down to 8 bytes in PostgreSQL 8.2, but I could be
> mistaken.
>
>> Apart from gaining some bytes, would the btree index scan be faster
>> with this data type compared to plain varchar ?
>
> It will be faster because less I/O is involved.
>
> For purposes like yours, there is a special ip4 type in a contributed
> package which brings down the byte count to 4.  I'm not sure if it's
> been ported to PostgreSQL 8.2 yet.

Yes thanks for this reference, ip4r package seems to be a nice addition to
postgres for what I'd like to do. Does someone here have some real life
experience with it (regarding performance and stability) ?

Also, is it possible that this package functionalities' might be merged
into postgres one day, I think the benefit of using 4 bytes to store an
ipv4 addr could be really interesting for some case ?

thanks,


----------------
Nicolas Pomarede                   e-mail:   npomarede@corp.free.fr

"In a world without walls and fences, who needs windows and gates ?"

Re: int4 vs varchar to store ip addr

From
"Michael Artz"
Date:
On 1/30/07, Pomarede Nicolas <npomarede@corp.free.fr> wrote:
> On Mon, 29 Jan 2007, Florian Weimer wrote:
>
> > * Pomarede Nicolas:
> >
> >> I could use PG internal inet/cidr type to store the ip addrs, which
> >> would take 12 bytes per IP, thus gaining a few bytes per row.
> >
> > I thought it's down to 8 bytes in PostgreSQL 8.2, but I could be
> > mistaken.
> >
> >> Apart from gaining some bytes, would the btree index scan be faster
> >> with this data type compared to plain varchar ?
> >
> > It will be faster because less I/O is involved.
> >
> > For purposes like yours, there is a special ip4 type in a contributed
> > package which brings down the byte count to 4.  I'm not sure if it's
> > been ported to PostgreSQL 8.2 yet.
>
> Yes thanks for this reference, ip4r package seems to be a nice addition to
> postgres for what I'd like to do. Does someone here have some real life
> experience with it (regarding performance and stability) ?

I'm using IP4 and have not had a problem with it in 8.2 (or 8.1) in
terms of stability.  As I designed my DB using it, I don't really have
any comparisons to inet and/or varchar.  One of the most useful things
for me is the ability to create a GIST index to support determination
of range inclusion (i.e. 192.168.23.1 is in the 192.168/16 network
range), although it doesn't sound like this would be useful to you.

-Mike