Thread: indexes on varchar fields

indexes on varchar fields

From
Peter Nixon
Date:
Hi guys

I have a large database (actually a bunch of large databases) that a
running as the backends of radius servers running voip accounting.

The databases are getting quite large now and some queries are taking >
5min to return. Now I read somewhere that indexing varchar fields is not
particularly worth it especially if you are only doing queries
occasionally. I am doing 200-500 inserts per min but only a query run on
this particular field when I want to compare my flat file logs against
whats in the DB to see if they match for billing purposes.

Can someone inform me as to whether an index on the following field:

   h323ConfID varchar(64) DEFAULT '' NOT NULL

Which contains data resembling the following:

   41A4DCE3 8CF2D611 85170004 75AE73D4

which is most likely unique, but not guaranteed so, is going to be worth
it?

--

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc

Re: indexes on varchar fields

From
"Nigel J. Andrews"
Date:
On Fri, 8 Nov 2002, Peter Nixon wrote:

> Hi guys
>
> I have a large database (actually a bunch of large databases) that a
> running as the backends of radius servers running voip accounting.
>
> The databases are getting quite large now and some queries are taking >
> 5min to return. Now I read somewhere that indexing varchar fields is not
> particularly worth it especially if you are only doing queries
> occasionally. I am doing 200-500 inserts per min but only a query run on
> this particular field when I want to compare my flat file logs against
> whats in the DB to see if they match for billing purposes.
>
> Can someone inform me as to whether an index on the following field:
>
>    h323ConfID varchar(64) DEFAULT '' NOT NULL
>
> Which contains data resembling the following:
>
>    41A4DCE3 8CF2D611 85170004 75AE73D4
>
> which is most likely unique, but not guaranteed so, is going to be worth
> it?

Well I would say that on the face of it it's a good index to keep. Without it
you are going to have to do a seqscan over the entire table to select on that
field. Even if there is only one select per 1000's of inserts you're going to
find that select will be painfully long on a large table, and presumably just
get longer as time goes on.

However, do you know that the index is actually being used in the query? Try
EXPLAINing the query in psql. If not may be that is why the query is taking so
long. If it is then dropping the index may well make it worse.

The idea of dropping an index for many insert/update per select is to avoid the
work of managing the index for all those changes of data speeding up that side
of things.

One thing that does jump out at me is that those examples you give look like
hexadecimal representation. Is this the case? I see you've imposed a limit of
64 characters on the field so it could be a very large number if so. However,
as given, those data items look ideal to be stored as integers which I think
may improve your searching speed a little.


--
Nigel J. Andrews


Re: indexes on varchar fields

From
Peter Nixon
Date:
On Fri, 8 Nov 2002 12:03:15 +0000 (UTC)
Squire "Nigel J. Andrews" uttered the following:

> On Fri, 8 Nov 2002, Peter Nixon wrote:
>
> > Hi guys
> >
> > I have a large database (actually a bunch of large databases) that a
> > running as the backends of radius servers running voip accounting.
> >
> > The databases are getting quite large now and some queries are taking >
> > 5min to return. Now I read somewhere that indexing varchar fields is
> > not particularly worth it especially if you are only doing queries
> > occasionally. I am doing 200-500 inserts per min but only a query run
> > on this particular field when I want to compare my flat file logs
> > against whats in the DB to see if they match for billing purposes.
> >
> > Can someone inform me as to whether an index on the following field:
> >
> >    h323ConfID varchar(64) DEFAULT '' NOT NULL
> >
> > Which contains data resembling the following:
> >
> >    41A4DCE3 8CF2D611 85170004 75AE73D4
> >
> > which is most likely unique, but not guaranteed so, is going to be
> > worth it?
>
> Well I would say that on the face of it it's a good index to keep.
> Without it you are going to have to do a seqscan over the entire table to
> select on that field. Even if there is only one select per 1000's of
> inserts you're going to find that select will be painfully long on a
> large table, and presumably just get longer as time goes on.

Yes. That is the case.

> However, do you know that the index is actually being used in the query?
> Try EXPLAINing the query in psql. If not may be that is why the query is
> taking so long. If it is then dropping the index may well make it worse.
>
> The idea of dropping an index for many insert/update per select is to
> avoid the work of managing the index for all those changes of data
> speeding up that side of things.
>
> One thing that does jump out at me is that those examples you give look
> like hexadecimal representation. Is this the case? I see you've imposed a
> limit of 64 characters on the field so it could be a very large number if
> so. However, as given, those data items look ideal to be stored as
> integers which I think may improve your searching speed a little.

Well, they are generated by cisco routers. They are the conference id for
each voip call and "should" be unique (at least to each router).
This (41A4DCE3 8CF2D611 85170004 75AE73D4) is the format they come out of
the router in, and I am just inserting the data directly to a text field.
You are right, they do seem to be hex numbers. Can postgres accept hex as a
numberic? if so that would speed things up ALOT...

--

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc

Re: indexes on varchar fields

From
"Nigel J. Andrews"
Date:
On Fri, 8 Nov 2002, Peter Nixon wrote:

> On Fri, 8 Nov 2002 12:03:15 +0000 (UTC)
> Squire "Nigel J. Andrews" uttered the following:
>
> Well, they are generated by cisco routers. They are the conference id for
> each voip call and "should" be unique (at least to each router).
> This (41A4DCE3 8CF2D611 85170004 75AE73D4) is the format they come out of
> the router in, and I am just inserting the data directly to a text field.
> You are right, they do seem to be hex numbers. Can postgres accept hex as a
> numberic? if so that would speed things up ALOT...

I'm sure it must be possible but I can't see anything at the moment. So how
about:


create function hex_to_int (text) returns integer as '
    return hex($_[0]);
' language 'plpgsql';

create function int_to_hex (integer) returns text as '
    return sprintf("%X",$_[0]);
' language 'plpgsql';

Used as:

insert int mytable values (hex_to_int('AB456F56'));

and

select int_to_hex(colname) from mytable;

You'd need to be confident that you weren't going to exceed your native integer
size in perl though.


--
Nigel J. Andrews


Re: indexes on varchar fields

From
Peter Nixon
Date:
On Fri, 8 Nov 2002 03:13 pm, Nigel J. Andrews wrote:
> On Fri, 8 Nov 2002, Peter Nixon wrote:
> > On Fri, 8 Nov 2002 12:03:15 +0000 (UTC)
> > Squire "Nigel J. Andrews" uttered the following:
> >
> > Well, they are generated by cisco routers. They are the conference id for
> > each voip call and "should" be unique (at least to each router).
> > This (41A4DCE3 8CF2D611 85170004 75AE73D4) is the format they come out of
> > the router in, and I am just inserting the data directly to a text field.
> > You are right, they do seem to be hex numbers. Can postgres accept hex as
> > a numberic? if so that would speed things up ALOT...
>
> I'm sure it must be possible but I can't see anything at the moment. So how
> about:
>
>
> create function hex_to_int (text) returns integer as '
>     return hex($_[0]);
> ' language 'plpgsql';
>
> create function int_to_hex (integer) returns text as '
>     return sprintf("%X",$_[0]);
> ' language 'plpgsql';
>
> Used as:
>
> insert int mytable values (hex_to_int('AB456F56'));
>
> and
>
> select int_to_hex(colname) from mytable;
>
> You'd need to be confident that you weren't going to exceed your native
> integer size in perl though.

radius=# create function hex_to_int (text) returns integer as '
radius'#         return hex($_[0]);
radius'# ' language 'plperl';
CREATE FUNCTION
radius=# select hex_to_int('AB456F56');
ERROR:  pg_atoi: error reading "2873454422": Numerical result out of range

Thanks for the idea though. I will investigate this further.

Also, can anyone explain to me why the following happens?

radius=# create function testperltest2 (text) returns text as '
radius'#         my $datetime = $_[0];
radius'#         return $datetime;
radius'#         # return $_[0];
radius'# ' language 'plperl';
CREATE FUNCTION
radius=# select testperltest2('.16:46:02.356 EET Wed Dec 11 2002');
           testperltest2
-----------------------------------
 .16:46:02.356 EET Wed Dec 11 2002
(1 row)

OK. Fine. My test function returns correctly..

radius=# create function testperltest3 (text) returns text as '
radius'#         my $datetime = $_[0];
radius'#         # Remove . from the start of time fields (routers that have
lost ntp timesync)
radius'#         $datetime =~ s/^\.*//;
radius'#         return $datetime;
radius'# ' language 'plperl';
CREATE FUNCTION
radius=# select testperltest3('.16:46:02.356 EET Wed Dec 11 2002');
 testperltest3
---------------

(1 row)

Do you have any idea why that regexp would be returning nothing? It works fine
in standard perl...


--

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc


Re: indexes on varchar fields

From
"Peter Gibbs"
Date:
Peter Nixon wrote:

> radius=# create function testperltest3 (text) returns text as '
> radius'#         my $datetime = $_[0];
> radius'#         # Remove . from the start of time fields (routers that
have
> lost ntp timesync)
> radius'#         $datetime =~ s/^\.*//;
> radius'#         return $datetime;
> radius'# ' language 'plperl';
> CREATE FUNCTION
> radius=# select testperltest3('.16:46:02.356 EET Wed Dec 11 2002');
>  testperltest3
> ---------------
>
> (1 row)

The backslash is being treated as an escape character by Postgres when the
function is created, and hence is not present in the actual function code.
Try doubling it.
--
Peter Gibbs
EmKel Systems


Re: indexes on varchar fields

From
Tom Lane
Date:
Peter Nixon <listuser@peternixon.net> writes:
> radius=# create function testperltest3 (text) returns text as '
> radius'#         my $datetime = $_[0];
> radius'#         # Remove . from the start of time fields (routers that have
> lost ntp timesync)
> radius'#         $datetime =~ s/^\.*//;
> radius'#         return $datetime;
> radius'# ' language 'plperl';
> CREATE FUNCTION

You need another backslash --- the string-literal parser is eating that
one, so Perl sees s/^.*//.

            regards, tom lane