Thread: Indexing questions: Index == key? And index vs substring - how successful?

As the title of this message suggests, I've got a couple of questions about indexing that I'm not sure about.  I've tried to take a look at the docs, but I can't remember seeing anything on these; it's quite possible, I admit, that I'm simply not remembering all of what I saw, but I would appreciate it if someone would help me to understand these.
 
1. Does an indexed column on a table have to be a potential primary key?  I've been working with a couple of rather large tables where a common select is on a foreign key called 'cntrct_id' (Varchar(9) in format).  However, the same 'cntrct_id' can appear on multiple records in the tables I'm trying to work with now; the tables themselves record events associated with the given 'cntrct_id' record and can store many events for one 'cntrct_id' value.  I'd thought that creating an index on the table.cntrct_id field for the event tables would allow me to speed up the transations some, but comparisons of time before and after the indexing lead me to wonder if I was mistaken in this.  The times were almost identical in the following areas: Before Indexing, after Indexing but before Analyzing, and after Analyzing.
 
2. Another common sort on these fields uses part, not all, of the 'cntrct_id' value to search for things; the first character marks original location in an internal framework we're using, for example, and the third character marks the month of the year that the original 'cntrct_id' record was set up.  Sorts on either of those are fairly common as well; would indexing on the cntrct_id as a whole be able to speed up a sort on a portion of it?
 
I have in mind something like this:
select * from [event table] where substring(cntrct_id, 3,1) = 'H';
which should select any event records associated with 'cntrct_id' values initally set up in August.  (Jan = A, Feb = B, etc)
 
If I established an index on the 'cntrct_id' field in the event tables, would it assist in speeding up the substring-based search, or would it not be effective at doing so?
 
Thank you for your assistance.


Ahhh...imagining that irresistible "new car" smell?
Check out new cars at Yahoo! Autos.

Re: Indexing questions: Index == key? And index vs substring - how successful?

From
Martijn van Oosterhout
Date:
On Thu, May 03, 2007 at 01:42:44PM -0700, Andrew Edson wrote:
> As the title of this message suggests, I've got a couple of questions about indexing that I'm not sure about.  I've
triedto take a look at the docs, but I can't remember seeing anything on these; it's quite possible, I admit, that I'm
simplynot remembering all of what I saw, but I would appreciate it if someone would help me to understand these. 
>
>   1. Does an indexed column on a table have to be a potential primary key?

No, that's the difference between unique and non-unique indexes.

> The times were almost identical in the following areas: Before Indexing, after Indexing but before Analyzing, and
afterAnalyzing. 

If you want reasons, you're going to need to provide EXPLAIN ANALYSE
output.

>   I have in mind something like this:
>   select * from [event table] where substring(cntrct_id, 3,1) = 'H';
>   which should select any event records associated with 'cntrct_id' values initally set up in August.  (Jan = A, Feb
=B, etc) 
>
>   If I established an index on the 'cntrct_id' field in the event
>   tables, would it assist in speeding up the substring-based search,
>   or would it not be effective at doing so?

Not directly, no. However, you can have indexes on expressions:
CREATE INDEX foo ON bar((substring(cntrct_id, 3,1)));

Which could speed up the above query (could, since it depends on
exactly how much of the table actually needs to be searched...)

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment
>  1. Does an indexed column on a table have to be a potential primary
> key?

    Nope, create as many index as you need/must/should.

> I've been working with a couple of rather large tables where a common
> select is on a foreign key called 'cntrct_id' (Varchar(9) in format).
> However, the same 'cntrct_id' can appear on multiple records in the
> tables I'm trying to work with now; the tables themselves record events
> associated with the given 'cntrct_id' record and can store many events
> for one 'cntrct_id' value.  I'd thought that creating an index on the
> table.cntrct_id field for the event tables would allow me to speed up
> the transations some, but comparisons of time before and after the
> indexing lead me to wonder if I was mistaken in this.  The times were
> almost identical in the following areas: Before Indexing, after Indexing
> but before Analyzing, and after Analyzing.
>  2. Another common sort on these fields uses part, not all, of the
> 'cntrct_id' value to search for things; the first character marks
> original location in an internal framework we're using, for example, and
> the third character marks the month of the year that the original
> 'cntrct_id' record was set up.  Sorts on either of those are fairly
> common as well; would indexing on the cntrct_id as a whole be able to
> speed up a sort on a portion of it?

    Nope.
    This looks like suboptimal schema design...
    If you had an indexed date column, you would be able to make fast indexed
queries with BETWEEN, >=, <=, etc.