Re: Pros / cons for indexing a small table - Mailing list pgsql-novice
From | Josh Kupershmidt |
---|---|
Subject | Re: Pros / cons for indexing a small table |
Date | |
Msg-id | 4ec1cf760911101141t6ebdb6a6jb040afd018964fe0@mail.gmail.com Whole thread Raw |
In response to | Pros / cons for indexing a small table (Rikard Bosnjakovic <rikard.bosnjakovic@gmail.com>) |
List | pgsql-novice |
On Tue, Nov 10, 2009 at 6:25 AM, Rikard Bosnjakovic <rikard.bosnjakovic@gmail.com> wrote: > I'm having a simple table for managing hit counts on WWW. The table > was setup some year ago using this: > > CREATE TABLE hit_counts ( > page varchar(255), > hits integer, > last_visit timestamp, > id serial primary key > ); > > When updating a hit, I use this PHP-code: > > $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'"; > $res = pg_query($db, $q); > if (pg_num_rows($res) == 0) { > $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')"; > } else { > $row = pg_fetch_object($res); > $hits = $row->hits; > $hits++; > $q = "UPDATE hit_counters SET last_visit = (SELECT > localtimestamp(0)), hits = $hits WHERE page='$this_uri'"; > } > pg_query($db, $q); > [snip] > > 1. Since the id-column is never used, I will probably drop it to save > some space. Or, is there any point in keeping it? Drop it, and make column "page" your PRIMARY KEY. The way you have that table declared now, you're implicitly using "page" as your primary key anyway (i.e. lookups are done against "page", and your code assumes page is unique), but you're not actually enforcing a uniqueness constraint. In case other tables have FK references to "hit_counts"."id", google for "keyvil" to understand the pitfalls of using this surrogate key. > > 2. From what I've read in the manual and what other people have told > me, creating an index can speed up table accessing. I tried by adding > an index to the page-column in the above table. Then by using ANALYZE > and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a > pretty big save although I'm not sure what the value really is. Is > creating an index on 'page' the best way to speed up this particular > table? > A unique index or primary key on page will help enforce database integrity and probably speed up your queries as well. > 3. The page-column is queried using full length strings (no regexps or > wildcards). Will there be any performance issues if I have the > page-column indexed and then run wildcard-queries on it? What I mean > is, if I run wildcard-queries on this indexed column, will the index > be of any use at all? > See here, this should answer your question: http://www.postgresql.org/docs/8.3/interactive/indexes-types.html Some additional comments on your schema and application code: First, you have a race condition in your code, assuming you are using the default READ COMMITTED isolation level. What if another client inserts a row in between when you check for the row's existence, and insert if the row doesn't exist? SELECT FOR UPDATE is probably what you want to use. Second, I really hope that $this_uri is being properly escaped before your queries are executed... Third, I recommend using "timestamp with time zone" instead of your timestamp type for last_visit -- and then just use CURRENT_TIMESTAMP instead of calling localtimestamp(0). Your table right now looks like it'll have problems during daylight savings switches. Josh
pgsql-novice by date: