Hi there,
I've got a database query that sounded easy at first, but I'm having a
hard time wrapping my head around how to get it to work.
Here's the situation. I have a table that lists about 20-30 server
paths. A program goes through and uses another tool to generate
information (as contents change) for all filespecs that start with
these paths. For example, one entry might be:
//depot/program/src/trunk/ and the maintenance program runs hourly and
generates data about everything under that (like
//depot/program/src/trunk/file.c,
//depot/program/src/trunk/tool/otherfile.cpp). As a result, I have
another table that's been populated with about 300,000 entries.
With this in mind, I want to write a query that will list the entries
in the first table (easy) along with a count() of how many entries in
the other table start with that path (the hard part).
The table with the 20-30-entry list of paths:
CREATE TABLE trackedpaths
(
path_id int8 NOT NULL,
pathspec varchar(512),
path_name varchar(512),
lastupdated timestamp,
lastchangelist int8
)
The 300K+ table. Each 'filespec' below begins with a 'pathspec' from
the table above:
CREATE TABLE changehistory
(
linesadded int8,
linesdeleted int8,
lineschanged int8,
datestamp timestamp,
change int8 NOT NULL,
filespec varchar(512) NOT NULL,
changeauthor varchar(128),
"comment" varchar(32)
)
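Conceptually, the result I'm after has roughly the shape sketched below. This is untested, and I don't know whether joining on a LIKE prefix match is a sane way to express it (the alias names are just for illustration), which is partly why I'm asking:

```sql
-- Untested sketch of the query I think I want: one row per
-- tracked path, with a count of the history rows whose
-- filespec starts with that pathspec (case-insensitive).
SELECT t.pathspec,
       COUNT(h.filespec) AS entrycount
FROM trackedpaths t
LEFT JOIN changehistory h
       ON UPPER(h.filespec) LIKE UPPER(t.pathspec) || '%'
GROUP BY t.pathspec;
```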
I tried handling this programmatically with a loop that fetches each
path, then runs "SELECT COUNT(*) FROM changehistory WHERE
UPPER(filespec) LIKE UPPER('$pathspec%')" for each one. Each count
query returns the right answer, but the performance is crippling.
Any ideas on how to make Postgres do the heavy lifting?
Thanks!
Ben Hallert