Thread: Tricky SELECT question involving subqueries

Tricky SELECT question involving subqueries

From
"Ben Hallert"
Date:
Hi there,

I've got a database query that sounded easy at first, but I'm having a
hard time wrapping my head around how to get it to work.

Here's the situation.  I have a table that lists about 20-30 server
paths.  A program goes through and uses another tool to generate
information (as contents change) for all filespecs that start with
these paths.  For example, one entry might be:
//depot/program/src/trunk/ and the maintenance program runs hourly and
generates data about everything under that (like
//depot/program/src/trunk/file.c,
//depot/program/src/trunk/tool/otherfile.cpp).  As a result, I have
another table that's been populated with about 300,000 entries.

With this in mind, I want to write a query that will list the entries
in the first table (easy) along with a count() of how many entries in
the other table start with that path (the hard part).

The table with the 20-30 entrie list of paths:
CREATE TABLE trackedpaths
(
  path_id int8 NOT NULL,
  pathspec varchar(512),
  path_name varchar(512),
  lastupdated timestamp,
  lastchangelist int8
)

The 300K+ table.  Each 'filespec' below begins with a 'pathspec' from
the table above:
CREATE TABLE changehistory
(
  linesadded int8,
  linesdeleted int8,
  lineschanged int8,
  datestamp timestamp,
  change int8 NOT NULL,
  filespec varchar(512) NOT NULL,
  changeauthor varchar(128),
  "comment" varchar(32)
)

I tried handling this programmaticaly by having a loop that queries
each path, then does another query below of "SELECT COUNT(*) FROM
changehistory WHERE UPPER(filespec) LIKE UPPER('$pathspec%')".  Each
count query works fine, but the performance is crippling.

Any ideas on how to make Postgres do the heavy lifting?

Thanks!

Ben Hallert


Re: Tricky SELECT question involving subqueries

From
Michael Fuhr
Date:
On Thu, Sep 08, 2005 at 10:02:44AM -0700, Ben Hallert wrote:
> With this in mind, I want to write a query that will list the entries
> in the first table (easy) along with a count() of how many entries in
> the other table start with that path (the hard part).

[...]

> I tried handling this programmaticaly by having a loop that queries
> each path, then does another query below of "SELECT COUNT(*) FROM
> changehistory WHERE UPPER(filespec) LIKE UPPER('$pathspec%')".  Each
> count query works fine, but the performance is crippling.

Do you have an expression index on upper(filespec)?  That should
speed up queries such as the above.  Another possibility might
involve using contrib/ltree.  And instead of looping through each
path, you could use an inner or outer join.

CREATE TABLE trackedpaths (pathname ltree);
CREATE TABLE changehistory (filespec ltree);

INSERT INTO trackedpaths (pathname) VALUES ('abc.def');
INSERT INTO trackedpaths (pathname) VALUES ('ghi.jkl');
INSERT INTO trackedpaths (pathname) VALUES ('mno.pqr');

INSERT INTO changehistory (filespec) VALUES ('abc.def.123');
INSERT INTO changehistory (filespec) VALUES ('abc.def.123.456');
INSERT INTO changehistory (filespec) VALUES ('ghi.jkl.789');

SELECT t.pathname, count(c.*)
FROM trackedpaths AS t
LEFT OUTER JOIN changehistory AS c ON c.filespec <@ t.pathname
GROUP BY t.pathname
ORDER BY t.pathname;

 pathname | count
----------+-------
 abc.def  |     2
 ghi.jkl  |     1
 mno.pqr  |     0
(3 rows)

--
Michael Fuhr