How can this be optimized, if possible? - Mailing list pgsql-general

From Net Virtual Mailing Lists
Subject How can this be optimized, if possible?
Date
Msg-id 20050705044702.28859@mail.net-virtual.com
Whole thread Raw
Responses Re: How can this be optimized, if possible?
List pgsql-general
Hello,


My database has grown far faster then expected and a query which used to
run acceptably now does not.   I'm trying to figure out a way to make
this operate faster and scale better.   I'm very open to the idea that
this does not need to be done using a SQL query at all - right now I'm
really just in need of some conceptual/architectural help on this one.


So I have two tables:

         Table "category"
    Column    |       Type        | Modifiers
--------------+-------------------+-----------
 head_title   | character varying |
 cat_title    | character varying |
 subcat_title | character varying |
 category     | ltree             |



                                             Table "test"
       Column       |           Type           |
   Modifiers
--------------------+--------------------------
+-----------------------------------------------------------------
 id                 | integer                  | not null default
nextval('master.test_id_seq'::text)
 category           | ltree[]                  |


... there are other fields in the test table, but these are really the
only two relevant to this.


The query I want to run against these two tables is something like this:

SELECT
 count(*) as count,
 category.category,
 nlevel(category.category) AS level,
 subpath(category.category,0,nlevel(category.category)-1) as parent,
 category.head_title,
 category.cat_title,
 category.subcat_title
FROM
 test,
 category
WHERE
 test.category <@ category.category
GROUP BY
  category.category, category.head_title, category.cat_title,
category.subcat_title |


Many times the "WHERE" clause will contain additional search criteria on
the 'test' table.  What I am trying to get is a count of how many rows
from the test table fall into each category, being limited by the search
criteria.

This query is starting to take an enormous amount of time (30+ seconds)
and I really need the results of this in a couple seconds tops.  I can do
a "select category from test" and it completes in about .5 seconds.  The
category table currently only has 225 rows, the test table having
approximately 30,000.  "SELECT count(category,category FROM test GROUP BY
category" is quite slow and I thought of making a materialized view of
this, but then of course I don't see any way to make that result limited
by my search criteria.

I am completely open to re-architecting this entirely, performance of
this query is critical to my application - I really just am not sure
where to start.  It seems like everything I do is worse then what I
started with.

... It *almost* seems as if I need to build some sort of "search engine
like" tool which performs all the queries against the database, has tons
of memory, and cache the category attributes for each record in memory.
This sure seems like a lot of work though - I sincerely hope there is an
easier way.....

Thanks for your help, as always!

- Greg


pgsql-general by date:

Previous
From: Mike Rylander
Date:
Subject: Re: tsearch2 and case
Next
From: postgresql@bryden.co.za
Date:
Subject: Statistics and Indexes