Re: Question about ltree.... - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Question about ltree.... |
Date | |
Msg-id | Pine.GSO.4.61.0410310727480.24492@ra.sai.msu.su Whole thread Raw |
In response to | Question about ltree.... ("Greg Saylor" <gsaylor@integrated-support.com>) |
List | pgsql-general |
Greg, I don't understand your idea ! Why do you need three Ltrees ? Oleg On Tue, 26 Oct 2004, Greg Saylor wrote: > Hello, > > First let me thank Mike for his earlier suggestion that I consider using > ltree - it really is working very nicely!... > > I do have a question about performance/design though... > > I have a table like this: > > CREATE TABLE sometable ( > id SERIAL, > category0 LTREE, > category1 LTREE, > category2 LTREE > ); > > CREATE INDEX sometable_category0 ON sometable USING gist(category0) WHERE > category0 IS NOT NULL; > CREATE INDEX sometable_category1 ON sometable USING gist(category1) WHERE > category1 IS NOT NULL; > CREATE INDEX sometable_category2 ON sometable USING gist(category2) WHERE > category2 IS NOT NULL; > > > ... The idea is that every row in sometable can be assigned to 1, 2, or 3 > categories of a users choosing.... > > I essentially have categories like this: > a > a.a > a.a.a > a.a.a.a > a.a.a.b > a.a.a.c > a.a.b > a.a.b.a > a.a.b.b > > > ... so the categories are essentially 3 levels deep (not counting the top > level) - in my test case I have about 4300 rows of data in this table. > > Unfortunately, queries against this table can be a bit inefficient, for > example, if I want to get a count of all of items within a category I > have do something like: > > > SELECT count(*) AS count FROM > (SELECT category0 AS category FROM sometable WHERE category0 IS NOT NULL > UNION SELECT category1 AS category FROM sometable WHERE category1 > IS NOT NULL > UNION SELECT category2 AS category FROM sometable WHERE category2 > IS NOT NULL) AS b > WHERE b.category <@ 'a.b'; > > > > If I want to retrieve rows which fit into one or more of the categories, > I have to do something like: > > SELECT * FROM sometable WHERE category0 <@ 'a.b' OR category1 <@ 'a.b' OR > category2 <@ 'a.c'; > > > .. What is interesting is that this does a sequential scan: > > SELECT * FROM sometable WHERE category0 <@ 'a.b'; > > .. This on the other hand does an index scan, despite the fact that every > row IS NOT NULL and it is about 10 times faster: > > SELECT * FROM sometable WHERE category0 <@ 'a.b' AND category0 IS NOT NULL; > > .. But, as soon as I add a second OR clause the planner decides to start > doing sequential scans again, despite the fact all category1 columns are NULL: > > SELECT * FROM sometable WHERE (category0 <@ 'a.b' AND category0 IS NOT > NULL) OR (category1 <@ 'a.b' AND category1 IS NOT NULL); > > > > > > Is there anything which jumps out as being something that I could do to > improve the schema design or the performance of the two queries I > identified above - as these get executed very often any performance > improvement would dramatically boost performance... > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-general by date: