Re: Optmal tags design? - Mailing list pgsql-performance

From Bryan Murphy
Subject Re: Optmal tags design?
Date
Msg-id bd8531800707181454xb968a48ya3c0efc160d104a4@mail.gmail.com
Whole thread Raw
In response to Optmal tags design?  (lists@on-track.ca)
List pgsql-performance
We store tags on our items like this like this:

Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE

Item.ID INT NOT NULL PRIMARY KEY

ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE

with appropriate indexes on the columns we need to frequently query.

We have about 3 million tag bindings right now, and have not run into any performance issues related to tagging other than generating tag clouds (which we pre-calculate anyway).

I'll have to get back to you when we get up to 10's, or even 100's of millions and let you know how it scaled.

Bryan

On 7/18/07, lists@on-track.ca <lists@on-track.ca> wrote:
I am planning to add a tags (as in the "web 2.0" thing) feature to my web
based application. I would like some feedback from the experts here on
what the best database design for that would be.

The possibilities I have come up with are:
* A tags table containing the tag and id number of what it links to.
select pid from tags where tag='bla'
select tag from tags where pid=xxx.

* a tags table where each tag exists only once, and a table with the tag
ID and picture ID to link them together.

select pid from tags inner join picture_tags using(tag_id) where tag='bla'
select tag from tags inner join picture_tags using(tag_id) where pid='xxx'

* A full text index in the picture table containing the tags

select pid from pictures where tags @@ to_tsquery('bla')
(or the non-fti version)
select pid from pictures where tags ~* '.*bla.*'

select tags from pictures where pid=xxx;

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

pgsql-performance by date:

Previous
From: Mark Lewis
Date:
Subject: Re: Optmal tags design?
Next
From: "Kalle Hallivuori"
Date:
Subject: Re: improvement suggestions for performance design