We store tags on our items like this like this:
Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE
Item.ID INT NOT NULL PRIMARY KEY
ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE
with appropriate indexes on the columns we need to frequently query.
We have about 3 million tag bindings right now, and have not run into any performance issues related to tagging other than generating tag clouds (which we pre-calculate anyway).
I'll have to get back to you when we get up to 10's, or even 100's of millions and let you know how it scaled.
Bryan
On 7/18/07, lists@on-track.ca <lists@on-track.ca> wrote: I am planning to add a tags (as in the "web 2.0" thing) feature to my web
based application. I would like some feedback from the experts here on
what the best database design for that would be.
The possibilities I have come up with are:
* A tags table containing the tag and id number of what it links to.
select pid from tags where tag='bla'
select tag from tags where pid=xxx.
* a tags table where each tag exists only once, and a table with the tag
ID and picture ID to link them together.
select pid from tags inner join picture_tags using(tag_id) where tag='bla'
select tag from tags inner join picture_tags using(tag_id) where pid='xxx'
* A full text index in the picture table containing the tags
select pid from pictures where tags @@ to_tsquery('bla')
(or the non-fti version)
select pid from pictures where tags ~* '.*bla.*'
select tags from pictures where pid=xxx;
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match