Re: Questions on Tags table schema - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Questions on Tags table schema
Date
Msg-id 46ADFC97.1040607@archonet.com
Whole thread Raw
In response to Questions on Tags table schema  ("Jay Kang" <arrival123@gmail.com>)
List pgsql-performance
Jay Kang wrote:
> Hey Richard,
>
> Sorry for the late reply, I was just making my first test version of the DB
> closely resembling you suggested design. Just wanted to write you back
> answering your questions. So here we go:

No problem - it's email and what with different timezones it's common to
have gaps.

>> What is AddedBy - a name, a user-id?
>> If it's an ID, then it seems very long.
>> If it's a name, then 256 characters sounds a bit arbitrary as a length.
>> Why choose 256?
>
> No, AddedBy is the username of the individual. Why is 256 characters
> arbitrary as a length? Would 255 be better or 32? I guess, your saying its
> too long, shorten it, I'm just going with what the books says, but I really
> welcome any comments you have^^

The book doesn't know what you're trying to do. You do. The important
thing is not whether you choose 256 or 32 or 100, it's that you've
thought about it first.

Obvious thoughts:
1. Is this going to be a real name "Richard Huxton" or an identifier
"rhuxton123"? You'll want more characters for the real name than an
identifier.
2. Where will this be displayed and will it take up too much space? If I
pick a username of WWW...250 repeats...WWW does that mess up any formatting?
3. Do we allow HTML-unsafe characters ('<', '&') and escape them when
used, or just not allow them?

No wrong or right, the process of thinking about it is important.

> The advantage of *not* having AddedBy as a foreign-key is that you can
>> delete users and not have to update tags with their user-id. The
>> disadvantage is the same thing. You can end up with tags added by
>> non-existent users.
>
> Thanks, I would like anonymous users to be able to add tags, so I guess I'll
> leave it the way it is^^

You would normally use NULL to indicate "unknown", which in the case of
an anonymous user would be true. A NULL foreign-key is allowed (unless
you define the column not-null of course).

[snip]
>> Well, it all depends on your use analysis. You could make a good
>> argument that there are two sets of fact data:
>> 1. "Identity" data - id, name, added-by, added-ts, status
>> 2. "Activity" data - number of views, last time clicked etc
>
> If I were to create Identity and Activity for the Tags table, would I be
> creating two separate tables called TagActivities and TagIdentities?

That's what I'm talking about, and you'd have a foreign-key constraint
to make sure activity refers to a real tag identity. Again, I'm not
saying you *do* want to do this, just that you'll need to think about it.

> Currently, I'm not sure how I'll analysis the data for Tags. I know that I
> want to do the bigger font if it is popular and smaller font if its not.
> Hmm, would like to see examples of other websites that utilized Tags tables
> to see how they implemented this function. I was thinking of adding tagcount
> (popularity) for each user within the user definition table.

For this particular case, you'll almost certainly want to cache the
results anyway. The popularity isn't going to change that fast, and
presumably you'll only want to categorise them as VERY BIG, Big, normal
etc. I assume asp.net allows you to cache this sort of information somehow.

>>>> You have cars which have tags and planes which have tags. Tagging a
>>>> plane is not the same as tagging a car. Either you confuse that issue,
>>>> or you want separate tables to track each relationship.
>>> Hmm, so if I have a tag called "Saab" and a user clicks on Saab, then
>>> information from both Cars and Planes table would appear.
>> Well, if you UNION them, yes. Of course you'll need to find columns that
>> make sense across all types. If planes have e.g. "wingspan" then you'll
>> need to add 'not applicable'::text in the car-related subquery.
>
> Hmm, currently I can't visualize the query, again, it would help if I can
> see the data to see what you mean. If planes table had a tag called
> wingspan, wouldn't the query just not show any value for the field so it
> wouldn't need 'not applicable' in the car-related subquery? Not sure really.

Sorry - I'm trying to say that if you UNION together several queries
they all need to have the same columns. So - if one subquery doesn't
have that column you'll need to provide a "not applicable" value instead.

Best of luck with the application, and don't forget to cache query
results when they don't change often. It'll boost performance quite a bit.

P.S. - try the "general" mailing list if you want to discuss this sort
of thing some more. This one is really supposed to be
performance-related questions only.

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Questions on Tags table schema
Next
From: "Steven Flatt"
Date:
Subject: Re: Vacuum looping?