Re: Questions on Tags table schema - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Questions on Tags table schema
Date
Msg-id 46ADD212.3090406@archonet.com
Whole thread Raw
In response to Re: Questions on Tags table schema  ("Jay Kang" <arrival123@gmail.com>)
List pgsql-performance
Jay Kang wrote:
>> One quick point. SQL is case-insensitive unless you double-quote
>> identifiers. This means CamelCase tend not to be used. So instead of
>> AddedBy you'd more commonly see added_by.
>
> Yes, I am aware that postgre is case-insensitive, but I write all query with
> case so its easier for me to read later on.

It's SQL that's case insensitive. Pretty much any SQL-based database
system you use will do case-folding in some way.

>> CREATE TABLE Tags
>>> (
>>>    TagID serial NOT NULL,
>>>    TagName varchar(64) NOT NULL,
>>>    AddedBy varchar(256) NOT NULL,
>> This is supposed to be a user? But it's not a foreign-key, and you've
>> decided that 255 characters will be a good length, but 257 is impossible.
>
>
> I'm developing in c# with asp.net 2.0 which as a membership provider. I'm
> using ASP.NET 2.0 Website Programming / Problem - Design - Solution" (Wrox
> Press) <http://www.amazon.com/gp/product/0764584642> as a reference, so not
> having AddedBy as a foreign key within each of the tables was taken directly
> from the text. I do not understand your comment about 255 character with 257
> being impossible? Could you elaborate, if you feel it warrants further
> elaboration.

What is AddedBy - a name, a user-id?
If it's an ID, then it seems very long.
If it's a name, then 256 characters sounds a bit arbitrary as a length.
Why choose 256?

The advantage of *not* having AddedBy as a foreign-key is that you can
delete users and not have to update tags with their user-id. The
disadvantage is the same thing. You can end up with tags added by
non-existent users.

>>    AddedDate timestamp NOT NULL,
>>
>> You probably want "timestamp with time zone" (which represents an
>> absolute time) rather than without time-zone (which means 1pm in London
>> is different from 1pm in New York).
>
> OK, timestamp with time zone it is. To be honest, I've been using postgresql
> for a while now, but never tried using timestamp with time zone.

You can get away with it as long as the time-zone setting on your client
  stays the same. Then it changes, and you're left wondering why all
your comparisons are hours out.

> Also, if it's "AddedDate" why isn't it a date?
>
> I had this first as a date, but asp.net 2.0 didn't like it, and changing it
> to a timestamp fixed the problem.

Surely asp.net has a date type? If not, I'd suggest AddedTimestamp as a
name (or AddedTS if you don't enjoy lots of typing :-). It won't matter
to you now, but 12 months from now it'll save you looking up data types.

>>    Status int NOT NULL,
>>>    ViewCount int NOT NULL CONSTRAINT DF_tm_Tags_ViewCount  DEFAULT
>> (('0'))
>>> );
>> You might not want to mix in details about number of views with details
>> of the tag. Particularly if you might record more details later (when
>> viewed, by whom etc).
>
> Are you suggesting to separate the Tags table into Tags and TagDetails?
> Because ViewCount within Tags table would represent how many times that tag
> was clicked, I think others would call this field Popularity. I've been
> reading alot about tags and I am fascinated at all the information about
> user tags can provide. Where would I put information such as ViewCount,
> AddedBy, Status, etc if not within the Tags table? Sorry, if I'm totally
> missing your point.

Well, it all depends on your use analysis. You could make a good
argument that there are two sets of fact data:
1. "Identity" data - id, name, added-by, added-ts, status
2. "Activity" data - number of views, last time clicked etc

Depending on how you intend to use the tags, it might make sense to
separate these. Particularly if you find yourself with no sensible
values in activity data until a tag is used.

 From a separate performance-related point of view, you would expect
activity data to be updated much more often than identity data.

>> You have cars which have tags and planes which have tags. Tagging a
>> plane is not the same as tagging a car. Either you confuse that issue,
>> or you want separate tables to track each relationship.
>
> Hmm, so if I have a tag called "Saab" and a user clicks on Saab, then
> information from both Cars and Planes table would appear.

Well, if you UNION them, yes. Of course you'll need to find columns that
make sense across all types. If planes have e.g. "wingspan" then you'll
need to add 'not applicable'::text in the car-related subquery.

 > If I'm inserting a
> new row for a tag, wouldn't I need to check if that tagname already appears
> within the Tags table or would I just create a new row with that tag name.
> Sorry, I'm not sure what " 'car'::text " this is doing, but I'm guessing its
> used to group the cars, planes, etc. so it knows which item_type it is.
> Brilliant!

Yes, if you're giving a list of all "Saab"s, I'm assuming your users
will want to know if it's a plane or car. The ::text is just PostgreSQL
shorthand for a cast - it's good practice to specify precise types for
literal values.

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: "Jay Kang"
Date:
Subject: Re: Questions on Tags table schema
Next
From: Richard Huxton
Date:
Subject: Re: Questions on Tags table schema