Thread: Function to Pivot data
I'm wondering if it is worth it to normalize data or not. One problem with normalizing is that the data may require pivoting. Is there a pivot funtion available for postgresql? If normalized I'll have a book table, book_author table and an author table. I would want to produce a table the authors in columns instead of rows which would require pivoting. Ellen ----
Andrew, I know normalizing the database is the best for accessing and querying, it's the maintainance and reporting that is my major concern. The reporting will be done in PHP to the web. The PHP code will be written by a student and the desire is to keep the code and database easy to understand, easy to learn and easy to maintain. Actually, this database is just a single table right now. The structure is: Title Source Date LastNameAuthor1 FirstNameAuthor1 LastNameAuthor2 FirstNameAuthor2 LastNameAuthor3 FirstNameAuthor3 Subject1 Subject2 Subject3 Subject4 Department A few of the tables in a normalized database would be: Author: AuthorID, LastName, FirstName, DepartmentID Author_Book: AuthorID, BookID Book: BookID, Title, Date I would want to be able to produce a report that contained the following row structure: Title, Date, Author1, Author2, Author3, Author4 Where Author1, Author2, etc. are FirstName + LastName. Thanks for the help. Ellen ----- At 10:49 AM 01/31/2002 -0500, you wrote: >On Thu, Jan 31, 2002 at 09:17:35AM -0500, Ellen Cyran wrote: >> I'm wondering if it is worth it to normalize data or not. >> One problem with normalizing is that the data may require >> pivoting. Is there a pivot funtion available for postgresql? > >Postgres (and SQL) doesn't really have the idea of pivot tables. >Really, what you are talking about is a data _presentation_ problem, >and not a data _storage_ problem. > >The reason to normalise is that it gets you the most flexible data >store. If you have denormalised data, you find yourself tripping >over the poor separation of the conceptual pieces. > >Tools that offer "pivot tables" are not really _databases_ (in that >function), but report generators. It's important to separate these >things conceptually, so that you don't mess up your data storage with >limiting considerations from what you want to do with it right now. > >You haven't offered an outline of the database schema or anything, >here, so I can't suggest how you might go about getting the output >you want. But a quick bit of work in Perl might help. > >A > >-- >---- >Andrew Sullivan 87 Mowat Avenue >Liberty RMS Toronto, Ontario Canada ><andrew@libertyrms.info> M6K 3E3 > +1 416 646 3304 x110 > >
Can anyone suggest tools for pivoting that work well in a postgres enviroment (and in Unix)? Sorry if this is a bit off topic, Jim Martinez > >Tools that offer "pivot tables" are not really _databases_ (in that > >function), but report generators. It's important to separate these > >things conceptually, so that you don't mess up your data storage with > >limiting considerations from what you want to do with it right now. > >
On Thu, Jan 31, 2002 at 11:43:37AM -0500, Ellen Cyran wrote: > > A few of the tables in a normalized database would be: > Author: > AuthorID, LastName, FirstName, DepartmentID > Author_Book: > AuthorID, BookID Seems to me like the author_book table will need a field which indicates "first author", "second author", &c. You can't just sort alphabetically, because that might not be the correct precedence. So you need something like authorno (probably NOT NULL DEFAULT 1, but you'll have to do some extra work to make sure that you never have a book with more than one 1st author, 2d author, &c.). > Title, Date, Author1, Author2, Author3, Author4 > > Where Author1, Author2, etc. are FirstName + LastName. I'm not an expert in designing this sort of thing, and someone is going to choke when s/he sees what a horribly inefficient way this works (if I gave it more thought, I could probably come up with a better answer). Still, this would work in case you have a known number of authors for every book: SELECT a.title,b.firstname||' '||b.lastname AS auth1, c.firstname||' '||c.lastname AS auth2, a.pubdate FROM books AS a, author AS b, author AS c, author_books AS d, author_books AS e WHERE a.id=d.book_id AND b.id=d.auth_id AND d.auth_rank=1 AND c.id=e.auth_id AND e.auth_rank = 2; I doubt you'll have that case, though, and you'd have to add some LEFT JOINs to the mix. For any amount of data at all, you'll have a performance problem. But I wonder if the difficulty might be because you're trying to normalise a simple one-to-many relation, and you actually have a one-to-many relation which has order in the "many" side. That's a different problem, really, and probably needs something like a unified book-author table with the book information in it: CREATE TABLE book (title text, pubdate date, author1 int4, author2 int4, author3 int4 . . . The trouble in this case is that you'll be limited to some maximim number of authors. (This is one problem the MARC cataloguing standard gets around. But I think that's rather more complicated that you want.) The author info could still be made separate, and references could be put in the book table so that if the author's info changed, you could change it for every book entry in one go. A -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
I've run the SQL statement below and it doesn't give me what I thought and even gives me some incorrect data. Any idea why? Here's my version of the statement: SELECT a.title, b.Author AS auth1, c.author AS auth2 FROM book AS a, tbl_author AS b, tbl_author AS c, author_book AS d, author_book AS e WHERE a.bookID=d.bookID And b.authorID=d.authorid And d.auth_rank=1 And c.authorID=e.authorid And e.auth_rank=2; Here's the tables: tbl_author: author authorid brown 1 jones 2 smith 3 Author_Book: bookid authorid auth_rank 2 1 1 1 2 1 2 2 2 3 2 1 3 3 2 1 3 2 title bookid book1 1 book2 2 book3 3 Here's the output: title auth1 auth2 book2 brown jones book2 brown smith book2 brown smith book1 jones jones book1 jones smith book1 jones smith book3 jones jones book3 jones smith book3 jones smith Book2 should only be brown and jones not brown and smith. Also, is there a way to eliminate the jones and jones for book1 and book3? I can easily get rid of the duplicates using distinct so that shouldn't be a problem. Ellen ----- At 02:49 PM 01/31/2002 -0500, Andrew Sullivan wrote: >On Thu, Jan 31, 2002 at 11:43:37AM -0500, Ellen Cyran wrote: >> >> A few of the tables in a normalized database would be: >> Author: >> AuthorID, LastName, FirstName, DepartmentID > >> Author_Book: >> AuthorID, BookID > > >Seems to me like the author_book table will need a field which >indicates "first author", "second author", &c. You can't just sort >alphabetically, because that might not be the correct precedence. So >you need something like authorno (probably NOT NULL DEFAULT 1, but >you'll have to do some extra work to make sure that you never have a >book with more than one 1st author, 2d author, &c.). > >> Title, Date, Author1, Author2, Author3, Author4 >> >> Where Author1, Author2, etc. are FirstName + LastName. > >I'm not an expert in designing this sort of thing, and someone is >going to choke when s/he sees what a horribly inefficient way this >works (if I gave it more thought, I could probably come up with a >better answer). Still, this would work in case you have a known >number of authors for every book: > >SELECT a.title,b.firstname||' '||b.lastname AS auth1, c.firstname||' >'||c.lastname AS auth2, a.pubdate FROM books AS a, author AS b, >author AS c, author_books AS d, author_books AS e WHERE >a.id=d.book_id AND b.id=d.auth_id AND d.auth_rank=1 AND >c.id=e.auth_id AND e.auth_rank = 2; > >I doubt you'll have that case, though, and you'd have to add some >LEFT JOINs to the mix. For any amount of data at all, you'll have a >performance problem. > >But I wonder if the difficulty might be because you're trying to >normalise a simple one-to-many relation, and you actually have a >one-to-many relation which has order in the "many" side. That's a >different problem, really, and probably needs something like a >unified book-author table with the book information in it: > >CREATE TABLE book (title text, > pubdate date, > author1 int4, > author2 int4, > author3 int4 . . . > >The trouble in this case is that you'll be limited to some maximim >number of authors. (This is one problem the MARC cataloguing >standard gets around. But I think that's rather more complicated >that you want.) > >The author info could still be made separate, and references could be >put in the book table so that if the author's info changed, you could >change it for every book entry in one go. > >A > >-- >---- >Andrew Sullivan 87 Mowat Avenue >Liberty RMS Toronto, Ontario Canada ><andrew@libertyrms.info> M6K 3E3 > +1 416 646 3304 x110 > > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
On Thu, Jan 31, 2002 at 03:51:09PM -0500, Ellen Cyran wrote: > I've run the SQL statement below and it doesn't give me > what I thought and even gives me some incorrect data. > Any idea why? You're getting the Cartesian product. No, I don't know why, but it looks like the query I suggested doesn't work well. Someone else (who is better than I am) will likely see it. A ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
On Thu, 31 Jan 2002, Ellen Cyran wrote: > I've run the SQL statement below and it doesn't give me > what I thought and even gives me some incorrect data. > Any idea why? > > Here's my version of the statement: > > SELECT a.title, b.Author AS auth1, c.author AS auth2 > FROM book AS a, tbl_author AS b, tbl_author AS c, author_book AS d, > author_book AS e > WHERE a.bookID=d.bookID And b.authorID=d.authorid And d.auth_rank=1 And > c.authorID=e.authorid And e.auth_rank=2; Shouldn't you be checking a.bookid=e.bookid as well or am I missing something?
You seem to have a CARTESIAN JOIN in the query from what i can see anyway a -> d b -> d c -> e There are two views (i don't remember correct syntax) that are not joined so you will A) Get duplicates B) Get the wrong information Try making sure that all of your tables are linked Darren Ferguson On Thu, 31 Jan 2002, Ellen Cyran wrote: > I've run the SQL statement below and it doesn't give me > what I thought and even gives me some incorrect data. > Any idea why? > > Here's my version of the statement: > > SELECT a.title, b.Author AS auth1, c.author AS auth2 > FROM book AS a, tbl_author AS b, tbl_author AS c, author_book AS d, > author_book AS e > WHERE a.bookID=d.bookID And b.authorID=d.authorid And d.auth_rank=1 And > c.authorID=e.authorid And e.auth_rank=2; > > Here's the tables: > > tbl_author: > author authorid > brown 1 > jones 2 > smith 3 > > Author_Book: > bookid authorid auth_rank > 2 1 1 > 1 2 1 > 2 2 2 > 3 2 1 > 3 3 2 > 1 3 2 > > title bookid > book1 1 > book2 2 > book3 3 > > Here's the output: > title auth1 auth2 > book2 brown jones > book2 brown smith > book2 brown smith > book1 jones jones > book1 jones smith > book1 jones smith > book3 jones jones > book3 jones smith > book3 jones smith > > Book2 should only be brown and jones not brown and smith. Also, is there a > way to eliminate > the jones and jones for book1 and book3? I can easily get rid of the > duplicates using distinct > so that shouldn't be a problem. > > Ellen > ----- > At 02:49 PM 01/31/2002 -0500, Andrew Sullivan wrote: > >On Thu, Jan 31, 2002 at 11:43:37AM -0500, Ellen Cyran wrote: > >> > >> A few of the tables in a normalized database would be: > >> Author: > >> AuthorID, LastName, FirstName, DepartmentID > > > >> Author_Book: > >> AuthorID, BookID > > > > > >Seems to me like the author_book table will need a field which > >indicates "first author", "second author", &c. You can't just sort > >alphabetically, because that might not be the correct precedence. So > >you need something like authorno (probably NOT NULL DEFAULT 1, but > >you'll have to do some extra work to make sure that you never have a > >book with more than one 1st author, 2d author, &c.). > > > >> Title, Date, Author1, Author2, Author3, Author4 > >> > >> Where Author1, Author2, etc. are FirstName + LastName. > > > >I'm not an expert in designing this sort of thing, and someone is > >going to choke when s/he sees what a horribly inefficient way this > >works (if I gave it more thought, I could probably come up with a > >better answer). Still, this would work in case you have a known > >number of authors for every book: > > > >SELECT a.title,b.firstname||' '||b.lastname AS auth1, c.firstname||' > >'||c.lastname AS auth2, a.pubdate FROM books AS a, author AS b, > >author AS c, author_books AS d, author_books AS e WHERE > >a.id=d.book_id AND b.id=d.auth_id AND d.auth_rank=1 AND > >c.id=e.auth_id AND e.auth_rank = 2; > > > >I doubt you'll have that case, though, and you'd have to add some > >LEFT JOINs to the mix. For any amount of data at all, you'll have a > >performance problem. > > > >But I wonder if the difficulty might be because you're trying to > >normalise a simple one-to-many relation, and you actually have a > >one-to-many relation which has order in the "many" side. That's a > >different problem, really, and probably needs something like a > >unified book-author table with the book information in it: > > > >CREATE TABLE book (title text, > > pubdate date, > > author1 int4, > > author2 int4, > > author3 int4 . . . > > > >The trouble in this case is that you'll be limited to some maximim > >number of authors. (This is one problem the MARC cataloguing > >standard gets around. But I think that's rather more complicated > >that you want.) > > > >The author info could still be made separate, and references could be > >put in the book table so that if the author's info changed, you could > >change it for every book entry in one go. > > > >A > > > >-- > >---- > >Andrew Sullivan 87 Mowat Avenue > >Liberty RMS Toronto, Ontario Canada > ><andrew@libertyrms.info> M6K 3E3 > > +1 416 646 3304 x110 > > > > > >---------------------------(end of broadcast)--------------------------- > >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
Yes, the other bookid check was missing. I only have two problems now. 1. I don't always have 2 authors, if I only have 1 then I don't get that book at all. 2. I can't be sure what the maximum number of authors is either. I could of course make the maximum pretty large, but then it does become somewhat tedious to code the SQL statement. Could this be easily made into a function where the maximum authors is passed to it? Thanks for the help. At 01:22 PM 01/31/2002 -0800, Stephan Szabo wrote: >On Thu, 31 Jan 2002, Ellen Cyran wrote: > >> I've run the SQL statement below and it doesn't give me >> what I thought and even gives me some incorrect data. >> Any idea why? >> >> Here's my version of the statement: >> >> SELECT a.title, b.Author AS auth1, c.author AS auth2 >> FROM book AS a, tbl_author AS b, tbl_author AS c, author_book AS d, >> author_book AS e >> WHERE a.bookID=d.bookID And b.authorID=d.authorid And d.auth_rank=1 And >> c.authorID=e.authorid And e.auth_rank=2; > >Shouldn't you be checking a.bookid=e.bookid as well or am I missing >something? > >
You should use a LEFT OUTER JOIN on the table if you are not getting anything because of a NULL. This will return the book and it will return NULL values in the fields that have no information. Darren Ferguson On Thu, 31 Jan 2002, Ellen Cyran wrote: > Yes, the other bookid check was missing. I only have two problems now. > > 1. I don't always have 2 authors, if I only have 1 then I don't > get that book at all. > 2. I can't be sure what the maximum number of authors is either. I could > of course make the maximum pretty large, but then it does become > somewhat tedious to code the SQL statement. Could this be easily made into a > function where the maximum authors is passed to it? > > Thanks for the help. > > > At 01:22 PM 01/31/2002 -0800, Stephan Szabo wrote: > >On Thu, 31 Jan 2002, Ellen Cyran wrote: > > > >> I've run the SQL statement below and it doesn't give me > >> what I thought and even gives me some incorrect data. > >> Any idea why? > >> > >> Here's my version of the statement: > >> > >> SELECT a.title, b.Author AS auth1, c.author AS auth2 > >> FROM book AS a, tbl_author AS b, tbl_author AS c, author_book AS d, > >> author_book AS e > >> WHERE a.bookID=d.bookID And b.authorID=d.authorid And d.auth_rank=1 And > >> c.authorID=e.authorid And e.auth_rank=2; > > > >Shouldn't you be checking a.bookid=e.bookid as well or am I missing > >something? > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Ellen Cyran <ellen@urban.csuohio.edu> writes: > 2. I can't be sure what the maximum number of authors is either. I could > of course make the maximum pretty large, but then it does become > somewhat tedious to code the SQL statement. Could this be easily made into a One way that would work is select title, (select b.author from tbl_author b, author_book c where a.bookID = c.bookID and b.authorID = c.authorID and c.auth_rank = 1) as auth1, (select b.author from tbl_author b, author_book c where a.bookID = c.bookID and b.authorID = c.authorID and c.auth_rank = 2) as auth2, (select b.author from tbl_author b, author_book c where a.bookID = c.bookID and b.authorID = c.authorID and c.auth_rank = 3) as auth3, -- repeat until bored from book a; This is pretty grotty however: it's both verbose and inefficient since each subselect gets evaluated independently. What I think I'd really do is join the authors to author_book just once using a temp table: create temp table author_match as select bookID, author, auth_rank from tbl_author b, author_book c where b.authorID = c.authorID; create index author_match_index on author_match (bookID, auth_rank); Then select a.title, (select author from author_match am where am.bookID = a.bookID and auth_rank = 1) as auth1, (select author from author_match am where am.bookID = a.bookID and auth_rank = 2) as auth2, (select author from author_match am where am.bookID = a.bookID and auth_rank = 3) as auth3, -- repeat until bored from book a; With the index, this should run tolerably fast. regards, tom lane
On Thu, Jan 31, 2002 at 05:35:26PM -0500, Tom Lane wrote: > > select > a.title, > (select author from author_match am > where am.bookID = a.bookID and auth_rank = 1) as auth1, > (select author from author_match am > where am.bookID = a.bookID and auth_rank = 2) as auth2, > (select author from author_match am > where am.bookID = a.bookID and auth_rank = 3) as auth3, > -- repeat until bored ^^^^^^^^^^^^^^^^^^ This is the real problem: for any given book, you can't know in advance how many authors it might have. It's why I sort of thought that a simple lookup table approach wouldn't be a good answer for this: you have an ordered data set of unpredictable size for every item in the book table. Maybe the answer is to use an array in the lookup table. That way you can order the author entries the way you want, and still look them up. I haven't worked with arrays in Postgres, though, so I don't know if this strategy will work well. It's certainly not as simple as the original outline supposed; but if you want to catalogue actual books, a simple model won't work. (If you doubt me, have a read of the MARC standard!) -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
On Fri, Feb 01, 2002 at 10:42:24AM -0500, Andrew Sullivan wrote: > On Thu, Jan 31, 2002 at 05:35:26PM -0500, Tom Lane wrote: > > > > select > > a.title, > > (select author from author_match am > > where am.bookID = a.bookID and auth_rank = 1) as auth1, > > (select author from author_match am > > where am.bookID = a.bookID and auth_rank = 2) as auth2, > > (select author from author_match am > > where am.bookID = a.bookID and auth_rank = 3) as auth3, > > -- repeat until bored > > This is the real problem: for any given book, you can't know in > advance how many authors it might have. It's why I sort of thought > that a simple lookup table approach wouldn't be a good answer for > this: you have an ordered data set of unpredictable size for every does it have to be the result of a sql select? how about reswizzling -- create table book ( id serial, title varchar(80), isbn varchar(10), ... ); create table author ( id serial, book_id int references book( id ), lname varchar(50), ... ); ... insert into book(title) values('Foundation and Empire'); insert into author(book_id,lname) values(currval('book_id_seq'),'Asimov'); insert into book(title) values('The Ugly Little Boy'); insert into author(book_id,lname) values(currval('book_id_seq'),'Asimov'); insert into author(book_id,lname) values(currval('book_id_seq'),'Silverberg'); then select b.title, a.lname from book b, author a where b.isbn = "$1" and a.book_id = b.id order by a.id ; sounds like a job for the middleware to assemble the output...? $auth = $dbh->selectall_arrayref( $sql_from_above ); my $ix = 0; my %fld = ( title => $auth->[0][0], map {$ix++; "author$ix" => $_->[1]} @$auth ); ... -- DEBIAN NEWBIE TIP #104 from Sean Quinlan <smq@gmx.co.uk> : Looking to CUSTOMIZE THE COLORS USED BY LS? I find its easier to run "dircolors -p >~/.dircolors" and then add "eval `dircolors -b ~/.dircolors`" to my .bashrc and then make all changes to ~/.dircolors (instead of the system-wide /etc/DIR_COLORS). Probably more pertinent on a multi user system, but good policy nevertheless. Also see http://newbieDoc.sourceForge.net/ ...
> > This is the real problem: for any given book, you can't know in > > advance how many authors it might have. It's why I sort of thought > > that a simple lookup table approach wouldn't be a good answer for > > this: you have an ordered data set of unpredictable size for every The way I did this for a tiny book database I have set up for my wife to keep track of books is to have an edition table, an author table and a table of edition author pairs. It isn't ordered, but it could be by adding another field to the edition, author pairs. I haven't finished all of the web based tools for dealing with this, as she isn't doing a lot with it now, but if you want to look at the scheme and the web tools that are there, you can look at: http://wolff.to/book/
On Mon, Feb 11, 2002 at 07:39:38AM -0600, Bruno Wolff III wrote: > The way I did this for a tiny book database I have set up for my wife > to keep track of books is to have an edition table, an author table and > a table of edition author pairs. It isn't ordered, but it could be > by adding another field to the edition, author pairs. That was my original suggestion. But then, how do you make sure that every edition has only one first author, only one second, &c.? Also, you can't have a generic query which gets the authors for every book, and shows them in the tabular output that was originally desired (hence the pivot table). You could, however, write some code outside the database which would first query the book_author table, figure out how many authors were necessary, and then build the real query that way. A -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
On Tue, Feb 12, 2002 at 11:49:26AM -0500, Andrew Sullivan <andrew@libertyrms.info> wrote: > On Mon, Feb 11, 2002 at 07:39:38AM -0600, Bruno Wolff III wrote: > > > The way I did this for a tiny book database I have set up for my wife > > to keep track of books is to have an edition table, an author table and > > a table of edition author pairs. It isn't ordered, but it could be > > by adding another field to the edition, author pairs. > > That was my original suggestion. But then, how do you make sure that > every edition has only one first author, only one second, &c.? Also, > you can't have a generic query which gets the authors for every book, > and shows them in the tabular output that was originally desired > (hence the pivot table). You could, however, write some code outside > the database which would first query the book_author table, figure > out how many authors were necessary, and then build the real query > that way. I did have another suggestion in there about using a third column on the author - book records to use for ordering. If you use something like that you could write general queries using order by to get things in author order.