Thread: Set Operators and Arrays Question
Hello readers and writers, I have a concrete problem which I think is similar to a large class of problems. I want to propose a better solution for my concrete problem, which, if implemented, would be generally useful. Then again someone may come up with a better method again... My concrete problem is with keeping track of fundraising letters at the RSPCA where I do volunteer programing. There are 100,000 people on the mailing list, and each Fundraising letter or reminder has an id and I need to keep track of who has got what so that people don't get too many letters, don't get reminders for fundraising appeals they weren't involved in, etc. For example there are (over say 5 years) perhaps 200 letters and each person receives a very small subset of these. Person| Letters received John : 3,35,45,27 Mary : 1,35,4, Ian : 1,35,4,12,45,63 ... Current Solution: My current tables look (in simplified form) like: create table member ( member_id serial, ...); create table mailing ( mailing_id serial, ... ); create table memb_mailing ( member_id int,mailing_id int); This all works fine and I can define indexes to speed up queries like "tell me all the people who got mailing 23?", "did person X get mailing Y?". But I'm not quite happy. Its a bit slow and takes up more disk than I think it should. 2nd attempt: create table member ( member_id serial, ...); create table mailing ( mailing_id serial, ...); create table memb_mailing ( member_id int,mailings bool[]); Here each of the members of the mailings boolean array represents one letter. You initialise it with '{f,f,f,f,f.....}'. A disadvantage is that you need to guestimate the largest letter number to build the initial string. On the other hand you can do great things like: update memb_mailing set mailings[15]='t' where member_id=templist.member_id; where you have previously built a temporary table templist of members. This is little better. It uses a bit less disk, and answers queries a little faster (even without indexes). Wish List Solution: Heres what I think would be really the best way to solve the problem. create table member ( member_id serial, data text); create table mailing ( mailing_id serial, name text); create table memb_mailing ( member_id int,mailings int[]); Here the initial mailings array is empty and I say things like: update memb_mailing add 27 into mailings where member_id=templist.member_id; This just inserts the letter number 27 into the mailings array. And we could also say things like: select M.member_id from member M,memb_mailing MM where M.member_id=MM.member_id and MM.mailings contains 27; Summary: I'm really just suggesting a couple of set operators for arrays. Perhaps there is already a better way of doing this? I'm open to suggestions. Cheers Geoff - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 6 Fifth Ave +618-8332-5069 (Home) | St Morris SA 5068 +618-8207-2029 (Work) | geoff@austrics.com.au Adelaide, AUSTRALIA +618-8364-1543 (Fax-Home) |
Geoff Russell writes: > I'm really just suggesting a couple of set operators for arrays. Well, arrays are arrays, not sets. If you want sets, you create a table. If you want set operators, you use selects and joins. That is the design. However, if you want to outsmart the system you can use the operators in contrib/array, but don't say I didn't warn you. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
> Wish List Solution: > > Heres what I think would be really the best way to solve the problem. > > create table member ( member_id serial, data text); > create table mailing ( mailing_id serial, name text); > create table memb_mailing ( member_id int,mailings int[]); > > Here the initial mailings array is empty and I say things like: > > update memb_mailing add 27 into mailings > where member_id=templist.member_id; > > This just inserts the letter number 27 into the mailings array. And we could > also say things like: > > select M.member_id from member M,memb_mailing MM > where M.member_id=MM.member_id and > MM.mailings contains 27; You may want to look at the contrib/array directory for additional functions and operators for arrays (which includes is this an element of this array function/operator)
Geoff Russell wrote: > > Hello readers and writers, > > I have a concrete problem which I think is similar to a large class of > problems. I want to propose a better solution for my concrete problem, > which, if implemented, would be generally useful. Then again someone > may come up with a better method again... [ Bandwidth savings ... ] > I'm really just suggesting a couple of set operators for arrays. > > Perhaps there is already a better way of doing this? I'm open to suggestions. IMHO, your current siolution is the best, because it doesn't need any guesstimate, and is in the correct 5th normal form. I agree that it might eat a bit of disk, but given current prices of hardware, this should not be a consideration. By golly, a 40 GB disk is now about $100-200 ... Another nice way to speed up a database system is to throw RAM at it. Again, RAM is cheap, nowadays ... Wher I live, 128 MB can be got for about $50-100, depending of your source. Furthermore, your current solution is also the best in the sense that it will scale well. Not the case for array-based solutions. IMHO, arrays have a very restricted usefulness : to describe correctly objects uniquely identified by a fixed number of identical items. E. g. points, vectors and other *simple* geometrical entities. However, event that could fail : the descriptioon of a polygon needs an arbitrary number of points, and thus cannot be easily represented by an array, whatever its dimensions ... Your wishlist solution would require implementation of sets (or lists, which are not the same but sometimes a nice representation of sets) in Postgres. There is an awful lot of reasons for which this would be difficult and probably inefficient. Your current implementation is probably the best possible in a RDBMS. Hope this helps, E. Charpentier