Thread: Set Operators and Arrays Question

Set Operators and Arrays Question

From
Geoff Russell
Date:
Hello readers and writers,

I have a concrete problem which I think is similar to a large class of
problems. I want to propose a better solution for my concrete problem,
which, if implemented, would be generally useful. Then again someone
may come up with a better method again...

My concrete problem is with keeping track of fundraising letters at
the RSPCA where I do volunteer programing. There are 100,000 people on
the mailing list, and each Fundraising letter or reminder has an id and
I need to keep track of who has got what so that people don't get too many
letters, don't get reminders for fundraising appeals they weren't involved in,
etc.

For example there are (over say 5 years) perhaps 200 letters and each person
receives a very small subset of these.

       Person| Letters received
        John : 3,35,45,27
        Mary : 1,35,4,
        Ian  : 1,35,4,12,45,63
        ...

Current Solution:

My current tables look (in simplified form) like:

        create table member       (  member_id serial, ...);
        create table mailing      (  mailing_id serial, ... );
        create table memb_mailing (  member_id int,mailing_id int);

This all works fine and I can define indexes to speed up queries like
"tell me all the people who got mailing 23?", "did person X get mailing
Y?". But I'm not quite happy. Its a bit slow and takes up more disk than
I think it should.


2nd attempt:

        create table member       (  member_id serial, ...);
        create table mailing      (  mailing_id serial, ...);
        create table memb_mailing (  member_id int,mailings bool[]);

Here each of the members of the mailings boolean array represents one letter.
You initialise it with '{f,f,f,f,f.....}'. A disadvantage is that you need
to guestimate the largest letter number to build the initial string.

On the other hand you can do great things like:

        update memb_mailing set mailings[15]='t'
             where member_id=templist.member_id;

where you have previously built a temporary table templist of members.

This is little better. It uses a bit less disk, and answers queries a little
faster (even without indexes).

Wish List Solution:

Heres what I think would be really the best way to solve the problem.

        create table member       (  member_id serial, data text);
        create table mailing      (  mailing_id serial, name text);
        create table memb_mailing (  member_id int,mailings int[]);

Here the initial mailings array is empty and I say things like:

        update memb_mailing add 27 into mailings
             where member_id=templist.member_id;

This just inserts the letter number 27 into the mailings array. And we could
also say things like:

        select M.member_id from member M,memb_mailing MM
             where M.member_id=MM.member_id and
             MM.mailings contains 27;

Summary:

I'm really just suggesting a couple of set operators for arrays.

Perhaps there is already a better way of doing this? I'm open to suggestions.

Cheers
Geoff

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6 Fifth Ave           +618-8332-5069 (Home)          |
St Morris SA 5068     +618-8207-2029 (Work)          | geoff@austrics.com.au
Adelaide, AUSTRALIA   +618-8364-1543 (Fax-Home)      |


Re: Set Operators and Arrays Question

From
Peter Eisentraut
Date:
Geoff Russell writes:

> I'm really just suggesting a couple of set operators for arrays.

Well, arrays are arrays, not sets.  If you want sets, you create a table.
If you want set operators, you use selects and joins.  That is the design.

However, if you want to outsmart the system you can use the operators in
contrib/array, but don't say I didn't warn you.

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/


Re: Set Operators and Arrays Question

From
Stephan Szabo
Date:
> Wish List Solution:
>
> Heres what I think would be really the best way to solve the problem.
>
>         create table member       (  member_id serial, data text);
>         create table mailing      (  mailing_id serial, name text);
>         create table memb_mailing (  member_id int,mailings int[]);
>
> Here the initial mailings array is empty and I say things like:
>
>         update memb_mailing add 27 into mailings
>              where member_id=templist.member_id;
>
> This just inserts the letter number 27 into the mailings array. And we could
> also say things like:
>
>         select M.member_id from member M,memb_mailing MM
>              where M.member_id=MM.member_id and
>              MM.mailings contains 27;

You may want to look at the contrib/array directory for additional
functions and operators for arrays (which includes is this an element
of this array function/operator)


Re: Set Operators and Arrays Question

From
Emmanuel Charpentier
Date:
Geoff Russell wrote:
>
> Hello readers and writers,
>
> I have a concrete problem which I think is similar to a large class of
> problems. I want to propose a better solution for my concrete problem,
> which, if implemented, would be generally useful. Then again someone
> may come up with a better method again...

[ Bandwidth savings ... ]

> I'm really just suggesting a couple of set operators for arrays.
>
> Perhaps there is already a better way of doing this? I'm open to suggestions.

IMHO, your current siolution is the best, because it doesn't need any
guesstimate, and is in the correct 5th normal form.

I agree that it might eat a bit of disk, but given current prices of
hardware, this should not be a consideration. By golly, a 40 GB disk  is
now about $100-200 ... Another nice way to speed up a database system is
to throw RAM at it. Again, RAM is cheap, nowadays ... Wher I live, 128
MB can be got for about $50-100, depending of your source.

Furthermore, your current solution is also the best in the sense that it
will scale well. Not the case for array-based solutions.

IMHO, arrays have a very restricted usefulness : to describe correctly
objects uniquely identified by a fixed number of identical items. E. g.
points, vectors and other *simple* geometrical entities. However, event
that could fail : the descriptioon of a polygon needs an arbitrary
number of points, and thus cannot be easily represented by an array,
whatever its dimensions ...

Your wishlist solution would require implementation of sets (or lists,
which are not the same but sometimes a nice representation of sets) in
Postgres.

There is an awful lot of reasons for which this would be difficult and
probably inefficient. Your current implementation is probably the best
possible in a RDBMS.

Hope this helps,

                    E. Charpentier