Help normalizing table(s) - Mailing list pgsql-general

From: Kevin Old
Subject: Help normalizing table(s)
Date:
Msg-id: 1034867525.3249.31.camel@oc
Responses: Re: Help normalizing table(s)  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
List: pgsql-general
Hello everyone,

First let me start off by saying that the situation I'm dealing with is
very complex.

Here's the scoop:
I receive a stream of data from one of our internal servers, which I put
into a text file.  The fields are separated by pipes and records by
newline characters.  As of now, I'm loading the data via Perl DBI into a
table.  The catch is: each record has 502 fields and I receive about
100,000 records per hour.  I need to make these records searchable via a
web interface for the user.
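
Since this is pgsql-general, it may be worth noting what the load step
amounts to: in PostgreSQL a pipe-delimited text file like this can be
pulled in directly with COPY (table name and file path below are made
up):

    -- Hypothetical table and path; COPY reads the pipe-delimited feed
    -- directly, one record per newline.
    COPY calls_raw FROM '/data/feed/2002101411.txt' WITH DELIMITER '|';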

With that said, I have almost everything working, but have hit a brick
wall with the searching.

I am currently using MySQL to store the data (not by choice...the users
left to maintain this only know how to use MySQL).  Because of the type
of data this is (cellular calls), it has become very difficult to keep
the records unique.  Most of the 502 fields cannot be "unique" fields
because of the replicated data that must exist in each record.  There
are 3 fields that, combined, can make a unique key - something like the
following: 2002101411-54738-107 - the first 10 digits are the date, the
second is a call sequence number assigned to the call, and the third is
the tower it took place on.  There's a catch to just using the call
sequence number, as it starts over again after it hits 1 million.
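
For illustration, the combination could be declared as a composite
primary key (column names below are invented):

    -- Hypothetical column names for the three-part key 2002101411-54738-107.
    CREATE TABLE calls (
        call_hour char(10) NOT NULL,  -- first part: date and hour
        call_seq  integer  NOT NULL,  -- second part: wraps at 1 million
        tower_id  integer  NOT NULL,  -- third part: tower
        -- ...the remaining ~499 fields go here...
        PRIMARY KEY (call_hour, call_seq, tower_id)
    );

The wraparound on the sequence number should be harmless in this
combination, since the date portion distinguishes the repeats.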

My solution to this is to store the data in tables by hour.  This keeps
the tables relatively small (50-100K records) and helps with searching.
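
One PostgreSQL-specific aside: table inheritance maps well onto a
per-hour layout, because a child table created with INHERITS picks up
the parent's columns, and a query against the parent scans all children
(table names below are made up):

    -- One child table per hour, inheriting the 502 columns of "calls".
    CREATE TABLE calls_2002101411 () INHERITS (calls);
    CREATE TABLE calls_2002101412 () INHERITS (calls);

    -- Selecting from the parent reads every hourly child as well.
    SELECT count(*) FROM calls WHERE tower_id = 107;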

Now on to the real questions:

I need to be able to search more than one table at a time, but just
"union"ing the queries isn't a very good option as some users want to
search 48 hours' worth of data... that would result in 48 unioned
queries!
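
If the hourly tables stay, one way to avoid writing the union by hand
for every search is a view over the window in question, using UNION ALL
(which, unlike plain UNION, skips the expensive duplicate-elimination
step); a sketch with made-up names:

    -- Hypothetical view covering a two-hour search window.
    CREATE VIEW calls_window AS
        SELECT * FROM calls_2002101411
        UNION ALL
        SELECT * FROM calls_2002101412;

    SELECT * FROM calls_window WHERE tower_id = 107;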

My solution to this so far is to take the fields that the users are
searching on (only about 7) and stuffing them into a "master" table
along with a unique identifier (bigint).  I attach the same unique key
to the table with the remaining data so I can join them and retrieve
stuff later.  This will allow the actual searching to be done on the
smaller table and then, if results are found, fetch the full record from
the larger table.
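
Roughly, the layout I have in mind looks like this (field names invented
for illustration):

    -- Narrow "master" table: only the ~7 searched fields plus the key.
    CREATE TABLE calls_search (
        call_id   bigint   PRIMARY KEY,  -- shared unique identifier
        call_hour char(10) NOT NULL,
        call_seq  integer  NOT NULL,
        tower_id  integer  NOT NULL
    );

    -- Wide table holding the rest of the 502 fields under the same key.
    CREATE TABLE calls_detail (
        call_id bigint PRIMARY KEY REFERENCES calls_search (call_id)
        -- ...the remaining ~495 fields go here...
    );

    -- Search the narrow table first, then join out for the full record.
    SELECT d.*
      FROM calls_search s
      JOIN calls_detail d USING (call_id)
     WHERE s.tower_id = 107
       AND s.call_hour BETWEEN '2002101400' AND '2002101423';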

How do I keep this table rather small (1 million records or so) and
still be able to search archived data?  Are there any other ways that I
can do this?
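
The only mechanical approach I've thought of so far is a periodic pass
that moves rows older than the search window into an identically shaped
archive table, something like this (cutoff value made up):

    -- Create the archive once, with the same shape as the search table.
    CREATE TABLE calls_search_archive AS
        SELECT * FROM calls_search WHERE false;

    -- Periodically move everything older than the active window.
    BEGIN;
    INSERT INTO calls_search_archive
        SELECT * FROM calls_search WHERE call_hour < '2002101200';
    DELETE FROM calls_search WHERE call_hour < '2002101200';
    COMMIT;

...but I don't know whether that scales, hence the question.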

Any help is appreciated!!!

Kevin
kold@carolina.rr.com


