Thread: Help normalizing table(s)

Help normalizing table(s)

From: Kevin Old
Date:

Hello everyone,

First let me start off by saying that the situation I'm dealing with is
very complex.

Here's the scoop:
I receive a stream of data from one of our internal servers, which I put
into a text file.  The fields are separated by pipes and records by
newline characters.  As of now, I'm loading the data via Perl DBI into a
table.  The catch is: each record has 502 fields and I receive about
100,000 records per hour.  I need to make these records searchable via a
web interface for the user.
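
Just to make the format concrete, a bulk load of one such hourly file
might look roughly like this in SQL (the table name and file path are
made up, and this assumes a PostgreSQL-style COPY rather than the Perl
DBI loader actually in use):

    -- hypothetical hourly table and file; one pipe-delimited record per line
    COPY calls_2002101411
      FROM '/data/feed/2002101411.txt'
      WITH DELIMITER '|';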

With that said, I have almost everything working, but have hit a brick
wall with the searching.

I am currently using MySQL to store the data (not by choice...the users
left to maintain this only know how to use MySQL).  Because of the type
of data this is (cellular calls), it has become very difficult to keep
the records unique.  Most of the 502 fields cannot be "unique" fields
because of the replicated data that must exist in each record.  There
are 3 fields that, combined, can make a unique key - something like the
following: 2002101411-54738-107 - the first 10 digits are the date and
hour, the second part is a call sequence number assigned to the call,
and the third is the tower it took place on.  There's a catch to just
using the call sequence number: it starts over again after it hits 1
million.
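
For illustration, that combination could be enforced as a composite
unique constraint, something like the following (the column names are
made up; it's the date/hour prefix that keeps the wrapping sequence
number unique):

    CREATE TABLE calls (
        call_date_hour  char(10),   -- e.g. 2002101411 (YYYYMMDDHH)
        call_seq        integer,    -- wraps around after 1 million
        tower_id        integer,
        -- ... the other ~499 fields ...
        UNIQUE (call_date_hour, call_seq, tower_id)
    );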

My solution to this is to store the data in tables by hour.  This keeps
the tables relatively small (50-100K records) and helps with searching.

Now on to the real questions:

I need to be able to search more than one table at a time, but just
"union"ing the queries isn't a very good option, as some users want to
search 48 hours' worth of data...that would result in 48 unioned
queries!!!!
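
Spelled out, such a search would look something like this (table and
column names are placeholders), one SELECT per hourly table:

    SELECT call_date_hour, call_seq, tower_id
      FROM calls_2002101400 WHERE caller_number = '9195551234'
    UNION ALL
    SELECT call_date_hour, call_seq, tower_id
      FROM calls_2002101401 WHERE caller_number = '9195551234'
    UNION ALL
    -- ... and so on, one branch per hourly table, 48 in total ...
    SELECT call_date_hour, call_seq, tower_id
      FROM calls_2002101523 WHERE caller_number = '9195551234';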

My solution to this so far is to take the fields that the users are
searching on (only about 7) and stuff them into a "master" table
along with a unique identifier (bigint).  I attach the same unique key
to the table with the remaining data so I can join them and retrieve
the rest of the data later.  This allows the actual searching to be
done on the smaller table, and then, if results are found, the full
data can be fetched from the larger table.
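
A sketch of that layout (all names here are placeholders) - the
searchable fields plus a shared bigint key in one small table, and the
full record keyed by the same value in a second table:

    CREATE TABLE call_search (
        call_id         bigint PRIMARY KEY,  -- shared unique identifier
        call_date_hour  char(10),
        call_seq        integer,
        tower_id        integer,
        caller_number   varchar(16)
        -- ... the rest of the ~7 searchable fields ...
    );

    CREATE TABLE call_detail (
        call_id  bigint PRIMARY KEY          -- same identifier as above
        -- ... the remaining ~495 fields ...
    );

    -- search the small table, then join to pull the full record
    SELECT d.*
      FROM call_search s
      JOIN call_detail d ON d.call_id = s.call_id
     WHERE s.caller_number = '9195551234'
       AND s.call_date_hour BETWEEN '2002101400' AND '2002101523';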

How do I keep this table rather small (1 million records or so) and
still be able to search archived data?  Are there any other ways that I
can do this?

Any help is appreciated!!!

Kevin
kold@carolina.rr.com



Re: Help normalizing table(s)

From: "Shridhar Daithankar"
Date:

On 17 Oct 2002 at 11:12, Kevin Old wrote:

> My solution to this is to store the data in tables by hour.  This keeps
> the tables relatively small (50-100K records) and helps with searching.
>
> Now on to the real questions:
>
> I need to be able to search more than one table at a time, but just
> "union"ing the queries isn't a very good option as some users want to
> search 48 hours' worth of data...that would result in 48 unioned
> queries!!!!
> How do I keep this table rather small (1 million records or so) and
> still be able to search archived data?  Are there any other ways that I
> can do this?

Why do I feel that you are using my design? Probably because you are solving
the same problem as I am solving..:-)

In PostgreSQL, you can inherit a table from another table. You can query on
the base table and gather rows from the child tables. So make an empty base
table, make each hourly table a child of the base table, and query on the base
table. It will take care of everything. Remove the hourly tables after 48
hours..
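
Roughly like this (table and column names are only placeholders; the
parent table itself holds no rows):

    -- empty parent table with the common column layout
    CREATE TABLE calls_base (
        call_time  timestamp,
        call_seq   integer,
        tower_id   integer
        -- ... the remaining fields ...
    );

    -- one child table per hour; each inherits all of the parent's columns
    CREATE TABLE calls_2002101711 () INHERITS (calls_base);
    CREATE TABLE calls_2002101712 () INHERITS (calls_base);

    -- a query on the parent automatically scans every child table
    SELECT * FROM calls_base WHERE tower_id = 107;

    -- when an hour falls outside the 48-hour window, drop its child
    DROP TABLE calls_2002101711;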

Bye
 Shridhar

--
Killing is wrong.        -- Losira, "That Which Survives", stardate unknown


Re: Help normalizing table(s)

From: Alvaro Herrera
Date:

On Thu, Oct 17, 2002 at 09:24:59PM +0530, Shridhar Daithankar wrote:
> On 17 Oct 2002 at 11:12, Kevin Old wrote:
>
> > My solution to this is to store the data in tables by hour.  This keeps
> > the tables relatively small (50-100K records) and helps with searching.

> In PostgreSQL, you can inherit a table from another table. You can query on
> the base table and gather rows from the child tables. So make an empty base
> table, make each hourly table a child of the base table, and query on the
> base table. It will take care of everything. Remove the hourly tables after
> 48 hours..

What will happen when you want to look at the last 24 hours and not 48?
And what if somebody wants to look at the last 72 hours?

I think this is not a good idea.  And I know this because I did the same
thing some time ago, and it was such a huge PITA that I eventually ended
up merging all the tables into one.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles." (Lao Tse)

Re: Help normalizing table(s)

From: "Shridhar Daithankar"
Date:

On 17 Oct 2002 at 14:30, Alvaro Herrera wrote:

> On Thu, Oct 17, 2002 at 09:24:59PM +0530, Shridhar Daithankar wrote:
> > On 17 Oct 2002 at 11:12, Kevin Old wrote:
> >
> > > My solution to this is to store the data in tables by hour.  This keeps
> > > the tables relatively small (50-100K records) and helps with searching.
>
> > In PostgreSQL, you can inherit a table from another table. You can query on
> > the base table and gather rows from the child tables. So make an empty base
> > table, make each hourly table a child of the base table, and query on the
> > base table. It will take care of everything. Remove the hourly tables after
> > 48 hours..
>
> What will happen when you want to look at the last 24 hours and not 48?
> And if somebody want to look at the last 72 hours?

Hey, timestamps are stored with each record, right? Include a time clause in
the WHERE part as well. Some child tables will match, some won't... PostgreSQL
will take care of that. Why bother?
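
I.e. something along these lines (assuming a timestamp column; call_time
is a made-up name):

    -- rows from hours outside the window simply don't match, so the
    -- children holding them contribute nothing to the result
    SELECT *
      FROM calls_base
     WHERE call_time >= now() - interval '24 hours';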

Bye
 Shridhar

--
Knebel's Law:    It is now proved beyond doubt that smoking is one of the leading
causes of statistics.