Thread: Help normalizing table(s)
Hello everyone,

First let me start off by saying that the situation I'm dealing with is very complex. Here's the scoop: I receive a stream of data from one of our internal servers, which I put into a text file. The fields are separated by pipes and records by newline characters. As of now, I'm loading the data via Perl DBI into a table. The catch is: each record has 502 fields and I receive about 100,000 records per hour. I need to make these records searchable via a web interface for the user.

With that said, I have almost everything working, but have hit a brick wall with the searching. I am currently using MySQL to store the data (not by choice...the users left to maintain this only know how to use MySQL).

Because of the type of data this is (cellular calls) it has become very difficult to keep the records unique. Most of the 502 fields cannot be "unique" fields because of the replicated data that must exist in each record. There are 3 fields that, combined, can make a unique key - something like the following: 2002101411-54738-107 - the first 10 digits are the date, the second field is a call sequence number assigned to the call, and the third is the tower it took place on. There's a catch to just using the call sequence number, as it starts over again after it hits 1 million.

My solution to this is to store the data in tables by hour. This keeps the tables relatively small (50-100K records) and helps with searching.

Now on to the real questions:

I need to be able to search more than one table at a time, but just "union"ing the queries isn't a very good option, as some users want to search 48 hours worth of data.....that would result in 48 unioned queries!!!!

My solution to this so far is to take the fields that the users are searching on (only about 7) and stuff them into a "master" table along with a unique identifier (bigint). I attach the same unique identifier to the table with the remaining data so I can join them and retrieve the rest later. This allows the actual searching to be done on the smaller table, and then, if results are found, the full data can be fetched from the larger table.

How do I keep this table rather small (1 million records or so) and still be able to search archived data? Are there any other ways that I can do this?

Any help is appreciated!!!

Kevin
kold@carolina.rr.com
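A minimal sketch of the master/detail split Kevin describes, with invented table and column names (the real records have 502 fields; caller/callee stand in for the roughly 7 searchable fields):

    -- "Master" table: only the searchable fields plus a bigint surrogate key.
    CREATE TABLE calls_search (
        call_id    BIGINT      NOT NULL PRIMARY KEY,  -- identifier shared with the detail table
        call_date  CHAR(10)    NOT NULL,              -- e.g. 2002101411
        call_seq   INTEGER     NOT NULL,              -- restarts after 1 million
        tower_id   INTEGER     NOT NULL,
        caller     VARCHAR(20),
        callee     VARCHAR(20),
        UNIQUE (call_date, call_seq, tower_id)        -- the 3-field natural key
    );

    -- Detail table: the remaining ~495 fields, keyed by the same identifier.
    CREATE TABLE calls_detail (
        call_id  BIGINT NOT NULL PRIMARY KEY,
        field_8  VARCHAR(40),
        field_9  VARCHAR(40)
        -- ...and so on
    );

    -- Search the small table, then join to pull the full record.
    SELECT d.*
      FROM calls_search s
      JOIN calls_detail d ON d.call_id = s.call_id
     WHERE s.caller = '5551234567'
       AND s.call_date BETWEEN '2002101400' AND '2002101423';

The bigint surrogate key sidesteps the sequence-number rollover, while the UNIQUE constraint on the three natural-key fields still catches duplicate loads.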
On 17 Oct 2002 at 11:12, Kevin Old wrote:

> My solution to this is to store the data in tables by hour. This keeps the tables relatively small (50-100K records) and helps with searching.
>
> Now on to the real questions:
>
> I need to be able to search more than one table at a time, but just "union"ing the queries isn't a very good option, as some users want to search 48 hours worth of data.....that would result in 48 unioned queries!!!!
>
> How do I keep this table rather small (1 million records or so) and still be able to search archived data? Are there any other ways that I can do this?

Why do I feel that you are using my design? Probably because you are solving the same problem as I am..:-)

In postgresql, you can inherit a table from another table. You can query on the base table and gather rows from the child tables. So make an empty base table, make each hourly table a child of the base table, and query on the base table. It will take care of everything. Remove the hourly tables after 48 hours..

Bye
Shridhar

--
Killing is wrong. -- Losira, "That Which Survives", stardate unknown
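A rough sketch of the inheritance scheme Shridhar describes, using the same placeholder names as the earlier sketch (PostgreSQL syntax):

    -- Empty base table holding the common column definitions.
    CREATE TABLE calls_base (
        call_time  TIMESTAMP   NOT NULL,
        call_seq   INTEGER     NOT NULL,
        tower_id   INTEGER     NOT NULL,
        caller     VARCHAR(20),
        callee     VARCHAR(20)
        -- ...remaining fields
    );

    -- One child table per hour; each inherits every column from the base.
    -- (Each hour's file is loaded into its own child table, e.g. via COPY or Perl DBI.)
    CREATE TABLE calls_2002101411 () INHERITS (calls_base);
    CREATE TABLE calls_2002101412 () INHERITS (calls_base);

    -- Querying the base table automatically gathers rows from every child.
    SELECT * FROM calls_base WHERE caller = '5551234567';

    -- Expire old data by simply dropping the hourly child table.
    DROP TABLE calls_2002101411;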
On Thu, Oct 17, 2002 at 09:24:59PM +0530, Shridhar Daithankar wrote:

> On 17 Oct 2002 at 11:12, Kevin Old wrote:
>
> > My solution to this is to store the data in tables by hour. This keeps the tables relatively small (50-100K records) and helps with searching.
>
> In postgresql, you can inherit a table from another table. You can query on the base table and gather rows from the child tables. So make an empty base table, make each hourly table a child of the base table, and query on the base table. It will take care of everything. Remove the hourly tables after 48 hours..

What will happen when you want to look at the last 24 hours and not 48? And if somebody wants to look at the last 72 hours?

I think this is not a good idea. And I know this because I did the same thing some time ago, and it was such a huge PITA that I eventually ended up merging all the tables into one.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"When the proper man does nothing (wu-wei), his thought is felt ten thousand miles." (Lao Tse)
On 17 Oct 2002 at 14:30, Alvaro Herrera wrote:

> On Thu, Oct 17, 2002 at 09:24:59PM +0530, Shridhar Daithankar wrote:
> > On 17 Oct 2002 at 11:12, Kevin Old wrote:
> >
> > > My solution to this is to store the data in tables by hour. This keeps the tables relatively small (50-100K records) and helps with searching.
> >
> > In postgresql, you can inherit a table from another table. You can query on the base table and gather rows from the child tables. So make an empty base table, make each hourly table a child of the base table, and query on the base table. It will take care of everything. Remove the hourly tables after 48 hours..
>
> What will happen when you want to look at the last 24 hours and not 48? And if somebody wants to look at the last 72 hours?

Hey, timestamps are stored with each record, right? Include a time clause in the WHERE part as well. Some child tables will match and some won't.. Postgresql will take care of that. Why bother?

Bye
Shridhar

--
Knebel's Law: It is now proved beyond doubt that smoking is one of the leading causes of statistics.
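In other words, something like the following hypothetical query against the base/child setup sketched earlier (the call_time column name is an assumption):

    -- Only the last 24 hours: the time filter simply excludes rows that
    -- live in the older hourly child tables.
    SELECT *
      FROM calls_base
     WHERE call_time >= now() - INTERVAL '24 hours'
       AND caller = '5551234567';

Keep 72 hours of child tables around and the same query, with the interval changed, covers the 72-hour case as well.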