Thread: How to deal with almost recurring data?
I am creating a database which is supposed to contain many data entries (events) that differ only in the date they occur. So let's say event 1 occurs every Monday, Tuesday and Sunday between January 1st and May 30th 2005. How do I store and manage such data in a meaningful way? The simple idea would be to store the event itself in one table and have another table containing all the dates (all Mondays, Tuesdays and Sundays between 2005-01-01 and 2005-05-30) plus a foreign key to event_ID => (date, event_id). The problem is that we are dealing with several tenthousand events, resulting in several million single dates if I stored it in the described manner. That is why I would like to know if there is a better way to store and manage such information? Thanks Matt
I would say it all depends on what you want to do with the data. If you want to look up all the possible occurences for an event, it might be useful to have the simple solution you described. If you will only look up the next n occurences starting from a given date, you might be better off storing the rule to derive the dates, and then calculate them in your software, but this will make your software more complicated for sure (dealing with date arithmetics can be incredibly complex, I've been there). I would go with the simple solution as long as there is enough space in the DB to store all the occurences... HTH, Csaba. On Tue, 2004-11-16 at 15:53, matthias@cmklein.de wrote: > I am creating a database which is supposed to contain many data entries > (events) that differ only in the date they occur. > > So let's say event 1 occurs every Monday, Tuesday and Sunday between > January 1st and May 30th 2005. > > How do I store and manage such data in a meaningful way? > > The simple idea would be to store the event itself in one table and have > another table containing all the dates (all Mondays, Tuesdays and Sundays > between 2005-01-01 and 2005-05-30) plus a foreign key to event_ID => > (date, event_id). > > The problem is that we are dealing with several tenthousand events, > resulting in several million single dates if I stored it in the described > manner. > > That is why I would like to know if there is a better way to store and > manage such information? > > Thanks > > Matt > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
matthias@cmklein.de wrote: > I am creating a database which is supposed to contain many data entries > (events) that differ only in the date they occur. > > So let's say event 1 occurs every Monday, Tuesday and Sunday between > January 1st and May 30th 2005. > > How do I store and manage such data in a meaningful way? > > The simple idea would be to store the event itself in one table and have > another table containing all the dates (all Mondays, Tuesdays and Sundays > between 2005-01-01 and 2005-05-30) plus a foreign key to event_ID => > (date, event_id). > > The problem is that we are dealing with several tenthousand events, > resulting in several million single dates if I stored it in the described > manner. This seems like a reasonable approach. Don't forget that PostgreSQL is designed to store large amounts of data. Of course, if there is some pattern to the dates then you might be better off storing the pattern rather than the individual dates. -- Richard Huxton Archonet Ltd
Thanks for the answer. If I go that way, though, I am afraid I will have very long query times. And I have no idea how to handle large databases of that kind without losing performance. If I store the pattern - is there a way to still perform the query in the database? Or do I have to do that in my software? (Matching the resultset to the pattern) Thanks Matt --- Ursprüngliche Nachricht --- Datum: 16.11.2004 16:42 Von: Richard Huxton <dev@archonet.com> An: matthias@cmklein.de Betreff: Re: [GENERAL] How to deal with almost recurring data? > matthias@cmklein.de wrote: > > I am creating a database which is supposed to contain many data entries > > (events) that differ only in the date they occur. > > > > So let's say event 1 occurs every Monday, Tuesday and Sunday between > > January 1st and May 30th 2005. > > > > How do I store and manage such data in a meaningful way? > > > > The simple idea would be to store the event itself in one table and have > > another table containing all the dates (all Mondays, Tuesdays and Sundays > > between 2005-01-01 and 2005-05-30) plus a foreign key to event_ID => > > (date, event_id). > > > > The problem is that we are dealing with several tenthousand events, > > resulting in several million single dates if I stored it in the described > > manner. > > This seems like a reasonable approach. Don't forget that PostgreSQL is > designed to store large amounts of data. > > Of course, if there is some pattern to the dates then you might be > better off storing the pattern rather than the individual dates. > > -- > Richard Huxton > Archonet Ltd > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Thanks The standard case is: I enter a date and 2 parameters of the event and I get the result set. Even though with Java date arithmetics are quite simple and I could therefore extend my software in that fashion, I would much rather perform all queries entirely in the database. So in case I go about it the way I described, I will have 4 million entries in the table that stores the dates and the references to the event. Now - how do I handle that database if I want to maintain very good query times (<1-3 seconds)? I have no idea how to handle a 1GB database in terms of query performance, like what tricks to use, what to do and what to avoid. Thanks Matt --- Ursprüngliche Nachricht --- Datum: 16.11.2004 16:32 Von: Csaba Nagy <nagy@ecircle-ag.com> An: matthias@cmklein.de Betreff: Re: [GENERAL] How to deal with almost recurring data? > I would say it all depends on what you want to do with the data. > If you want to look up all the possible occurences for an event, it > might be useful to have the simple solution you described. If you will > only look up the next n occurences starting from a given date, you might > be better off storing the rule to derive the dates, and then calculate > them in your software, but this will make your software more complicated > for sure (dealing with date arithmetics can be incredibly complex, I've > been there). I would go with the simple solution as long as there is > enough space in the DB to store all the occurences... > > HTH, > Csaba. > > On Tue, 2004-11-16 at 15:53, matthias@cmklein.de wrote: > > I am creating a database which is supposed to contain many data entries > > (events) that differ only in the date they occur. > > > > So let's say event 1 occurs every Monday, Tuesday and Sunday between > > January 1st and May 30th 2005. > > > > How do I store and manage such data in a meaningful way? > > > > The simple idea would be to store the event itself in one table and have > > another table containing all the dates (all Mondays, Tuesdays and Sundays > > between 2005-01-01 and 2005-05-30) plus a foreign key to event_ID => > > (date, event_id). > > > > The problem is that we are dealing with several tenthousand events, > > resulting in several million single dates if I stored it in the described > > manner. > > > > That is why I would like to know if there is a better way to store and > > manage such information? > > > > Thanks > > > > Matt > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 2: you can get off all lists at once with the unregister command > > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
When I've said date arithmetics, I was referring to Java date arithmetics... I've been there, and I can tell you that it's not pleasant to have the application go haywire 2 times a year on sommer/winter time switching, and this is not the only thing which can go wrong - and you'll only see it after half a year when it makes your production system unusable... that said, I managed finally to get it right, but don't expect that Java date arithmetics will just work. If you go for storing rules instead of dates, that will mean that you will have to program around your queries, either in the client or in some procedural language on the server. Java would be easier here I guess, but make sure you test every possible date/hour/second for the next 5 years :-) Regarding speed for the solution storing dates, I guess it could work fine if you have the right indexes... but I don't know too much about this, I'm just pretty sure it will be up to your requirements if you tune it properly. Cheers, Csaba. On Tue, 2004-11-16 at 18:38, matthias@cmklein.de wrote: > Thanks > The standard case is: I enter a date and 2 parameters of the event and I > get the result set. > Even though with Java date arithmetics are quite simple and I could > therefore extend my software in that fashion, I would much rather perform > all queries entirely in the database. > So in case I go about it the way I described, I will have 4 million > entries in the table that stores the dates and the references to the > event. > Now - how do I handle that database if I want to maintain very good query > times (<1-3 seconds)? > I have no idea how to handle a 1GB database in terms of query performance, > like what tricks to use, what to do and what to avoid. > Thanks > > Matt > > > --- Ursprüngliche Nachricht --- > Datum: 16.11.2004 16:32 > Von: Csaba Nagy <nagy@ecircle-ag.com> > An: matthias@cmklein.de > Betreff: Re: [GENERAL] How to deal with almost recurring data? > > > I would say it all depends on what you want to do with the data. > > If you want to look up all the possible occurences for an event, it > > might be useful to have the simple solution you described. If you will > > only look up the next n occurences starting from a given date, you > might > > be better off storing the rule to derive the dates, and then calculate > > them in your software, but this will make your software more > complicated > > for sure (dealing with date arithmetics can be incredibly complex, I've > > been there). I would go with the simple solution as long as there is > > enough space in the DB to store all the occurences... > > > > HTH, > > Csaba. > > > > On Tue, 2004-11-16 at 15:53, matthias@cmklein.de wrote: > > > I am creating a database which is supposed to contain many data > entries > > > (events) that differ only in the date they occur. > > > > > > So let's say event 1 occurs every Monday, Tuesday and Sunday between > > > January 1st and May 30th 2005. > > > > > > How do I store and manage such data in a meaningful way? > > > > > > The simple idea would be to store the event itself in one table and > have > > > another table containing all the dates (all Mondays, Tuesdays and > Sundays > > > between 2005-01-01 and 2005-05-30) plus a foreign key to event_ID => > > > (date, event_id). > > > > > > The problem is that we are dealing with several tenthousand events, > > > resulting in several million single dates if I stored it in the > described > > > manner. > > > > > > That is why I would like to know if there is a better way to store > and > > > manage such information? > > > > > > Thanks > > > > > > Matt > > > > > > ---------------------------(end of > broadcast)--------------------------- > > > TIP 2: you can get off all lists at once with the unregister command > > > (send "unregister YourEmailAddressHere" to > majordomo@postgresql.org) > > > > > > ---------------------------(end of > broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > >