Thread: Couple of design questions
Hello everyone,

I'm just beginning a PHP / PostgreSQL project for someone and there are a couple of things where I'm not sure of the best design for the database.

The first is that we want to store a history of values for roughly the past 3 days, so that we can analyze how the values have changed over the last few days. The solution I thought of was having 4 columns in the table for each value, like this:

    somedata int,
    somedata_24 int,
    somedata_48 int,
    somedata_72 int

There are 3 different variables that we want to keep a history for in each row. So what I thought we could do is, at some arbitrary time each day, copy the values from somedata into somedata_24, from somedata_24 into somedata_48, from somedata_48 into somedata_72, and just forget whatever was in somedata_72. My question is, how long would something like this take (relatively speaking; I don't know the exact hardware specs of the server, it will be professionally hosted I believe) if there were around 20,000 rows? If it would take too long or be too taxing on resources, do you have any other ideas on how to handle something like this?

The second thing is that we are going to have our users update the data in the database quite frequently. When User A wants to search the database for something, the system will first check what the oldest set of data it has is, and then ask User A to fetch new data to replace the old stuff. Since we expect very frequent use of this, we want a way to mark which data was requested from User A so that when User B comes along, the system doesn't request the same data from him. But we don't want to change the timestamp for when the data was updated until we actually receive the new information, in case the user decides he doesn't want to do a search and never sends the new data.
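The daily shift I have in mind would be a single UPDATE, something like this (the table name is just my sketch):

```sql
-- Shift each value's history down one day; the old somedata_72 is dropped.
-- In PostgreSQL all the right-hand sides are evaluated against the row's
-- old values, so the assignments don't clobber each other mid-statement.
UPDATE history_table
   SET somedata_72 = somedata_48,
       somedata_48 = somedata_24,
       somedata_24 = somedata;
```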
One way I thought I could do this is by having a table something like this:

    dataset_id int,
    last_update timestamp,  (or some other date/time field...I can never keep them straight in my head)
    locked bool

Then, when I request a dataset from User A, I set the bool field to true, and my SQL for finding the oldest dataset already selects only from datasets where locked is false. But I thought that if PostgreSQL's transaction functions already did something like this, it would be easier and faster to use them. Will SELECT ... FOR UPDATE help me out here? Is there a way to check whether a row is locked and ignore locked rows in another SELECT ... FOR UPDATE? Any input on a better way to handle this would also be appreciated.

Thanks for the time. :)

-Jesse
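P.S. In case it helps, here is roughly how I pictured the locked-flag approach (table and column names, and the id 42, are just my sketch):

```sql
-- Hypothetical table tracking which dataset each user was asked to refresh.
CREATE TABLE dataset_status (
    dataset_id  int PRIMARY KEY,
    last_update timestamp,
    locked      boolean DEFAULT false
);

-- Find the stalest dataset that nobody has been asked about yet...
SELECT dataset_id
  FROM dataset_status
 WHERE locked = false
 ORDER BY last_update
 LIMIT 1;

-- ...mark it as requested from User A...
UPDATE dataset_status SET locked = true WHERE dataset_id = 42;

-- ...and only when the new data actually arrives, bump the timestamp:
UPDATE dataset_status
   SET last_update = now(), locked = false
 WHERE dataset_id = 42;
```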
Hi,

For the first question, about 20,000 rows should work fine. (I have a table with about 100,000 rows working fine.)

For the second, PG certainly has enough locking features; have a look at the MVCC chapter (chapter 10) in the PG docs.

>Will SELECT ... FOR UPDATE help me out here?

From the docs: "However, take into account that SELECT FOR UPDATE will modify selected rows to mark them and so will result in disk writes. Row-level locks don't affect data querying. They are used to block writers to the same row only."

>Is there a way to check and see if a row is locked and ignore locked rows
>in another SELECT ... FOR UPDATE?

I don't think this would be a safe way. I.e., if user A did a SELECT ... FOR UPDATE and then forgot to submit (commit), what would happen? The selected rows would stay locked forever!

So IMO it would be safer if user A made a simple SELECT (showing it in an HTML form) and then a separate UPDATE (using the HTML input). In that case you still have the risk of an update conflict (if user B had made a simultaneous update), but it still seems safer.

Regards,
Omid Omoomi
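P.S. One simple way to guard that separate UPDATE against a simultaneous change by user B is to re-check the timestamp you originally read, i.e. an optimistic check (names and values here are just an illustration):

```sql
-- Read the row; no lock is held while the user thinks it over.
SELECT dataset_id, last_update
  FROM dataset_status
 WHERE dataset_id = 42;

-- Later, apply the change only if nobody touched the row meanwhile.
-- If last_update has changed, this matches zero rows, and the
-- application knows there was a conflict.
UPDATE dataset_status
   SET last_update = now()
 WHERE dataset_id = 42
   AND last_update = '2000-05-01 12:00:00';  -- the value read earlier
```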
Jesse Scott <jscott@bmi.net> writes:
> Hello everyone,
>
> I'm just beginning a PHP / PostgreSQL project for someone and there are a
> couple of things where I am not sure the best design for the database.
>
> The first is that we want to store kind of a history of values for the past
> 3 days or so.  We want to use this so that we can analyze how the values
> have changed over the last few days.  The solution I thought of was having
> 4 columns in the table for each value, like this:
>
> somedata int,
> somedata_24 int,
> somedata_48 int,
> somedata_72 int,
>
> There are 3 different variables that we want to keep a history for in each
> row.  So what I thought we could do is at some arbitrary time each day,
> copy the values from somedata into somedata_24, from somedata_24 into
> somedata_48, from somedata_48 into somedata_72, and just forget whatever
> was in somedata_72.  My question is, how long would something like this
> take (relatively speaking, I don't know the hardware specs of the server
> exactly, it will be professionally hosted I believe) if there were around
> 20,000 rows?  If it would take too long or be too taxing on resources, do
> you have any other ideas on how to handle something like this?

<snip>

Would it not be better to have something like:

    uid int,       -- identifies the piece of data
    log_date date, -- day the data was logged
    data int       -- data to be stored

The primary key is composed of uid and log_date together. Now there is no need to age the data: if you want the data from 24 hours ago, you take the current date, subtract one day, and select based on that. Same with 48 hours, etc. All you have to do is periodically delete data older than a few days, and this method lets you choose arbitrarily how many days to keep. If you also index the table by log_date, deleting the old data will be very fast.

<snip>

-- 
Prasanth Kumar
kumar1@home.com
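P.S. Concretely, the design above might look like this (table and index names are just an illustration):

```sql
-- One row per variable per day, instead of shifting columns around.
CREATE TABLE value_history (
    uid      int  NOT NULL,  -- identifies the piece of data
    log_date date NOT NULL,  -- day the value was logged
    data     int,            -- the stored value
    PRIMARY KEY (uid, log_date)
);

-- Index so the periodic purge is fast.
CREATE INDEX value_history_log_date_idx ON value_history (log_date);

-- Yesterday's value for one item (date minus integer gives a date):
SELECT data
  FROM value_history
 WHERE uid = 42
   AND log_date = current_date - 1;

-- Drop anything older than three days:
DELETE FROM value_history
 WHERE log_date < current_date - 3;
```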