Re: Populating huge tables each day - Mailing list pgsql-general

From:           Ben-Nes Yonatan
Subject:        Re: Populating huge tables each day
Date:
Msg-id:         42C7C956.2080301@canaan.co.il
In response to: Re: Populating huge tables each day ("Jim C. Nasby" <decibel@decibel.org>)
Responses:      Trigger help
List:           pgsql-general
> On Tue, Jun 28, 2005 at 10:36:58AM -0700, Dann Corbit wrote:
>
>>> Nope, truncate is undoubtedly faster. But it also means you would have
>>> downtime as you mentioned. If it were me, I'd probably make the
>>> trade-off of using a delete inside a transaction.
>>
>> For every record in a bulk loaded table?
>
> Sure. If the data's only being loaded once a day, it probably doesn't
> matter if that delete takes 10 minutes.
>
>> If it were that important that both servers be available all the time, I
>> would bulk load into a second table with the same shape and then rename
>> when completed.
>
> Interesting idea, though the problem is that AFAIK everything will block
> on the rename. If everything didn't block though, this might be a better
> way to do it, although it potentially complicates the code greatly
> (think about needing to add indexes, rebuild RI, etc.)

Sorry for the lack of answers; I was away and unable to reply.

I thought about the idea of loading everything into a temporary table and then renaming it, but as Jim mentioned it would also complicate everything a lot more. I'm not sure, though, that everything will really be blocked during that time if it is all done inside a transaction, or am I wrong here?

Here are some answers to Dann's questions from an earlier mail:

How many tables are to be replicated?
* 2 tables, plus another one which will get modified a little.

What is the total number of expected rows for each table?
* One table is about 3.5 million rows; the second is quite hard to determine, but it is about 40,000. The third will probably get about 10 new rows a day at most, so it is really not a problem (it will require one query over all of the data at the end, though...).

How fast are the tables expected to grow?
* I'm not sure I understand the question, but if you mean the time it takes the tables to reach their final size, that should take minutes, I guess, since it will probably load everything at once, and I want to cut that time as much as I can (I'll run it under "nice").

When must the new data become available online?
* Right after the changes complete (there is no specific hour or deadline in seconds).

Are all of the tables in the database populated from a foreign source, or just some of them?
* Just those 3.

Do you also have access to the source data in its database format, or only as text dumps?
* Only as text or XML. I prefer text because I want to use COPY, though any different thoughts will be gladly accepted.

** The goal of the site is to create a portal of products for end users.

I'm starting to wonder whether, instead of uploading the new data to a "temp table", I should upload it directly to the "main table" and add a status field marking it "in progress": when deleting, I would delete all rows whose status is "active", and afterwards update all rows with status "in progress" to "active". With this solution I could also protect any rows I want to keep from deletion just by changing their status to something other than "active" or "in progress". It would also save me from having to transfer all of the data from the "temp table" to the "main table".

Two points bother me here:
1. By uploading the data to the "main table" before deleting its old content, I create a table with 7 million rows, which will put more stress on the system for every query I run against it (like the one I need for deleting the old rows).
2. I'm not sure whether I'll be able to restart the counting of the indexing while there is still data in the table (with this method there will never be a period of time without data in the "main table").
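A minimal SQL sketch of that status-field flow (the table name "products", its columns, and the dump path are all hypothetical; end-user queries would have to filter on status = 'active'):

```sql
-- One-time setup (hypothetical names): new rows default to 'in progress'.
ALTER TABLE products ADD COLUMN status text NOT NULL DEFAULT 'in progress';
CREATE INDEX products_status_idx ON products (status);

-- Daily load: status is omitted from the column list, so COPY fills it
-- with the 'in progress' default.
COPY products (sku, name, price) FROM '/path/to/products.txt';

BEGIN;
-- Drop yesterday's rows; rows given any other status survive the purge.
DELETE FROM products WHERE status = 'active';
-- Promote the freshly loaded rows.
UPDATE products SET status = 'active' WHERE status = 'in progress';
COMMIT;
```

Because the DELETE and UPDATE run in one transaction, concurrent readers see either the old "active" rows or the new ones, never an empty table. And if "restarting the counting" refers to a serial column's sequence, it can be reset at any time with setval() regardless of how many rows the table holds, though reusing low values risks duplicate-key errors against rows that survive the purge.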
I guess what I really want to know is how much this whole process will stress the server, and what I can do to let the server work on it in a way that won't disturb the rest of the processes.

Thanks a lot again,

Ben-Nes Yonatan
Canaan Surfing Ltd.
http://www.canaan.net.il
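For comparison, the "bulk load into a second table, then rename" approach from the quoted discussion might be sketched like this (table names and the dump path are hypothetical; note Jim's caveat that indexes and any foreign keys have to be recreated on the new table by hand):

```sql
-- Build and populate the replacement table away from readers.
CREATE TABLE products_new (LIKE products);       -- copies column definitions
COPY products_new FROM '/path/to/products.txt';  -- bulk load with no index overhead
CREATE INDEX products_new_sku_idx ON products_new (sku);  -- rebuild indexes last

-- DDL is transactional in PostgreSQL, so the two renames commit atomically;
-- readers block only while the renames hold their locks, not during the load.
BEGIN;
ALTER TABLE products RENAME TO products_old;
ALTER TABLE products_new RENAME TO products;
COMMIT;

DROP TABLE products_old;
```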