Re: COPY command details - Mailing list pgsql-general

From: Benjamin Arai
Subject: Re: COPY command details
Msg-id: 460C9CE8.7070408@araisoft.com
In response to: Re: COPY command details (Bruce Momjian <bruce@momjian.us>)
List: pgsql-general
I have one system in which I have used partitioning. For this particular case I have tons of data spanning about 50 years. What I did is write a small loader that breaks the data into tables based on date, so I have tables like abc_2000, abc_2001, etc. The loading script is only a couple hundred lines of code. The only part that was a little bit of work was allowing easy access to the data for the devs. I did this by writing a few PL functions to automatically union the tables and produce results. So a function like getData(startDate, endDate) would run a union query over the respective date ranges (a sketch of such a function appears after the quoted message below).

Benjamin

P.S. If I am doing anything that can be improved for the data access portion, please let me know and feel free to voice your opinions. I am always looking for new ways to make this particular database faster.

Gerald Timothy G Quimpo wrote:
> On Thu, 2007-03-29 at 21:30 -0700, Benjamin Arai wrote:
>
>> Rebuilding an index can't be the PostgreSQL solution for all
>> cases. I am dealing with databases in the hundreds of gigs
>> range and I am adding about 10 gigs of data a week. At
>> some point it's going to take longer than a week to rebuild
>> all of the indexes in the database.
>>
>> On the other hand, if I am to partition the data into
>> several tables then it might not be such a big deal, since
>> I am only adding and never deleting... This makes it a
>> little more of a pain in the ass.
>>
>
> I am leaning toward a policy of always partitioning large
> tables. I haven't found the time to do it properly yet;
> I'm thinking about it, hoping that someone who's done it will
> chime in with their rules of thumb.
>
> Like Benjamin, I have a database that is close to 600GB
> for 2.25 years of data, and if I were to add the other
> 4 years of data that we have archived away, it will easily go
> into the terabyte range. There are a few individual tables
> which approach 100GB all by themselves.
>
> As it is, I can't afford to do a reindex or even a backup
> (pg_dump or any other method) or other administrative tasks
> on those tables, since the processes take too long. (There are
> workarounds: I could back up single tables at slack times,
> which would allow me to do a complete backup, though not
> self-consistent as a set, over the course of a week or so.)
>
> So I'm leaning toward partitioning, perhaps selecting
> partition rules so that no table will be larger than
> around 5GB, at which point reindex or admin procedures
> that take exclusive locks become only minor
> inconveniences rather than showstoppers.
>
> How do people take consistent backups of very large
> databases on Linux/FreeBSD? I'm aware of PITR, but
> might not be able to set aside a box with enough
> drives for it. LVM snapshot? Performance issues with
> LVM, etc?
>
> tiger
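[Editor's note] For readers wondering what the union-producing PL function described above might look like, here is a minimal sketch; it is not Benjamin's actual code. It assumes yearly tables abc_2000, abc_2001, ... each with a ts timestamp column and a payload text column; those column names and the function body are illustrative assumptions. It also uses format() and RETURN QUERY EXECUTE, which require a newer PostgreSQL than the servers of this thread's era; the same idea can be expressed on older releases with RETURN NEXT and quote_literal().

-- Hypothetical sketch: build a UNION ALL over the yearly tables that the
-- requested date range touches, then run it as one query.
-- Table/column names (abc_YYYY, ts, payload) are assumptions for illustration.
CREATE OR REPLACE FUNCTION getData(startDate date, endDate date)
RETURNS TABLE (ts timestamp, payload text)
LANGUAGE plpgsql AS $$
DECLARE
    yr int;
    q  text := '';
BEGIN
    -- One SELECT per year in the range, glued together with UNION ALL.
    FOR yr IN EXTRACT(year FROM startDate)::int .. EXTRACT(year FROM endDate)::int LOOP
        IF q <> '' THEN
            q := q || ' UNION ALL ';
        END IF;
        q := q || format(
            'SELECT ts, payload FROM abc_%s WHERE ts >= %L AND ts < %L::date + 1',
            yr, startDate, endDate);
    END LOOP;
    RETURN QUERY EXECUTE q;
END;
$$;

-- Example call:
-- SELECT * FROM getData('2000-06-01', '2001-03-15');

With table inheritance and constraint_exclusion, the yearly tables could instead be attached to a single parent table and queried directly, which avoids building the union by hand; the function approach above simply keeps the partition layout hidden behind one call for the devs.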