Re: COPY command details - Mailing list pgsql-general

From: Benjamin Arai
Subject: Re: COPY command details
Date:
Msg-id: 460C9CE8.7070408@araisoft.com
In response to: Re: COPY command details  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: COPY command details
List: pgsql-general
I have one system in which I have used partitioning.  For this
particular case I have tons of data spanning about 50 years.  What I
did was write a small loader that breaks the data into tables based
on date, so I have tables like abc_2000, abc_2001, etc.  The loading
script is only a couple hundred lines of code.  The only part that
took a little bit of work was allowing easy access to the data for
the devs.  I did this by writing a few PL functions that
automatically union the tables and produce results.  So a function
like getData(startDate, endDate) would run a union query over the
respective date range.
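
Roughly, the idea looks like the sketch below; this is just an
illustration, not my actual code (the column names, table layout, and
exact getData() signature here are made up):

-- Yearly tables abc_2000, abc_2001, ... are all created with the
-- same layout by the loader (columns here are hypothetical).
CREATE TABLE abc_2000 (
    ts      timestamp NOT NULL,
    payload text
);

-- Set-returning function that unions only the yearly tables the
-- requested range touches (plain 8.x-era PL/pgSQL, RETURN NEXT).
CREATE OR REPLACE FUNCTION getData(startdate date, enddate date)
RETURNS SETOF abc_2000 AS $$
DECLARE
    yr  integer;
    sql text := '';
    rec abc_2000%ROWTYPE;
BEGIN
    FOR yr IN EXTRACT(year FROM startdate)::int ..
              EXTRACT(year FROM enddate)::int LOOP
        IF sql <> '' THEN
            sql := sql || ' UNION ALL ';
        END IF;
        -- Half-open range so the end date is included in full.
        sql := sql || 'SELECT * FROM abc_' || yr::text
                   || ' WHERE ts >= ' || quote_literal(startdate::text)
                   || ' AND ts < ' || quote_literal((enddate + 1)::text);
    END LOOP;

    FOR rec IN EXECUTE sql LOOP
        RETURN NEXT rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Example call:
-- SELECT * FROM getData('2000-06-01', '2001-03-31');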

Benjamin

P.S. If there is anything about the data access portion that can be
improved, please let me know and feel free to voice your opinions.  I
am always looking for new ways to make this particular database faster.

Gerald Timothy G Quimpo wrote:
> On Thu, 2007-03-29 at 21:30 -0700, Benjamin Arai wrote:
>
>> Rebuilding an index can't be the PostgreSQL solution for all
>> cases. I am dealing with databases in the hundreds of gigs
>> range and I am adding about 10 gigs of data a week.  At
>> some point it's going to take longer than a week to rebuild
>> all of the indexes in the database.
>>
>> On the other hand, if I am to partition the data into
>> several tables then it might not be such a big deal since
>> I am only adding and never deleting... This makes it a
>> little more of a pain in the ass.
>>
>
> I am leaning toward a policy of always partitioning large
> tables.  I haven't found the time to do it properly yet,
> thinking about it, hoping that someone who's done it will
> chime in with their rules of thumb.
>
> Like Benjamin, I have a database that is close to 600GB
> for 2.25 years of data, and if I were to add the other
> 4 years of data that we have archived away, it would easily
> go into the terabyte range.  There are a few individual tables
> which approach 100GB all by themselves.
>
> As it is, I can't afford to do a reindex or even a backup
> (pg_dump or any other method) or other administrative tasks
> on those tables, since the processes take too long.  There are
> workarounds: I could back up single tables at slack times,
> which would allow me to do a complete backup (though not
> self-consistent as a set) over the course of a week or so.
>
> So I'm leaning toward partitioning, perhaps selecting
> partition rules so that no table will be larger than
> around 5GB, at which point reindex or other admin procedures
> that take exclusive locks become only minor
> inconveniences rather than showstoppers.
>
> How do people take consistent backups of very large
> databases on Linux/FreeBSD?  I'm aware of PITR, but
> might not be able to set aside a box with enough
> drives for it.  LVM snapshots?  Are there performance
> issues with LVM, etc.?
>
> tiger
>
>
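
On the partitioning Tiger is leaning toward: the stock PostgreSQL
approach is inheritance plus CHECK constraints with
constraint_exclusion turned on.  A rough, purely illustrative sketch
(the table and column names are made up, not from my database):

CREATE TABLE measurements (
    ts      timestamp NOT NULL,
    payload text
);

-- One child table per year (or per month, if a year won't stay under
-- a ~5GB budget).  The CHECK constraint is what lets the planner skip
-- children a query can't match.
CREATE TABLE measurements_2007 (
    CHECK (ts >= '2007-01-01' AND ts < '2008-01-01')
) INHERITS (measurements);

CREATE INDEX measurements_2007_ts_idx ON measurements_2007 (ts);

-- With constraint_exclusion on, a date-qualified query against the
-- parent scans only the matching children, so REINDEX or per-table
-- dumps stay small.
SET constraint_exclusion = on;

SELECT * FROM measurements
 WHERE ts >= '2007-03-01' AND ts < '2007-04-01';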
