Re: COPY command details - Mailing list pgsql-general

From Tiger Quimpo
Subject Re: COPY command details
Date
Msg-id 1175235100.969.52.camel@tiger-work.ramcarnet.com
Whole thread Raw
In response to Re: COPY command details  (Benjamin Arai <benjamin@araisoft.com>)
List pgsql-general
On Thu, 2007-03-29 at 22:15 -0700, Benjamin Arai wrote:
> I have one system which I have used partitioning.  For this particular
> case I have tons of data over about (50 years).  What I did is wrote
> small loader that breaks data in tables based on date, so I have tables
> like abc_2000, abc_2001 etc.  The loading script is only a couple
> hundred lines of code.  The only part that was a little bit of work was
> to allow for easy access to the data for the data for the devs.  I did
> this by writing a few PL functions to automatically union  the tables
> and produce results. So the function like getData(startData,enddate)
> would run a union query for the respective date ranges.

From reading on the list, I think the current recommendations
are:

  have a parent table and then create the partitions as
  descendants (INHERITS) of the parent table.

  Use constraints to ensure that nothing gets inserted
  into the parents, and that only the correct data gets
  inserted into the descendants (i.e., the constraints
  enforce that 2002 data won't insert into the 2001 partition,
  the insert would fail if you tried that).

  Turn constraint_exclusion on in postgresql.conf (helps
  the planner ignore partitions that don't qualify), so
  it doesn't need to scan partitions where there will be no
  matches anyway.

  Use rules on the parent to redirect inserts/updates/deletes
  to the right partition (or trigger, i'm having some problems
  with both rules and triggers, so I may have the loader
  insert straight to the right partition instead, although
  I'm still trying to get the rules right.

The advantage of this (parent table plus descendant partitions)
is that you can query from the parent, and the descendants will
be automatically queried, you'll get all matching rows from any
descendants that have them.  No need for those unions or
pl/pgsql functions to do the unions for you.

That said, I need to make my current code work with smaller
test data sets so I can isolate the problems I'm having with
rules, etc.  E.g., my original plan was to detect automatically
(in a before trigger) if a partition for the to-be-loaded data
set already exists, and if it doesn't, create the partitions
and the rules.  I then load the data into the parent table and
expect the newly created rules to redirect the data into the
new partitions. Either my rules are wrogn, or there's something
about transaction semantics that's making that fail since the
rows are inserting into the parent table (no constraint there
yet to disallow inserts into the parent) instead of the right
partition.

To The List:
Are schema changes (such as adding rules and creating child
partitions) part of the same transaction or do they happen
(magically) outside the current transaction, in a new
transaction (e.g., so that the inserts would fail because
the inserts are running in an older transaction that can't
see the new rule or new partition table yet).

As I said, I might just be doing something wrogn there.
I've set that project aside for now since it's not
urgent (the current unpartitioned system works well enough,
I just avoid reindex, vacuum, vacuum full and pg_dump), but
I'll start working on it again when I get some free time in
a week or two (Holy Week, holiday where I am :-).

tiger




pgsql-general by date:

Previous
From: Tiger Quimpo
Date:
Subject: Re: COPY command details
Next
From: Richard Huxton
Date:
Subject: Re: coalesce for null AND empty strings