Re: Data archiving/warehousing idea - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Data archiving/warehousing idea
Date
Msg-id 1170329141.3681.501.camel@silverbirch.site
Whole thread Raw
In response to Re: Data archiving/warehousing idea  (Chris Dunlop <chris@onthe.net.au>)
List pgsql-hackers
On Thu, 2007-02-01 at 15:03 +1100, Chris Dunlop wrote:

> > A different approach discussed earlier involves greatly
> > restricting the way in which the table is used. This table
> > could only be written to if an exclusive lock is held; on
> > error or ABORT, the table is truncated.
> 
> You're talking about the "no-WAL" concept?  Not quite the same
> thing I think, but perhaps complimentary to the ARCHIVE idea: I
> wouldn't expect an ARCHIVE table to need to generate any WAL
> entries as it would be read only.

Setting an option to reduce the size of the row headers needs to be done
before its loaded, not after. If you mark a table as APPEND-only it
would be possible to save 4 bytes off the row header. 

Saving a further 8-bytes off the row header by dropping the xmin and
commandid fields isn't a very workable solution since those fields
provide visibility within a transaction and across transactions. You'd
end up with a table that could only be loaded by one transaction and
only touched by one command within that transaction. That would require
bookkeeping on the relation itself (in pg_class) and an
AccessExclusiveLock. In those circumstances you would be able to save on
writing WAL as well. You'd be basically saying that it could only be
loaded by a special utility and would be read-only once loaded.

I'm not crazy about those ideas, even though I think I suggested them
some time back. They're very special case and would probably require
many functions to handle multiple cases, so that additional complexity
and cost would effect all users.

My main doubt relates to the data you're storing. If the events you're
interested in have so little information associated with them that they
are a thin table then storing them at all is questionable. I've been
involved at the design stage of a number of data warehouses and the
amount of data eventually stored is typically < 10% of the number people
first thought of, with number of rows decreasing drastically and the
number of columns increasing slightly as people try to gain additional
value from their data storage. So row overhead should be less of a
concern.

It is possible to save on WAL by doing COPY LOCK or using a table with
different resilience characteristics, both ideas are already on the
TODO. There are also better, less intrusive ways of reducing data volume
and improving load performance.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: pg_restore fails with a custom backup file
Next
From: "Simon Riggs"
Date:
Subject: Re: max_locks_per_transactions ...