
Re: Postgres for a "data warehouse", 5-10 TB - Mailing list pgsql-performance

From	Marti Raudsepp
Subject	Re: Postgres for a "data warehouse", 5-10 TB
Date	September 13, 2011 12:45:04
Msg-id	CABRT9RAJvq0bzBnqtEzc=80H7oYPYC4LggKWZOYcDbRRMnskJA@mail.gmail.com
In response to	Re: Postgres for a "data warehouse", 5-10 TB (Robert Klemme <shortcutter@googlemail.com>)
Responses	Re: Postgres for a "data warehouse", 5-10 TB
List	pgsql-performance


On Tue, Sep 13, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:
> In the case of PG this particular example will work:
> 1. TX inserts new PK row
> 2. TX tries to insert same PK row => blocks
> 1. TX commits
> 2. TX fails with PK violation
> 2. TX does the update (if the error is caught)

That goes against the point I was making in my earlier comment. In
order to implement this error-catching logic, you'll have to allocate
a new subtransaction (transaction ID) for EVERY ROW you insert. If
you're going to be loading billions of rows this way, you will invoke
the wrath of the "vacuum freeze" process, which will seq-scan all
older tables and rewrite every row it hasn't already frozen. You'll
survive that if your database is a few GB in size, but in terabyte
land it's unacceptable. Transaction IDs are a scarce resource there.
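To put rough numbers on this, here is a back-of-the-envelope sketch in Python (not from the original post). It assumes one XID per row when every row goes through its own exception-handling subtransaction, versus one XID per batch for a plain batched load. The 200 million figure is PostgreSQL's default `autovacuum_freeze_max_age`, 2^31 is used as an approximate wraparound horizon, and the function names are made up for illustration.

```python
# Back-of-the-envelope XID budget for a bulk load (illustrative sketch only).
# Assumption: each row wrapped in its own exception-handling subtransaction
# consumes one XID; a plain batched load consumes one XID per batch.

AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # PostgreSQL's default setting
XID_HORIZON = 2**31                      # approximate wraparound horizon (~2.1 billion)


def xids_consumed(rows: int, rows_per_xid: int) -> int:
    """XIDs needed when every `rows_per_xid` rows share one (sub)transaction."""
    # Ceiling division: a partial final batch still needs its own XID.
    return -(-rows // rows_per_xid)


def freeze_age_multiple(xids: int) -> float:
    """How many times the load alone exceeds the default autovacuum_freeze_max_age."""
    return xids / AUTOVACUUM_FREEZE_MAX_AGE


if __name__ == "__main__":
    rows = 2_000_000_000  # "billions of rows"
    for label, per_xid in [("per-row subtransactions", 1), ("10k-row batches", 10_000)]:
        xids = xids_consumed(rows, per_xid)
        print(f"{label}: {xids:,} XIDs, "
              f"{freeze_age_multiple(xids):g}x autovacuum_freeze_max_age, "
              f"{xids / XID_HORIZON:.1%} of the wraparound horizon")
```

Under these assumptions, 2 billion rows loaded with one subtransaction per row burn 2 billion XIDs, ten times the default freeze age and most of the wraparound horizon, whereas 10,000-row batches use only 200,000.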

In addition, the blocking in step 2 will limit the parallelism you can
get from multiple concurrent inserters.
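The blocking in the quoted sequence can be illustrated with a toy Python model (not PostgreSQL code, and not from the thread). `ToyTable`, `UniqueViolation` and `run_upsert_race` are hypothetical names. The model only assumes what the quote describes: a second inserter of the same key waits on the uncommitted row, then gets a unique-key error once the first transaction commits and falls back to an update. It ignores subtransactions and XIDs entirely.

```python
import threading
import time


class UniqueViolation(Exception):
    """Hypothetical stand-in for PostgreSQL's unique_violation error."""


class ToyTable:
    """Hypothetical in-memory table keyed by primary key. It mimics one behaviour:
    an inserter of a key held by another open transaction waits for it to finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.committed: dict[str, str] = {}
        self._pending: dict[str, tuple[str, str]] = {}  # key -> (tx, value)
        self.waits = 0  # number of times an inserter woke up after blocking

    def insert(self, tx: str, key: str, value: str) -> None:
        with self._cond:
            # Block while another transaction holds an uncommitted row with this key.
            while key in self._pending and self._pending[key][0] != tx:
                self.waits += 1
                self._cond.wait()
            if key in self.committed or key in self._pending:
                raise UniqueViolation(key)
            self._pending[key] = (tx, value)

    def update(self, key: str, value: str) -> None:
        # Simplification: the fallback UPDATE is applied and visible immediately.
        with self._cond:
            self.committed[key] = value

    def commit(self, tx: str) -> None:
        with self._cond:
            for key in [k for k, (owner, _) in self._pending.items() if owner == tx]:
                self.committed[key] = self._pending.pop(key)[1]
            self._cond.notify_all()  # wake any inserter waiting on these keys


def run_upsert_race() -> ToyTable:
    table = ToyTable()
    table.insert("tx1", "pk", "tx1 value")          # 1. TX inserts new PK row

    def tx2() -> None:
        try:
            table.insert("tx2", "pk", "tx2 value")  # 2. waits while tx1 is still open
        except UniqueViolation:
            table.update("pk", "tx2 value")         # 2. error caught, do the UPDATE

    worker = threading.Thread(target=tx2)
    worker.start()
    time.sleep(0.1)  # usually lets tx2 reach the wait; the result does not depend on it
    table.commit("tx1")                             # 1. TX commits
    worker.join()
    return table


if __name__ == "__main__":
    t = run_upsert_race()
    print(t.committed, "waits:", t.waits)
```

Whatever the thread timing, tx2 ends up taking the error-and-update path, and while tx1 is open every colliding inserter is parked on the condition variable, which is the serialization that limits parallelism.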

Regards,
Marti
