Re: Benchmark Data requested - Mailing list pgsql-performance

From Simon Riggs
Subject Re: Benchmark Data requested
Msg-id 1202221496.4252.680.camel@ebony.site
In response to Re: Benchmark Data requested  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-performance
On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote:
> Hi,
>
> On Monday 04 February 2008, Jignesh K. Shah wrote:
> > Single stream loader of PostgreSQL takes hours to load data. (Single
> > stream load... wasting all the extra cores out there)
>
> I wanted to work on this at the pgloader level, so the CVS version of pgloader is
> now able to load data in parallel, with one Python thread per configured
> section (1 section = 1 data file = 1 table is often the case).
> Not configurable at the moment, but I plan on providing a "threads" knob which
> will default to 1, and could be -1 for "as many threads as sections".

That sounds great. I was just thinking of asking for that :-)

I'll look at COPY FROM internals to make this faster. I'm looking at
this now to refresh my memory; I already had some plans on the shelf.
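
To make the proposed "threads" knob concrete, here is a minimal sketch of how
it could map onto a worker pool. This is not pgloader's code: resolve_threads,
load_sections and the load_one callback are hypothetical names, and capping
the pool at the number of sections is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_threads(threads, n_sections):
    """Map the knob to a worker count; -1 means one thread per section."""
    if threads == -1:
        return max(1, n_sections)
    if threads < 1:
        raise ValueError("threads must be -1 or a positive integer")
    # Assumption: never start more workers than there are sections to load.
    return min(threads, max(1, n_sections))


def load_sections(sections, load_one, threads=1):
    """Call load_one(section) for every section with a bounded thread pool.

    load_one is supplied by the caller (hypothetical here); in a real loader
    it would open its own connection and run one COPY FROM for its section.
    """
    sections = list(sections)
    workers = resolve_threads(threads, len(sections))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in input order and re-raises worker exceptions.
        return list(pool.map(load_one, sections))
```

With threads=1 this degrades to the current single-stream behaviour, so the
default stays safe.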

> > Multiple table loads (1 per table) spawned via script are a bit better
> > but hit WAL problems.
>
> pgloader will hit the WAL problem too, but it may still have its benefits, or
> at least we will soon (you can already if you take it from CVS) be able to
> measure whether parallel loading on the client side is a good idea performance-wise.

Should be able to reduce lock contention, but not overall WAL volume.

> [...]
> > I have not even started Partitioning of tables yet since with the
> > current framework, you have to load the data separately into each
> > table, which means for the TPC-H data you need "extra-logic" to take
> > that table data and split it into each partition child table. Not stuff
> > that many people want to do by hand.
>
> I'm planning to add ddl-partitioning support to pgloader:
>   http://archives.postgresql.org/pgsql-hackers/2007-12/msg00460.php
>
> The basic idea is for pgloader to ask PostgreSQL about constraint_exclusion,
> pg_inherits and pg_constraint, and if pgloader recognizes both the CHECK
> expression and the datatypes involved, and if we can implement the CHECK in
> Python without having to resort to querying PostgreSQL, then we can run a
> thread per partition, with as many COPY FROM running in parallel as there are
> partitions involved (when threads = -1).
>
> I'm not sure this will be quicker than relying on PostgreSQL triggers or rules
> as used for partitioning currently, but ISTM the paragraph Jignesh quoted is
> just about that.

Much better than triggers and rules, but it will be hard to get it to
work.
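
As a rough illustration of evaluating a CHECK on the client side, the sketch
below routes rows to child tables for simple range constraints of the form
CHECK (col >= lo AND col < hi). The partition names, the bounds and the helper
names are hypothetical, and it assumes the bounds have already been parsed out
of pg_constraint, which is the hard part and is not shown.

```python
from collections import defaultdict
from datetime import date

# Hypothetical partition map: child table -> (inclusive lower, exclusive upper),
# mirroring CHECK (o_orderdate >= lo AND o_orderdate < hi) on each child.
PARTITIONS = {
    "orders_1997": (date(1997, 1, 1), date(1998, 1, 1)),
    "orders_1998": (date(1998, 1, 1), date(1999, 1, 1)),
}


def route(value, partitions):
    """Return the child table whose range contains value, or None if none does."""
    for table, (lo, hi) in partitions.items():
        if lo <= value < hi:
            return table
    return None


def split_rows(rows, key_index, partitions):
    """Group rows by target child table.

    Rows matching no partition are collected under the key None so the caller
    can reject or log them instead of silently dropping them.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[route(row[key_index], partitions)].append(row)
    return dict(buckets)
```

Each resulting bucket could then be handed to its own thread running one COPY
FROM against the matching child table, which bypasses the parent's triggers or
rules entirely.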

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

