Re: Tuning massive UPDATES and GROUP BY's? - Mailing list pgsql-performance

From	Marti Raudsepp
Subject	Re: Tuning massive UPDATES and GROUP BY's?
Date	March 12, 2011 13:08:07
Msg-id	AANLkTi=dGuvB_EuU43CbFUeuaG57-WXvypw+A1WKHhZ8@mail.gmail.com
In response to	Re: Tuning massive UPDATES and GROUP BY's? (fork <forkandwait@gmail.com>)
Responses	Re: Tuning massive UPDATES and GROUP BY's?
List	pgsql-performance

On Fri, Mar 11, 2011 at 21:06, fork <forkandwait@gmail.com> wrote:
> Like the following?  Will it rebuild the indexes in a sensical way?

Don't insert data into an indexed table. A very important point with
bulk-loading is that you should load all the data first, then create
the indexes. Running multiple (different) CREATE INDEX queries in
parallel can additionally save a lot of time. Also, don't move data
back and forth between the tables; just drop the original when you're
done.
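
The load-then-index order described above can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the table name `facts`, its columns, and the index names are hypothetical. Running the CREATE INDEX statements in parallel is not shown, since that would need several separate database connections.

```python
import sqlite3


def rebuild_table(conn, rows):
    """Load rows into a fresh, unindexed table, index it, then swap it in.

    Assumption: 'facts' and its columns are made-up names for illustration.
    """
    cur = conn.cursor()
    # 1. Create the replacement table with no indexes yet.
    cur.execute("CREATE TABLE facts_new (region TEXT, year INTEGER, amount REAL)")
    # 2. Bulk-load all the data first.
    cur.executemany("INSERT INTO facts_new VALUES (?, ?, ?)", rows)
    # 3. Only now build the indexes, over the already loaded data.
    cur.execute("CREATE INDEX facts_new_region_idx ON facts_new (region)")
    cur.execute("CREATE INDEX facts_new_year_idx ON facts_new (year)")
    # 4. Drop the original instead of copying data back into it.
    cur.execute("DROP TABLE IF EXISTS facts")
    cur.execute("ALTER TABLE facts_new RENAME TO facts")
    conn.commit()
```

On PostgreSQL the same sequence would be issued as plain SQL, with each CREATE INDEX optionally sent from its own session to get the parallelism mentioned above.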

Doing this should give a significant performance win. Partitioning
the tables so that each one fits in cache should improve it further,
but I'm not sure anymore that it's worthwhile considering the costs
and extra maintenance.

> Is there a rule of thumb on tradeoffs in a partitioned table?

The only certain thing is that you'll lose "GroupAggregate" and
"Merge Join" query plans. If you only see "HashAggregate" plans when
you EXPLAIN your GROUP BY queries, then partitioning probably won't
make much of a difference to them.
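
One way to apply this check across many queries is to scan saved EXPLAIN output for the node types in question. The helper below is a hypothetical sketch, not part of any PostgreSQL tooling; it assumes the plain-text EXPLAIN format, where these nodes are labelled "GroupAggregate" and "Merge Join".

```python
def plan_nodes_at_risk(explain_text):
    """Return the plan node types named in the post as lost under partitioning.

    explain_text is the text output of EXPLAIN for one query.
    An empty list means neither node type appears in the plan.
    """
    at_risk = ("GroupAggregate", "Merge Join")
    return [name for name in at_risk if name in explain_text]
```

If this returns an empty list for all of your GROUP BY queries, the plans in question are not being used today, so there is nothing of that kind to lose.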

> I would use the partition column whatever I am most likely
> to cluster by in a single big table, right?

Yes.

Regards,
Marti
