parallel query evaluation - Mailing list pgsql-performance

From	Oliver Seidel
Subject	parallel query evaluation
Date	November 10, 2012 14:05:09
Msg-id	214a89820818a3da44bcadb4ab829c14@os10000.net
Responses	Re: parallel query evaluation
List	pgsql-performance

Hi,

I have

             create table x ( att bigint, val bigint, hash varchar(30) );

with 693 million rows. The query

             create table y as select att, val, count(*) as cnt from x group by att, val;

ran for more than 2000 minutes and used 14 GB of memory on a machine with
8 GB of physical RAM -- eventually I stopped it. Doing

             create table y ( att bigint, val bigint, cnt int );

and then, from the shell, something a bit like:

             # one psql session per bucket 0..255 ({} is the bucket number), at most 6 at once
             seq 0 255 | xargs -I{} -P 6 \
                 psql -c "insert into y select att, val, count(*) from x where att%256={} group by att, val" test

runs 6 of the 256 buckets in 10 minutes -- meaning the whole problem can be
done in about 43 rounds of 10 minutes, i.e. a little over 7 hours.
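One plausible reason the split behaves so much better, assuming the planner picked an in-memory hash aggregate (before PostgreSQL 13 a hash aggregate could not spill to disk, so a badly underestimated group count shows up as runaway memory use): the single query must keep an entry for every distinct (att, val) pair at once, while each bucket query only keeps the pairs whose att falls into that bucket. The Python toy model below counts groups rather than bytes, and its data is made up for illustration.

```python
from collections import Counter


def group_counts(rows):
    """Aggregate (att, val) rows into {(att, val): count}, like GROUP BY att, val."""
    return Counter((att, val) for att, val in rows)


def peak_groups_single(rows):
    # One query: every distinct (att, val) pair is held at the same time.
    return len(group_counts(rows))


def peak_groups_partitioned(rows, buckets=256):
    # One query per bucket of att % buckets: only the pairs in the current
    # bucket are held, so the peak is the size of the largest bucket.
    # Toy rows are assumed non-negative, so Python's % matches PostgreSQL's here.
    per_bucket = Counter(att % buckets for att, _val in group_counts(rows))
    return max(per_bucket.values())


# Hypothetical toy table: 1000 att values x 10 val values, 3 rows per pair.
rows = [(att, val) for att in range(1000) for val in range(10) for _ in range(3)]
print(peak_groups_single(rows))       # 10000
print(peak_groups_partitioned(rows))  # 40
```

With 256 buckets the largest bucket here holds 4 att values, so the peak drops from 10000 live groups to 40; the real ratio depends on how att is distributed.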

Question 1: do you see any reason why the second method would yield a
different result from the first method?
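On Question 1, the split should match the single query exactly as long as two things hold: all rows of one (att, val) group land in the same bucket (true here, because the bucket depends only on att), and every row lands in some bucket. The second is worth checking: PostgreSQL's % keeps the sign of the dividend, so a negative att gives att%256 between -255 and -1, and a NULL att gives NULL. Neither matches any bucket 0..255, yet the single GROUP BY counts both. A toy Python check follows; pg_mod is a hypothetical helper that mimics PostgreSQL's % for a positive divisor.

```python
from collections import Counter


def pg_mod(a, b):
    """Mimic PostgreSQL's integer % for b > 0: NULL in, NULL out; sign follows a."""
    if a is None:
        return None
    r = abs(a) % b
    return -r if a < 0 else r


def group_by(rows):
    # select att, val, count(*) from x group by att, val
    return Counter((att, val) for att, val in rows)


def partitioned_group_by(rows, buckets=256):
    # Union over k = 0..buckets-1 of:
    #   select att, val, count(*) from x where att % buckets = k group by att, val
    result = Counter()
    for k in range(buckets):
        result.update(group_by(r for r in rows if pg_mod(r[0], buckets) == k))
    return result


rows = [(1, 10), (1, 10), (257, 10), (2, 20)]
print(partitioned_group_by(rows) == group_by(rows))  # True
print(partitioned_group_by(rows + [(-3, 30)]) == group_by(rows + [(-3, 30)]))  # False
print(partitioned_group_by(rows + [(None, 40)]) == group_by(rows + [(None, 40)]))  # False
```

If att can be negative, a bucket condition such as where abs(att % 256) = k still sends every non-NULL row to exactly one bucket, and rows with a NULL att would need one extra where att is null query.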
Question 2: is that method generalisable, so that it could be included in
the base system without manual shell glue?
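On Question 2, the shell glue follows a generic pattern: pick a bucket function of one of the grouping columns, run one aggregate per bucket with a fixed number in flight, and append the results. The Python sketch below shows only that scheduling shape on in-memory rows; run_bucket and parallel_group_by are hypothetical names, and the Python threads merely stand in for what, in the real setup, are separate psql sessions, each with its own server backend.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_bucket(rows, k, buckets):
    # Hypothetical stand-in for one psql call:
    #   insert into y select att, val, count(*) from x
    #   where att % buckets = k group by att, val
    # Toy rows are assumed non-negative and non-NULL.
    return Counter((att, val) for att, val in rows if att % buckets == k)


def parallel_group_by(rows, buckets=256, workers=6):
    """Run every bucket with at most `workers` in flight, like xargs -P 6."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_bucket, rows, k, buckets) for k in range(buckets)]
        for future in futures:
            # Buckets hold disjoint (att, val) groups, so this only concatenates.
            total.update(future.result())
    return total


rows = [(a % 50, a % 7) for a in range(1000)]
print(parallel_group_by(rows) == Counter(rows))  # True
```

Taking the bucket column from the GROUP BY key is what makes the last step a plain append; bucketing on a non-key column would instead require re-summing cnt per (att, val) afterwards.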

Thanks,

Oliver
