On Fri, Dec 19, 2014 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> And here is a new version.
Here is another new version, with lots of bugs fixed. The worker
shutdown sequence is now much more robust, although I think there may
still be a bug or two lurking, and I fixed a bunch of other things
too. There's now a function called parallel_count() in the
parallel_dummy extension contained herein, which does a parallel count
of a relation you choose:
rhaas=# select count(*) from pgbench_accounts;
  count
---------
 4000000
(1 row)
Time: 396.635 ms
rhaas=# select parallel_count('pgbench_accounts'::regclass, 0);
NOTICE: PID 2429 counted 4000000 tuples
 parallel_count
----------------
        4000000
(1 row)
Time: 234.445 ms
rhaas=# select parallel_count('pgbench_accounts'::regclass, 4);
NOTICE: PID 2499 counted 583343 tuples
CONTEXT: parallel worker, pid 2499
NOTICE: PID 2500 counted 646478 tuples
CONTEXT: parallel worker, pid 2500
NOTICE: PID 2501 counted 599813 tuples
CONTEXT: parallel worker, pid 2501
NOTICE: PID 2502 counted 611389 tuples
CONTEXT: parallel worker, pid 2502
NOTICE: PID 2429 counted 1558977 tuples
 parallel_count
----------------
        4000000
(1 row)
Time: 150.004 ms
rhaas=# select parallel_count('pgbench_accounts'::regclass, 8);
NOTICE: PID 2429 counted 1267566 tuples
NOTICE: PID 2504 counted 346236 tuples
CONTEXT: parallel worker, pid 2504
NOTICE: PID 2505 counted 345077 tuples
CONTEXT: parallel worker, pid 2505
NOTICE: PID 2506 counted 355325 tuples
CONTEXT: parallel worker, pid 2506
NOTICE: PID 2507 counted 350872 tuples
CONTEXT: parallel worker, pid 2507
NOTICE: PID 2508 counted 338855 tuples
CONTEXT: parallel worker, pid 2508
NOTICE: PID 2509 counted 336903 tuples
CONTEXT: parallel worker, pid 2509
NOTICE: PID 2511 counted 326716 tuples
CONTEXT: parallel worker, pid 2511
NOTICE: PID 2510 counted 332450 tuples
CONTEXT: parallel worker, pid 2510
 parallel_count
----------------
        4000000
(1 row)
Time: 166.347 ms
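The uneven per-process counts in the notices above are what you would
expect if every participant, the master included, keeps claiming the
next unread chunk of the relation until none are left. The following is
a toy model of that pattern in Python, not code from the patch: the
name toy_parallel_count, the shared block cursor, and the startup_delay
parameter are all assumptions made for illustration, and the real
extension may divide the work differently.

```python
import threading
import time


def toy_parallel_count(blocks, nworkers, startup_delay=0.0):
    """Count rows in `blocks` using the calling thread plus `nworkers` threads.

    Returns (total, per_participant_counts). Hypothetical toy model only.
    """
    lock = threading.Lock()
    state = {"next_block": 0}   # shared cursor: index of the next unclaimed block
    counts = {}                 # participant name -> rows counted

    def claim_block():
        # Hand out each block index exactly once.
        with lock:
            i = state["next_block"]
            if i >= len(blocks):
                return None
            state["next_block"] = i + 1
            return i

    def participate(name, delay):
        time.sleep(delay)       # models worker startup cost (assumed, not measured)
        n = 0
        while True:
            i = claim_block()
            if i is None:
                break
            n += len(blocks[i])
        with lock:
            counts[name] = n

    workers = [
        threading.Thread(target=participate, args=(f"worker {k}", startup_delay))
        for k in range(nworkers)
    ]
    for t in workers:
        t.start()
    participate("master", 0.0)  # the master counts too instead of just waiting
    for t in workers:
        t.join()
    return sum(counts.values()), counts


if __name__ == "__main__":
    table = [[None] * 100 for _ in range(2000)]   # 2000 blocks of 100 rows
    total, per_participant = toy_parallel_count(table, nworkers=4, startup_delay=0.01)
    print(total, per_participant)
```

In this model the total is always exact because each block is claimed
once, while the split between participants varies from run to run and
tends to favor whoever starts first.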
This example table (pgbench_accounts, scale 40, ~537 MB) is small
enough that parallelism doesn't really make sense; you can see from
the notice messages above that the master manages to count roughly a
third of the table (about 39% with 4 workers, 32% with 8) before the
workers get themselves up and running. The point is rather to show how
the infrastructure works and that it can be used to write code to do
practically useful tasks in a surprisingly small number of lines of
code; parallel_count is only about 100 lines on top of the base patch.
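For anyone who wants to check the master's share of the work, here is a
minimal sketch that recomputes it from the NOTICE lines above. The
counts are copied from the output; the helper name master_share is made
up for this example, and PID 2429 is taken to be the master because it
is the only process reported without a "parallel worker" context.

```python
# Per-process tuple counts copied from the NOTICE lines in the session above.
COUNTS_4_WORKERS = {
    2499: 583343, 2500: 646478, 2501: 599813, 2502: 611389,
    2429: 1558977,
}
COUNTS_8_WORKERS = {
    2429: 1267566,
    2504: 346236, 2505: 345077, 2506: 355325, 2507: 350872,
    2508: 338855, 2509: 336903, 2511: 326716, 2510: 332450,
}
MASTER_PID = 2429  # assumed master: no "parallel worker" CONTEXT line


def master_share(counts, master_pid=MASTER_PID):
    """Return (total tuples counted, fraction counted by the master)."""
    total = sum(counts.values())
    return total, counts[master_pid] / total


if __name__ == "__main__":
    for label, counts in (("4 workers", COUNTS_4_WORKERS),
                          ("8 workers", COUNTS_8_WORKERS)):
        total, share = master_share(counts)
        print(f"{label}: total={total}, master share={share:.1%}")
```

Both runs add up to the full 4000000 rows, so no tuples are lost or
counted twice in these examples.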
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company