Re: PGSQL 9.3 - Materialized View - multithreading - Mailing list pgsql-performance

From Nicolas Paris
Subject Re: PGSQL 9.3 - Materialized View - multithreading
Date
Msg-id CA+ssMOTd=CRq6oitEXBncrEuTk-A7eRgZBPwRpPGe9=StfhELg@mail.gmail.com
Whole thread Raw
In response to Re: PGSQL 9.3 - Materialized View - multithreading  ("Graeme B. Bell" <grb@skogoglandskap.no>)
Responses Re: PGSQL 9.3 - Materialized View - multithreading  ("Graeme B. Bell" <grb@skogoglandskap.no>)
List pgsql-performance
Hello,
Thanks for this clear explanation !

Then I have a sub-question :
Supposed I have 3600 materialised views say 600 mat views from 6 main table. (A,B,C,D,E,F are repetead 600 times with some differences)
Is it faster to :
1) parallel refresh  600 time A, then 600 time B etc,
OR
2) parallel refresh  600 time A,B,C,D,E,F

I guess 1) is faster because they are 600 access to same table loaded in memory ? But do parallel access to the same table implies concurency
 and bad performance ?

Thanks

Nicolas PARIS


2014-04-07 12:29 GMT+02:00 Graeme B. Bell <grb@skogoglandskap.no>:
On 04 Apr 2014, at 18:29, Nicolas Paris <niparisco@gmail.com> wrote:

> Hello,
>
> My question is about multiprocess and materialized View.
> http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
> I (will) have something like 3600 materialised views, and I would like to know the way to refresh them in a multithread way
> (anderstand 8 cpu cores -> 8 refresh process  in the same time)

Hi Nick,

out of DB solution:

1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already.

2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times.

(In BASH):
  for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands

3. Install Gnu Parallel     and type:

parallel < 3600commands

4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j.
It will also give you a live progress report if you use --progress.
e.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null

parallel -j 8 --progress  < 3600commands > /dev/null

5. If you want to make debugging easier use the parameter --tag to tag output for each command.

Of course it would be much more elegant if someone implemented something like Gnu Parallel inside postgres or psql ... :-)

Hope this helps & have a nice day,

Graeme.






pgsql-performance by date:

Previous
From: Albe Laurenz
Date:
Subject: Re: Batch update query performance
Next
From: Heikki Linnakangas
Date:
Subject: Re: Batch update query performance