Re: PGSQL 9.3 - Materialized View - multithreading - Mailing list pgsql-performance

From: Graeme B. Bell
Subject: Re: PGSQL 9.3 - Materialized View - multithreading
Msg-id: D8DF4544-81A4-4465-8AD4-25DA3CC20596@skogoglandskap.no
In response to: Re: PGSQL 9.3 - Materialized View - multithreading (Nicolas Paris <niparisco@gmail.com>)
Responses: Re: PGSQL 9.3 - Materialized View - multithreading (Nicolas Paris <niparisco@gmail.com>)
List: pgsql-performance

Hi again Nick.

Glad it helped.

Generally, I would expect that doing all the A's first, then all the B's, and so on, would be fastest since you can re-use the data from cache.
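
For example (untested, and assuming your view names begin with the name of their source table, which is just a guess at your naming scheme), sorting the command file from my previous mail keeps all the A-based refreshes next to each other, and Parallel starts jobs in input order:

  sort 3600commands > 3600commands.grouped
  parallel -j 8 < 3600commands.grouped > /dev/null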

Concurrency when reading isn't generally a problem. Lots of things can read at the same time and it will be nice and fast.
It's concurrent writes or concurrent read/write of the same data item that causes problems with locking. That shouldn't be happening here, judging by your description.

If possible, try to make sure nothing is modifying those source tables A/B/C/D/E/F when you are doing your view refresh.
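
A rough way to check for that is to eyeball the active sessions just before kicking off the refreshes and see whether any of them touch those tables, e.g. ('mydb' being a placeholder database name, as in my earlier mail):

  psql mydb -c "select pid, usename, state, query from pg_stat_activity where state <> 'idle' and pid <> pg_backend_pid();"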

Graeme.

On 07 Apr 2014, at 14:49, Nicolas Paris <niparisco@gmail.com> wrote:

> Hello,
> Thanks for this clear explanation !
>
> Then I have a sub-question:
> Suppose I have 3600 materialised views, say 600 mat views from each of 6 main tables (A,B,C,D,E,F are repeated 600 times with some differences).
> Is it faster to:
> 1) parallel refresh 600 times A, then 600 times B, etc.
> OR
> 2) parallel refresh 600 times A,B,C,D,E,F
>
> I guess 1) is faster because there are 600 accesses to the same table loaded in memory? But does parallel access to the same table imply concurrency issues and bad performance?
>
> Thanks
>
> Nicolas PARIS
>
>
> 2014-04-07 12:29 GMT+02:00 Graeme B. Bell <grb@skogoglandskap.no>:
> On 04 Apr 2014, at 18:29, Nicolas Paris <niparisco@gmail.com> wrote:
>
> > Hello,
> >
> > My question is about multiprocessing and materialized views.
> > http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
> > I (will) have something like 3600 materialised views, and I would like to know the way to refresh them in a multithreaded way
> > (understand: 8 cpu cores -> 8 refresh processes at the same time)
>
> Hi Nick,
>
> An out-of-DB solution:
>
> 1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already.
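>
> For example (an untested sketch; this assumes you want every matview listed in pg_matviews, and 'mydb' is just a placeholder database name), something like this writes one shell command per view:
>
>   psql mydb -At -c "select format('psql mydb -c ''REFRESH MATERIALIZED VIEW %I.%I;''', schemaname, matviewname) from pg_matviews;" > 3600commands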
>
> 2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times.
>
> (In BASH):
>   for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands
>
> 3. Install Gnu Parallel and type:
>
> parallel < 3600commands
>
> 4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j.
> It will also give you a live progress report if you use --progress.
> e.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null
>
> parallel -j 8 --progress  < 3600commands > /dev/null
>
> 5. If you want to make debugging easier, use the parameter --tag to tag the output for each command.
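>
> For instance, the same run as above with tagged output (the log file name here is only an example):
>
> parallel -j 8 --progress --tag < 3600commands > refresh.log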
>
> Of course it would be much more elegant if someone implemented something like Gnu Parallel inside postgres or psql ... :-)
>
> Hope this helps & have a nice day,
>
> Graeme.


