Re: PGSQL 9.3 - Materialized View - multithreading - Mailing list pgsql-performance

From Graeme B. Bell
Subject Re: PGSQL 9.3 - Materialized View - multithreading
Date
Msg-id 821FA4DC-E084-4A5F-A8D0-50722D5B11E3@skogoglandskap.no
In response to Re: PGSQL 9.3 - Materialized View - multithreading  (Nicolas Paris <niparisco@gmail.com>)
List pgsql-performance
- http://wiki.postgresql.org/wiki/Performance_Optimization
- run it on the most powerful machine you can find
- get some more memory
- get a big (512 GB to 1 TB) SSD drive
- avoid recalculating the same things over and over. If your views have many similar elements, then calculate those first into a partial result, then build the final views from the partial result (see the sketch after this list).
- make sure your source tables are fully indexed and have good statistics
- run all the views once with \timing and keep track of how long they took. Fix the slow ones.
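
For the partial-result idea, a minimal sketch of what that can look like (the table, column and view names are invented for illustration; the shared join stands in for whatever expensive work your similar views have in common):

  -- the expensive work shared by many of the final views, computed once
  CREATE MATERIALIZED VIEW partial_ab AS
    SELECT a.id, a.x, b.y
    FROM a JOIN b ON b.a_id = a.id;

  -- index and analyze the partial result so the final views can use it efficiently
  CREATE INDEX ON partial_ab (id);
  ANALYZE partial_ab;

  -- each final view then becomes a cheap query over the partial result
  CREATE MATERIALIZED VIEW final_001 AS
    SELECT id, x, y FROM partial_ab WHERE x > 0;

  -- at refresh time, refresh the partial view first, then the final views built on it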

G


On 07 Apr 2014, at 15:56, Nicolas Paris <niparisco@gmail.com> wrote:

> Excellent.
>
> Maybe the last sub-question:
>
> Those 3600 mat views do have indexes.
> I guess I will get better performance by dropping the indexes first, then refreshing, then re-creating the indexes.
>
> Are there other ways to improve performance (like mat view storage parameters)? These routines will run at night and need to finish quickly.
>
> Thanks
>
> Nicolas PARIS
>
>
> 2014-04-07 14:59 GMT+02:00 Graeme B. Bell <grb@skogoglandskap.no>:
>
> Hi again Nick.
>
> Glad it helped.
>
> Generally, I would expect that doing all the A's first, then all the B's, and so on, would be fastest since you can re-use the data from cache.
>
> Concurrency when reading isn't generally a problem. Lots of things can read at the same time and it will be nice and fast.
> It's concurrent writes or concurrent read/write of the same data item that causes problems with locking. That shouldn't be happening here, judging by your description.
>
> If possible, try to make sure nothing is modifying those source tables A/B/C/D/E/F when you are doing your view refresh.
>
> Graeme.
>
> On 07 Apr 2014, at 14:49, Nicolas Paris <niparisco@gmail.com> wrote:
>
> > Hello,
> > Thanks for this clear explanation!
> >
> > Then I have a sub-question:
> > Suppose I have 3600 materialised views, say 600 mat views from each of 6 main tables (A,B,C,D,E,F are repeated 600 times with some differences).
> > Is it faster to:
> > 1) parallel refresh 600 times A, then 600 times B, etc.
> > OR
> > 2) parallel refresh 600 times A,B,C,D,E,F
> >
> > I guess 1) is faster because there are 600 accesses to the same table loaded in memory? But does parallel access to the same table imply concurrency problems and bad performance?
> >
> > Thanks
> >
> > Nicolas PARIS
> >
> >
> > 2014-04-07 12:29 GMT+02:00 Graeme B. Bell <grb@skogoglandskap.no>:
> > On 04 Apr 2014, at 18:29, Nicolas Paris <niparisco@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > My question is about multiprocessing and materialized views.
> > > http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
> > > I (will) have something like 3600 materialised views, and I would like to know the way to refresh them in a multithreaded way
> > > (understand: 8 CPU cores -> 8 refresh processes at the same time)
> >
> > Hi Nick,
> >
> > out of DB solution:
> >
> > 1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already.
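> >
> > For example, a sketch of that idea in psql (this assumes the pg_matviews catalog view and the same "echo ... | psql mydb" command format used in step 2 below; adjust names to your setup):
> >
> >   \t on
> >   \o 3600commands
> >   select format('echo "REFRESH MATERIALIZED VIEW %I.%I;" | psql mydb', schemaname, matviewname) from pg_matviews;
> >   \o
> >   \t off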
> >
> > 2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times.
> >
> > (In BASH):
> >   for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands
> >
> > 3. Install GNU Parallel and type:
> >
> > parallel < 3600commands
> >
> > 4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j.
> > It will also give you a live progress report if you use --progress.
> > e.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null
> >
> > parallel -j 8 --progress  < 3600commands > /dev/null
> >
> > 5. If you want to make debugging easier use the parameter --tag to tag output for each command.
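> >
> > For example (the output file name here is only an illustration):
> >
> > parallel -j 8 --progress --tag < 3600commands > refresh_output.txt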
> >
> > Of course it would be much more elegant if someone implemented something like GNU Parallel inside postgres or psql... :-)
> >
> > Hope this helps & have a nice day,
> >
> > Graeme.
> >
> >
> >
> >
> >
> >
>
>


