Thread: Tuning Postgres for single user manipulating large amounts of data
Hi, Im using Postgres 8.3 on a Macbook Pro Labtop. I using the database with just one db connection to build a lucene search index from some of the data, and Im trying to improve performance. The key thing is that I'm only a single user but manipulating large amounts of data , i.e processing tables with upto 10 million rows in them, so I think want to configure Postgres so that it can create large temporary tables in memory Ive tried changes various paramters such as shared_buffers, work_mem and checkpoint_segments but I don't really understand what they values are, and the documentation seems to be aimed towards configuring for multiple users, and my changes make things worse. For example my machine has 2GB of memory and I read if using as a dedicated server you should set shared memory to 40% of total memory, but when I increase to more than 30MB Postgres will not start complaining about my SHMMAX limit. Paul
> Hi, Im using Postgres 8.3 on a Macbook Pro Labtop. > I using the database with just one db connection to build a lucene > search index from some of the data, and Im trying to improve > performance. The key thing is that I'm only a single user but > manipulating large amounts of data , i.e processing tables with upto 10 > million rows in them, so I think want to configure Postgres so that it > can create large temporary tables in memory I'm not sure what exactly you mean by "manipulating data" - does that mean reading or writing? I'd definitely increase shared_buffers, work_mem (don't be afraid to set work_mem to say 32MB or 64MB - this should be safe with a single connection, although it depends on the queries). To improve writes, increase checkpoint_segments etc. If you really don't need extra safety - e.g. if you have all the data backed up and just need to run some ad-hoc analysis (and it does not matter if it crashes as you can recover it from backup), you can disable fsync. This will make writes much faster, but it won't be safe in case of crash. DON'T DO THIS IF YOU NEED TO KEEP YOUR DATA SAFE! But as you're runninng the app on your laptop, I guess you can live with frync=off. You'll loose the consistency but you'll get better performance. > Ive tried changes various paramters such as shared_buffers, work_mem and > checkpoint_segments but I don't really understand what they values are, > and the documentation seems to be aimed towards configuring for multiple > users, and my changes make things worse. For example my machine has 2GB > of memory and I read if using as a dedicated server you should set > shared memory to 40% of total memory, but when I increase to more than > 30MB Postgres will not start complaining about my SHMMAX limit. You're heading in the right direction I think, but you're hitting kernel limits. A process can't allocate more shared memory (shared buffers) than SHMMAX limit, so you need to bump this up. See this - http://www.postgresql.org/docs/9.0/static/kernel-resources.html There's even a section for MacOS X (which is the OS you're running, I guess). Tomas
On 09/12/2010 15:51, tv@fuzzy.cz wrote: >> Hi, Im using Postgres 8.3 on a Macbook Pro Labtop. >> I using the database with just one db connection to build a lucene >> search index from some of the data, and Im trying to improve >> performance. The key thing is that I'm only a single user but >> manipulating large amounts of data , i.e processing tables with upto 10 >> million rows in them, so I think want to configure Postgres so that it >> can create large temporary tables in memory > I'm not sure what exactly you mean by "manipulating data" - does that mean > reading or writing? > > I'd definitely increase shared_buffers, work_mem (don't be afraid to set > work_mem to say 32MB or 64MB - this should be safe with a single > connection, although it depends on the queries). To improve writes, > increase checkpoint_segments etc. > > If you really don't need extra safety - e.g. if you have all the data > backed up and just need to run some ad-hoc analysis (and it does not > matter if it crashes as you can recover it from backup), you can disable > fsync. This will make writes much faster, but it won't be safe in case of > crash. > > DON'T DO THIS IF YOU NEED TO KEEP YOUR DATA SAFE! > > But as you're runninng the app on your laptop, I guess you can live with > frync=off. You'll loose the consistency but you'll get better performance. > Hi ( sorry for the double posting, thought Id use the wrong email address but both have been posted anyway). As far as the db is concerned Im just reading data then writing the data to a lucene search index (which is outside of the database) , but my labtop is jut a test machine I want to run the same code on production. Why would work_mem not be safe at 64MB if I have 2GB of memory - what have I got to be careful of. Paul
On Thu, 2010-12-09 at 17:40 +0000, Paul Taylor wrote:
> what
> have I got to be careful of.
I think that was in reference to turning fsync off, not work_mem values.
> Hi ( sorry for the double posting, thought Id use the wrong email > address but both have been posted anyway). As far as the db is concerned > Im just reading data then writing the data to a lucene search index (which > is outside of the database) , but my labtop is jut a test machine I want > to run the same code on production. Why would work_mem not be safe at 64MB > if I have 2GB of memory - what have I got to be careful of. > > Paul Well, the problem with work_mem is that it's 'per operation' so a query may actually need several work_mem segments. And it's not just sorting, a query containing a hash join, hash aggregate and a sort may consume up to 3x work_mem memory. And if you have a lot of concurrent users running such queries, you may easily run out of memory - in that case the feared OOM killer comes and kills one of the processes (usually postmaster, which means the database goes bottoms up). Not sure how OOM works on MacOS. But as you said there will be single user running queries on the database, you can set the work_mem limit pretty high. Depends on the queries though - a complicated query may consume a lot of memory. Tomas