Thread: Tuning Postgres for single-connection use
Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers, etc. for a database that DOESN'T anticipate concurrent connections and that is doing lots of aggregate functions on large tables? All the advice I can find online on tuning (this <http://wiki.postgresql.org/wiki/Performance_Optimization>, this <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>, this <http://www.revsys.com/writings/postgresql-performance.html> etc.) is written for people anticipating lots of concurrent connections.
I'm a social scientist looking to use Postgres not as a database to be shared by multiple users, but rather as my own tool for manipulating a massive data set (I have 5 billion transaction records, 600 GB in CSV, and want to pull out unique user pairs, estimate aggregates for individual users, etc.). This also means almost no writing, except for the creation of new tables based on selections from the main table.
I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if that's important.
Thanks!
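For concreteness, the workload described above might look something like the following. The schema and all names in it (transactions, sender_id, receiver_id, amount) are invented for illustration; the post does not give a schema.

```sql
-- Hypothetical schema for the 5-billion-row transaction table.
CREATE TABLE transactions (
    sender_id   bigint,
    receiver_id bigint,
    amount      numeric,
    ts          timestamp
);

-- "Pull out unique user pairs":
CREATE TABLE user_pairs AS
SELECT DISTINCT sender_id, receiver_id
FROM transactions;

-- "Estimate aggregates for individual users":
CREATE TABLE user_totals AS
SELECT sender_id,
       count(*)    AS n_tx,
       sum(amount) AS total_amount
FROM transactions
GROUP BY sender_id;
```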
Well, for serious database work I suggest upgrading to Linux - you will get better performance out of the same hardware, and probably more scope for tuning (a year or so ago I noticed that some tuning options did not apply to Microsoft O/S's, but I don't recall the details - these options may, or may not, apply to your situation). Apart from anything else, your processing will not be slowed down by having to run anti-virus software!
Note that in Linux you have a wide choice of distributions and desktop environments: I chose Mate (http://mate-desktop.org), some people prefer xfce (http://www.xfce.org), I used to use GNOME 2.
Cheers,
Gavin
On 04/14/2014 05:46 PM, Nick Eubank wrote:
> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffer, etc. for a database that DOESN'T anticipate concurrent connections and that is doing lots of aggregate functions on large tables?
First up would probably be "don't run on Windows". shared_buffers above 512MB causes performance to degrade on Windows, while that threshold is much higher on *nix.
cheers
andrew
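Andrew's point about the Windows shared_buffers ceiling translates into something like the following postgresql.conf sketch. The effective_cache_size line is an assumption of mine (roughly three-quarters of the 16 GB RAM), not something stated in the thread:

```
# postgresql.conf -- sketch for Windows, per the advice above.
# On Windows, shared_buffers above ~512MB degrades performance,
# so stay at or below that and let the OS file cache do the rest.
shared_buffers = 512MB

# Assumption: tell the planner the OS can cache ~3/4 of 16GB RAM.
effective_cache_size = 12GB
```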
In this list, please bottom post!
I've added potentially useful advice below.
On 15/04/14 11:39, Nick Eubank wrote:
> Thanks Gavin -- would LOVE to. Sadly I'm in a weird situation where my hardware is not under my control, so I'm stuck making the best of what I have. Next time though! :)
Yeah, I know the feeling!
I have a client that uses MySQL (ugh!), but I won't even bother mentioning PostgreSQL!
Hopefully, someone more knowledgeable will give you some good advice specific to your O/S.
For tables that don't change, consider a packing density of 100%.
Take care in how you design your tables, and the column types.
Consider carefully the queries you are likely to use, so you can design appropriate indexes.
Some advice will depend on the schema you plan to use, and the type of queries.
Cheers,
Gavin
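A sketch of what Gavin's "packing density of 100%" suggestion looks like in practice. Table and index names are hypothetical; note that heap tables already default to fillfactor 100, while b-tree indexes default to 90, so for a read-only table the index setting is the one that changes anything:

```sql
-- For a table that never changes, pack pages completely.
-- (fillfactor = 100 is already the heap default.)
ALTER TABLE transactions SET (fillfactor = 100);

-- Design indexes around the queries you will actually run,
-- per the advice above; pack them fully too (index default is 90).
CREATE INDEX transactions_sender_idx
    ON transactions (sender_id)
    WITH (fillfactor = 100);
```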
On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank <nickeubank@gmail.com> wrote:
I'd go with a small shared_buffers, like 128MB, and let the OS cache as much as possible. This minimizes the amount of double buffering.
And set work_mem to about 6GB, then bump it up if that doesn't seem to cause problems. In the scenario you describe, it is probably no big deal if you guess too high. Monitor the process; if it starts to go nuts, just kill it and start again with a lower work_mem. If it is a single-user system, you can afford to be adventurous.
If you need to build indexes, you should bump up maintenance_work_mem, but I would do that in the local session, not system-wide.
Cheers,
Jeff
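Jeff's suggestions, written out as a configuration sketch (the values come from his advice above; this is not a definitive recipe):

```
# postgresql.conf -- sketch of the single-connection advice above.
shared_buffers = 128MB   # keep small; rely on the OS file cache
work_mem = 6GB           # generous, since only one session runs at a time
```

The session-local bump he describes for index builds would be something like `SET maintenance_work_mem = '2GB';` issued just before `CREATE INDEX` (the 2GB figure is illustrative, not from the thread).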
On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If you need to build indexes, you should bump up maintenance_work_mem, but I would do that in the local session, not system-wide.
Quick followup, Jeff: it seems that I can't set work_mem above about 1GB (I can't get to 2GB; when I update the config, the values just don't change in "SHOW ALL" -- integer constraint?). Is there a workaround, or should I tweak something else accordingly?
Thanks!
Nick
(Properly bottom posted this time?)
[Schnabel, Robert D.]
Nick, I asked the same question a while ago about work_mem on Windows. See this thread: http://www.postgresql.org/message-id/17895.1315869622@sss.pgh.pa.us
Bob
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank <nickeubank@gmail.com> wrote:
>> Quick followup Jeff: it seems that I can't set work_mem above about 1gb
>> (can't get to 2gb. When I update config, the values just don't change in
>> "SHOW ALL" -- integer constraint?). Is there a work around, or should I
>> tweak something else accordingly?
> What version are you using? What is the exact line you put in your config
> file? Did you get any errors when using that config file? Are you sure
> you actually reloaded the server, so that it reread the config file, rather
> than just changing the file and then not applying the change?
> I usually set work_mem within a psql connection, in which case you need to
> quote the setting if you use units:
> set work_mem="3GB";
FWIW, it's generally considered a seriously *bad* idea to set work_mem as
high as 1GB in postgresql.conf: you're promising that each query running
on the server can use 1GB per sort or hash step. You probably don't have
the machine resources to honor that promise. (If you do, I'd like to have
your IT budget ;-)) Recommended practice is to keep the global setting
conservatively small, and bump it up locally in your session (with SET)
for individual queries that need the very large value.
But having said that, Postgres doesn't try to enforce any such practice.
My money is on what Jeff is evidently thinking: you forgot to do "pg_ctl
reload", or else the setting is too large for your platform, in which case
there should have been a complaint in the postmaster log. As noted
elsewhere, the limit for Windows is a hair under 2GB even if it's 64-bit
Windows.
regards, tom lane
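Tom's recommended pattern -- a conservatively small global setting, bumped up per session only for the queries that need it -- might look like this (values are illustrative):

```sql
-- postgresql.conf keeps a modest global value, e.g. work_mem = 64MB.
-- Then, in the one session that runs the big aggregate:
SET work_mem = '1GB';

-- Or scope the bump to a single transaction with SET LOCAL,
-- so it reverts automatically at COMMIT or ROLLBACK:
BEGIN;
SET LOCAL work_mem = '1GB';
-- ... run the large GROUP BY / DISTINCT query here ...
COMMIT;
```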