Thread: Tuning Postgres for single-connection use
Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers, etc. for a database that DOESN'T anticipate concurrent connections and that is doing lots of aggregate functions on large tables? All the advice I can find online on tuning (this <http://wiki.postgresql.org/wiki/Performance_Optimization>, this <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>, this <http://www.revsys.com/writings/postgresql-performance.html> etc.) is written for people anticipating lots of concurrent connections.
I'm a social scientist looking to use Postgres not as a database to be shared by multiple users, but rather as my own tool for manipulating a massive data set (I have 5 billion transaction records, 600 GB in CSV, and want to pull out unique user pairs, estimate aggregates for individual users, etc.). This also means almost no writing, except for the creation of new tables based on selections from the main table.
I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if that's important.
Thanks!
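For concreteness, the workload described above might look something like the following. The schema and all names in it (transactions, sender_id, receiver_id, amount) are invented for illustration; the post does not give a schema.

```sql
-- Hypothetical schema for the 5-billion-row transaction table.
CREATE TABLE transactions (
    sender_id   bigint,
    receiver_id bigint,
    amount      numeric,
    ts          timestamp
);

-- "Pull out unique user pairs":
CREATE TABLE user_pairs AS
SELECT DISTINCT sender_id, receiver_id
FROM transactions;

-- "Estimate aggregates for individual users":
CREATE TABLE user_totals AS
SELECT sender_id,
       count(*)    AS n_tx,
       sum(amount) AS total_amount
FROM transactions
GROUP BY sender_id;
```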
Well, for serious database work I suggest upgrading to Linux - you will get better performance out of the same hardware, and probably more scope for tuning (a year or so ago I noticed that some tuning options did not apply to Microsoft O/S's, but I don't recall the details - these options may, or may not, apply to your situation). Apart from anything else, your processing will not be slowed down by having to run anti-virus software!
Note that in Linux you have a wide choice of distributions and desktop environments: I chose Mate (http://mate-desktop.org), some people prefer xfce (http://www.xfce.org), I used to use GNOME 2.
Cheers,
Gavin
On 04/14/2014 05:46 PM, Nick Eubank wrote:
> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffer, etc. for a database that DOESN'T anticipate concurrent connections and that is doing lots of aggregate functions on large tables?
First up would probably be "don't run on Windows". shared_buffers above 512MB causes performance to degrade on Windows, while that threshold is much higher on *nix.
cheers
andrew
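Andrew's point about the Windows shared_buffers ceiling translates into something like the following postgresql.conf sketch. The effective_cache_size line is an assumption of mine (roughly three-quarters of the 16 GB RAM), not something stated in the thread:

```
# postgresql.conf -- sketch for Windows, per the advice above.
# On Windows, shared_buffers above ~512MB degrades performance,
# so stay at or below that and let the OS file cache do the rest.
shared_buffers = 512MB

# Assumption: tell the planner the OS can cache ~3/4 of 16GB RAM.
effective_cache_size = 12GB
```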
In this list, please bottom post!
I've added potentially useful advice below.
On 15/04/14 11:39, Nick Eubank wrote:
> Thanks Gavin -- would LOVE to. Sadly I'm in a weird situation where my hardware is not under my control, so I'm stuck making the best of what I have. Next time though! :)
Yeah, I know the feeling!
I have a client that uses MySQL (ugh!), but I won't even bother mentioning PostgreSQL!
Hopefully, someone more knowledgeable will give you some good advice specific to your O/S.
For tables that don't change, consider a packing density of 100%.
Take care in how you design your tables, and the column types.
Consider carefully the queries you are likely to use, so you can design appropriate indexes.
Some advice will depend on the schema you plan to use, and the type of queries.
Cheers,
Gavin
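A sketch of what Gavin's "packing density of 100%" suggestion looks like in practice. Table and index names are hypothetical; note that heap tables already default to fillfactor 100, while b-tree indexes default to 90, so for a read-only table the index setting is the one that changes anything:

```sql
-- For a table that never changes, pack pages completely.
-- (fillfactor = 100 is already the heap default.)
ALTER TABLE transactions SET (fillfactor = 100);

-- Design indexes around the queries you will actually run,
-- per the advice above; pack them fully too (index default is 90).
CREATE INDEX transactions_sender_idx
    ON transactions (sender_id)
    WITH (fillfactor = 100);
```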
On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank <nickeubank@gmail.com> wrote:
I'd go with a small shared_buffers, like 128MB, and let the OS cache as much as possible. This minimizes the amount of double buffering.
And set work_mem to about 6GB, then bump it up if that doesn't seem to cause problems. In the scenario you describe, it is probably no big deal if you guess too high. Monitor the process; if it starts to go nuts, just kill it and start again with a lower work_mem. If it is a single-user system, you can afford to be adventurous.
If you need to build indexes, you should bump up maintenance_work_mem, but I would do that in the local session, not system-wide.
Cheers,
Jeff
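Jeff's suggestions, written out as a configuration sketch (the values come from his advice above; this is not a definitive recipe):

```
# postgresql.conf -- sketch of the single-connection advice above.
shared_buffers = 128MB   # keep small; rely on the OS file cache
work_mem = 6GB           # generous, since only one session runs at a time
```

The session-local bump he describes for index builds would be something like `SET maintenance_work_mem = '2GB';` issued just before `CREATE INDEX` (the 2GB figure is illustrative, not from the thread).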
On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If you need to build indexes, you should bump up maintenance_work_mem, but I would do that in the local session, not system-wide.
Quick followup, Jeff: it seems that I can't set work_mem above about 1GB (I can't get to 2GB; when I update the config, the values just don't change in "SHOW ALL" -- integer constraint?). Is there a workaround, or should I tweak something else accordingly?
Thanks!
Nick
(Properly bottom posted this time?)
[Schnabel, Robert D.]
Nick, I asked the same question a while ago about work_mem on Windows. See this thread: http://www.postgresql.org/message-id/17895.1315869622@sss.pgh.pa.us
Bob
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank <nickeubank@gmail.com> wrote:
>> Quick followup Jeff: it seems that I can't set work_mem above about 1gb
>> (can't get to 2gb. When I update config, the values just don't change in
>> "SHOW ALL" -- integer constraint?). Is there a work around, or should I
>> tweak something else accordingly?
> What version are you using? What is the exact line you put in your config
> file? Did you get any errors when using that config file? Are you sure
> you actually reloaded the server, so that it reread the config file, rather
> than just changing the file and then not applying the change?
> I usually set work_mem within a psql connection, in which case you need to
> quote the setting if you use units:
> set work_mem="3GB";
FWIW, it's generally considered a seriously *bad* idea to set work_mem as
high as 1GB in postgresql.conf: you're promising that each query running
on the server can use 1GB per sort or hash step. You probably don't have
the machine resources to honor that promise. (If you do, I'd like to have
your IT budget ;-)) Recommended practice is to keep the global setting
conservatively small, and bump it up locally in your session (with SET)
for individual queries that need the very large value.
But having said that, Postgres doesn't try to enforce any such practice.
My money is on what Jeff is evidently thinking: you forgot to do "pg_ctl
reload", or else the setting is too large for your platform, in which case
there should have been a complaint in the postmaster log. As noted
elsewhere, the limit for Windows is a hair under 2GB even if it's 64-bit
Windows.
regards, tom lane
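Tom's recommended pattern -- a conservatively small global setting, bumped up per session only for the queries that need it -- might look like this (values are illustrative):

```sql
-- postgresql.conf keeps a modest global value, e.g. work_mem = 64MB.
-- Then, in the one session that runs the big aggregate:
SET work_mem = '1GB';

-- Or scope the bump to a single transaction with SET LOCAL,
-- so it reverts automatically at COMMIT or ROLLBACK:
BEGIN;
SET LOCAL work_mem = '1GB';
-- ... run the large GROUP BY / DISTINCT query here ...
COMMIT;
```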