Thread: Configuration tips for very large database

Configuration tips for very large database

From: Nico Sabbi
Hello,
I've been away from Postgres for several years, so please forgive me if
I've forgotten nearly everything :-)

I've just inherited a database collecting environmental data. There's a
background process continually inserting records (not very often, to tell
the truth) and a web interface to query the data.
At the moment the record count of the db is 250M and growing all the
time. The 3 main tables have just 3 columns.

Queries get executed very, very slowly, say 20 minutes. The most evident
problem I see is that I/O wait load is almost always 90+% while querying
data, and 30-40% when "idle" (so to speak).
Obviously disk access is to blame, but I'm a bit surprised because the
cluster where this db is running is not at all old iron: it's a VMware
VM with 16 GB RAM, 4 CPUs at 2.2 GHz, and a 128 GB disk (half of which is
used). The disk system underlying VMware is quite powerful, and this
Postgres instance is the only system in the cluster that runs slowly.
I can increase resources if necessary, but..

Even before analyzing queries (which I did), I'd like to know if someone
has already succeeded in running Postgres with 200-300M records with
queries running much faster than this. I'd like to compare the current
configuration with a super-optimized one to identify the parameters that
need to be changed.
Any link to a working configuration would be much appreciated.

Thanks for any help,
   Nico


Re: Configuration tips for very large database

From: Kevin Grittner
Nico Sabbi <nicola.sabbi@poste.it> wrote:

> Queries get executed very very slowly, say 20 minutes.

> I'd like to know if someone has already succeeded in running
> postgres with 200-300M records with queries running much faster
> than this.

If you go to the http://wcca.wicourts.gov/ web site, bring up any
case, and click the "Court Record Events" button, it will search a
table with hundreds of millions of rows.  The table is not
partitioned, but has several indexes on it which are useful for
queries such as the one that is used when you click the button.
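A minimal sketch of the pattern described here (table and column names are invented for illustration, not the actual WCCA schema): a large unpartitioned table stays fast to query as long as each common lookup is covered by an index.

```sql
-- Hypothetical schema, for illustration only.
-- A targeted index lets a per-case lookup avoid scanning
-- hundreds of millions of rows:
CREATE INDEX CONCURRENTLY idx_events_case_date
    ON court_record_events (case_no, event_date);

-- A lookup like the one behind the button can then use an
-- index scan and touch only the handful of matching rows:
SELECT event_date, description
FROM court_record_events
WHERE case_no = '2015CV000123'
ORDER BY event_date;
```

`CREATE INDEX CONCURRENTLY` avoids taking a write-blocking lock, which matters on a table that is continually receiving inserts.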

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Configuration tips for very large database

From: Nico Sabbi
On 02/12/2015 11:38 PM, Kevin Grittner wrote:
>
> If you go to the http://wcca.wicourts.gov/ web site, bring up any
> case, and click the "Court Record Events" button, it will search a
> table with hundreds of millions of rows.  The table is not
> partitioned, but has several indexes on it which are useful for
> queries such as the one that is used when you click the button.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Impressive. Can you give any hint on the configuration and on the
underlying hardware?


Re: Configuration tips for very large database

From: Kevin Grittner
Nico Sabbi <nicola.sabbi@poste.it> wrote:

> Can you give any hint on the configuration and on the underlying
> hardware?

Well, this particular web site has millions of hits per day
(running up to about 20 queries per hit) from thousands of
concurrent web users, while accepting logical replication from
thousands of OLTP users, so you probably
don't need equivalent hardware.  If I recall correctly it is
running 32 cores with 512GB RAM running two PostgreSQL clusters,
each multiple TB, and each having a RAID 5 array of 40 drives,
plus separate controllers and RAID for OS and WAL.

For server configuration, see these Wiki pages for the general
tuning techniques used:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

https://wiki.postgresql.org/wiki/Number_Of_Database_Connections

The best course to solve your problem would probably be to review
those and see what might apply, and if you still have a problem
pick a specific slow-running query and use the process described
here:

https://wiki.postgresql.org/wiki/SlowQueryQuestions
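The SlowQueryQuestions page asks for the full plan with timing and buffer usage. For example, against a hypothetical environmental-data table (names invented), the capture would look like:

```sql
-- EXPLAIN (ANALYZE, BUFFERS) runs the query and reports actual
-- row counts, timing, and how many pages came from cache vs. disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT station_id, avg(value)
FROM measurements                 -- hypothetical table name
WHERE taken_at >= date '2015-02-01'
GROUP BY station_id;
```

Posting that output, rather than just the query text, is what lets people spot a missing index or a misestimated row count.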

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Configuration tips for very large database

From: "Mathis, Jason"
I can't speak to the numbers PostgreSQL can or cannot do, but the numbers above sound very, very doable. If you can get hold of Greg Smith's PostgreSQL High Performance, I always liked his method of tuning buffers and checkpoints using the background writer stats. All of that can help with the I/O load and caching.
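The background-writer statistics that method works from are exposed in the pg_stat_bgwriter view (column names as they existed in the PostgreSQL versions current at the time; recent releases moved the checkpoint counters to pg_stat_checkpointer):

```sql
-- Snapshot of checkpoint and background-writer activity.
SELECT checkpoints_timed,
       checkpoints_req,      -- many requested checkpoints suggests
                             -- raising checkpoint_segments/max_wal_size
       buffers_checkpoint,   -- buffers written at checkpoint time
       buffers_clean,        -- buffers written by the bgwriter
       buffers_backend       -- high values mean backends are doing
                             -- their own writes, stalling queries
FROM pg_stat_bgwriter;
```

Comparing two snapshots taken some hours apart shows where the write load actually lands.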

Good luck!

On Thu, Feb 12, 2015 at 4:55 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Nico Sabbi <nicola.sabbi@poste.it> wrote:

> Can you give any hint on the configuration and on the underlying
> hardware?

Well, this particular web site has millions of hits per day
(running up to about 20 queries per hit) from thousands of
concurrent web users, while accepting logical replication from
thousands of OLTP users, so you probably
don't need equivalent hardware.  If I recall correctly it is
running 32 cores with 512GB RAM running two PostgreSQL clusters,
each multiple TB, and each having a RAID 5 array of 40 drives,
plus separate controllers and RAID for OS and WAL.

For server configuration, see these Wiki pages for the general
tuning techniques used:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

https://wiki.postgresql.org/wiki/Number_Of_Database_Connections

The best course to solve your problem would probably be to review
those and see what might apply, and if you still have a problem
pick a specific slow-running query and use the process described
here:

https://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Configuration tips for very large database

From: Claudio Freire
On Thu, Feb 12, 2015 at 7:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Nico Sabbi <nicola.sabbi@poste.it> wrote:
>
>> Queries get executed very very slowly, say 20 minutes.
>
>> I'd like to know if someone has already succeeded in running
>> postgres with 200-300M records with queries running much faster
>> than this.
>
> If you go to the http://wcca.wicourts.gov/ web site, bring up any
> case, and click the "Court Record Events" button, it will search a
> table with hundreds of millions of rows.  The table is not
> partitioned, but has several indexes on it which are useful for
> queries such as the one that is used when you click the button.

I have a table with ~800M rows, wide ones, that runs reporting queries
quite efficiently (usually seconds).

Of course, the queries don't traverse the whole table. That wouldn't
be efficient. That's probably the key there: don't make your database
process the whole thing every time if you expect it to be scalable.
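As an illustrative sketch (invented table name), the difference is between an aggregate over everything and one bounded by an indexed predicate:

```sql
-- This cannot stay fast as the table grows: it reads every row.
SELECT avg(value) FROM measurements;            -- full table scan

-- Restricting to an indexed range keeps the work bounded
-- regardless of total table size:
CREATE INDEX ON measurements (taken_at);

SELECT avg(value)
FROM measurements
WHERE taken_at >= date '2015-02-01'
  AND taken_at <  date '2015-03-01';            -- index range scan
```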

What kind of queries are you running that have slowed down?

Post an explain analyze so people can diagnose. Possibly it's a
query/indexing issue rather than a hardware one.


Re: Configuration tips for very large database

From: "ktm@rice.edu"
On Thu, Feb 12, 2015 at 11:25:54PM +0100, Nico Sabbi wrote:
> Hello,
> I've been away from  postgres for several years, so please forgive
> me if I forgot nearly everything:-)
>
> I've just inherited a database collecting environmental data.
> There's a background process continually inserting records (not so
> often, to say the truth) and a web interface to query data.
> At the moment the record count of the db is 250M and growing all the
> time. The 3 main tables have just 3 columns.
>
> Queries get executed very very slowly, say 20 minutes. The most
> evident problem I see is that io wait load is almost always 90+%
> while querying data, 30-40% when "idle" (so to say).
> Obviously disk access is to blame, but I'm a bit surprised because
> the cluster where this db is running is not at all old iron: it's a
> vmware VM with 16GB ram, 4cpu 2.2Ghz, 128GB disk (half of which
> used). The disk system underlying vmware is quite powerful, this
> postgres is the only system that runs slowly in this cluster.
> I can increase resources if necessary, but..
>
> Even before analyzing queries (that I did) I'd like to know if
> someone has already succeeded in running postgres with 200-300M
> records with queries running much faster than this. I'd like to
> compare the current configuration with a super-optimized one to
> identify the parameters that need to be changed.
> Any link to a working configuration would be very appreciated.
>
> Thanks for any help,
>   Nico
>

Hi Nico,

No one has mentioned the elephant in the room, but a database can
be very I/O intensive and you may not be getting the performance
you need from your virtual disk running on your VMware disk subsystem.
What do IOmeter or other disk performance evaluation software report?
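Alongside an external benchmark, one quick in-database sanity check is the buffer-cache hit ratio from pg_stat_database; a low percentage confirms the workload really is going to the (virtual) disk rather than being served from shared buffers:

```sql
-- Fraction of block requests satisfied from PostgreSQL's cache.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 1)
           AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```

On a healthy read-mostly workload this is typically well above 90%; far lower values point at either too little RAM for the working set or queries that scan far more data than they need.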

Regards,
Ken


Re: Configuration tips for very large database

From: "Graeme B. Bell"
> Hi Nico,
>
> No one has mentioned the elephant in the room, but a database can
> be very I/O intensive and you may not be getting the performance
> you need from your virtual disk running on your VMware disk subsystem.
> What do IOmeter or other disk performance evaluation software report?
>
> Regards,
> Ken

Anecdatum:

Moving from a contended VMware hard-disk-based filesystem running over the network to bare-metal RAID 10 SSDs
resulted in many DB operations running 20-30x faster.

Table sizes circa 10-20G, millions of rows.

Graeme.

Re: Configuration tips for very large database

From: Nico Sabbi
On 02/13/2015 12:19 AM, Claudio Freire wrote:
> I have a table with ~800M rows, wide ones, that runs reporting queries
> quite efficiently (usually seconds).
>
> Of course, the queries don't traverse the whole table. That wouldn't
> be efficient. That's probably the key there, don't make you database
> process the whole thing every time if you expect it to be scalable.
>
> What kind of queries are you running that have slowed down?
>
> Post an explain analyze so people can diagnose. Possibly it's a
> query/indexing issue rather than a hardware one.
>

Thanks everybody for the answers. At the moment I don't have the queries
at hand (it's Saturday :-) ).
I'll post them next week.

I'd really like to avoid data partitioning if possible. It's the kind of
thing that gives me a strong stomach ache.