Thread: High Disk write and space taken by PostgreSQL

High Disk write and space taken by PostgreSQL

From
J Ramesh Kumar
Date:

Hi,

My application has high data intensive operations (high number of inserts 1500 per sec.). I switched my application from MySQL to PostgreSQL. When I take performance comparison report between mysql and pgsql, I found that, there are huge difference in disk writes and disk space taken. Below stats shows the difference between MySQL and PostgreSQL.


MySQLPostgreSQL
Inserts Per Second*15001500
Updates Per Second*6.56.5
Disk Write Per Second*0.9 MB6.2 MB
Database Size Increased Per day* 13 GB36 GB

* approx values

Why this huge difference in disk writes and disk space utilization? How can I reduce the disk write and space ? Kindly help me. Please let me know, if you require any other information(such as postgres.conf).

Thanks,
Ramesh

Re: High Disk write and space taken by PostgreSQL

From
David Barton
Date:
Hi Ramesh,

Are you able to provide a table schema?  Were you using MyISAM or InnoDB on MySQL?

If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one?  It may be full of empty tuples.  Is there any period where you could try a full vacuum?

What are your indexes?  Is the size in the indexes or the database tables?

At the current rate of insertion, that table is going to get very large very quickly.  Do you have anything deleting the rows afterwards?  I have no experience with databases past 50M rows, so my questions are just so you can line up the right info for when the real experts get online :-)

Regards, David

On 16/08/12 11:23, J Ramesh Kumar wrote:

Hi,

My application has high data intensive operations (high number of inserts 1500 per sec.). I switched my application from MySQL to PostgreSQL. When I take performance comparison report between mysql and pgsql, I found that, there are huge difference in disk writes and disk space taken. Below stats shows the difference between MySQL and PostgreSQL.


MySQLPostgreSQL
Inserts Per Second*15001500
Updates Per Second*6.56.5
Disk Write Per Second*0.9 MB6.2 MB
Database Size Increased Per day* 13 GB36 GB

* approx values

Why this huge difference in disk writes and disk space utilization? How can I reduce the disk write and space ? Kindly help me. Please let me know, if you require any other information(such as postgres.conf).

Thanks,
Ramesh

Re: High Disk write and space taken by PostgreSQL

From
J Ramesh Kumar
Date:
Hi David Barton,

Please find the information below. 

Are you able to provide a table schema? 

There are 109 different types of table. I am maintaining some tables are daily tables and some tables are ID based. So totally we have created around 350 tables and dropped around 350 tables. I will drop the old table and I don't delete any records. I am maintaing only last 30 days tables. I dropped tables which are older than 30 days. All the tables are only have basic data types like int, smallint, bigint, varchar.

 
Were you using MyISAM or InnoDB on MySQL?

I am using MyISAM tables in MySQL. 


What are your indexes?  Is the size in the indexes or the database tables?

The size I mentioned is the total folder size of the data directory. There is no difference in the database schema / index between MySQL and PostgreSQL.

If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one?

I don't take backup and restore.

 Is there any period where you could try a full vacuum?

Since my app only doing inserts and drops(no delete), I believe the vacuum will not give any advantage. So I have the below configuration in my database. Event the updates only performed in a very small table which has 5 int + 1 small int + 1 real fields.  

# To avoid freqent autovacuum
autovacuum_freeze_max_age = 2000000000
vacuum_freeze_min_age = 10000000
vacuum_freeze_table_age = 150000000
 
Thanks,
Ramesh

On Thu, Aug 16, 2012 at 9:06 AM, David Barton <dave@oneit.com.au> wrote:
Hi Ramesh,

Are you able to provide a table schema?  Were you using MyISAM or InnoDB on MySQL?

If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one?  It may be full of empty tuples.  Is there any period where you could try a full vacuum?

What are your indexes?  Is the size in the indexes or the database tables?

At the current rate of insertion, that table is going to get very large very quickly.  Do you have anything deleting the rows afterwards?  I have no experience with databases past 50M rows, so my questions are just so you can line up the right info for when the real experts get online :-)

Regards, David


On 16/08/12 11:23, J Ramesh Kumar wrote:

Hi,

My application has high data intensive operations (high number of inserts 1500 per sec.). I switched my application from MySQL to PostgreSQL. When I take performance comparison report between mysql and pgsql, I found that, there are huge difference in disk writes and disk space taken. Below stats shows the difference between MySQL and PostgreSQL.


MySQLPostgreSQL
Inserts Per Second*15001500
Updates Per Second*6.56.5
Disk Write Per Second*0.9 MB6.2 MB
Database Size Increased Per day* 13 GB36 GB

* approx values

Why this huge difference in disk writes and disk space utilization? How can I reduce the disk write and space ? Kindly help me. Please let me know, if you require any other information(such as postgres.conf).

Thanks,
Ramesh


Re: High Disk write and space taken by PostgreSQL

From
Scott Marlowe
Date:
Please use plain text on the list, some folks don't have mail readers
that can handle html easily.

On Wed, Aug 15, 2012 at 10:30 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
>
> Hi David Barton,
>
> Please find the information below.
>
>> Are you able to provide a table schema?
>
>
> There are 109 different types of table. I am maintaining some tables are
> daily tables and some tables are ID based. So totally we have created around
> 350 tables and dropped around 350 tables. I will drop the old table and I
> don't delete any records. I am maintaing only last 30 days tables. I dropped
> tables which are older than 30 days. All the tables are only have basic data
> types like int, smallint, bigint, varchar.
>
>
>>
>> Were you using MyISAM or InnoDB on MySQL?
>
>
> I am using MyISAM tables in MySQL.

Well that explains a lot.  MyISAM is not transaction or crash safe.
On a machine with decent hardware (i.e. it doesn't lie about fsync)
you can pull the plugs out the back of your postgresql server and any
committed transactions will still be there.  Your myisam tables in
mysql will be corrupted and data may or may not be there that you
inserted.

MyISAM is great if your data is easily reproduceable or not that
important.  If it's important etc then it's not such a great choice.

Because of the overhead of being transactionally safe, postgresql
actually writes everything twice, once to a write ahead log, and then
flushed out to the actual tables.  It is quite likely that at your
very high write rate you have a LOT of transactional logs.

>> If you back up the database & restore clean, what is the size comparison
>> of the database filed on the restored copy to the existing one?
>
>
> I don't take backup and restore.

That's not the question.  What David is wondering is if you have a lot
of table bloat, for instance from a lot of updates or deletes.
PostgreSQL uses an in-store MVCC system that can bloat your tables
with a lot of deletes / updates happening at once or really fast.  So
it's more of a troubleshooting suggestion.  I'm guessing that since
you don't backup your data it's not that important, so mysql with
myisam may be a better choice in some ways.

OTOH if you need to run complex reporting queries, MySQL's query
planner is dumb as a stump and will likely run very poorly or be
missing features postgresql has like CTEs and what not.  Trade off,
neither db is perfect for everything, but know that complex queries in
mysql can often take many orders of magnitude longer than in pgsql.

>>  Is there any period where you could try a full vacuum?
>
>
> Since my app only doing inserts and drops(no delete), I believe the vacuum
> will not give any advantage. So I have the below configuration in my
> database. Event the updates only performed in a very small table which has 5
> int + 1 small int + 1 real fields.

Ahhh but updates are the basically delete / inserts in disguise, so if
there's enough, then yes, vacuum full would make a difference.

Basically the difference you are seeing is the difference between a
database (postgresql) and a data store (mysql + myisam).  I wonder
what you'd see if you tried mysql with innodb tables, which are
transaction and crash safe like postgresql.  I'm guessing there would
be something a bit closer to parity there.


Re: High Disk write and space taken by PostgreSQL

From
J Ramesh Kumar
Date:
Dear Scott Marlowe,

Thanks for the details. 

As you said, MySQL with MyISAM is better choice for my app. Because I don't need transaction/backup. May be I'll try with InnoDB and find the disk write/space difference. Is there any similar methods available in postgresql like MyISAM engine ?

>>> Ahhh but updates are the basically delete / inserts in disguise, so if there's enough, then yes, vacuum full would make a difference.

The table which get update has very less data ie, only has 900 rows. Out of 10500 tables, only one table is getting update frequently. Is there any way to vacuum a specific table instead of whole database ?

Thanks,
Ramesh

On Thu, Aug 16, 2012 at 10:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Please use plain text on the list, some folks don't have mail readers
that can handle html easily.

On Wed, Aug 15, 2012 at 10:30 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
>
> Hi David Barton,
>
> Please find the information below.
>
>> Are you able to provide a table schema?
>
>
> There are 109 different types of table. I am maintaining some tables are
> daily tables and some tables are ID based. So totally we have created around
> 350 tables and dropped around 350 tables. I will drop the old table and I
> don't delete any records. I am maintaing only last 30 days tables. I dropped
> tables which are older than 30 days. All the tables are only have basic data
> types like int, smallint, bigint, varchar.
>
>
>>
>> Were you using MyISAM or InnoDB on MySQL?
>
>
> I am using MyISAM tables in MySQL.

Well that explains a lot.  MyISAM is not transaction or crash safe.
On a machine with decent hardware (i.e. it doesn't lie about fsync)
you can pull the plugs out the back of your postgresql server and any
committed transactions will still be there.  Your myisam tables in
mysql will be corrupted and data may or may not be there that you
inserted.

MyISAM is great if your data is easily reproduceable or not that
important.  If it's important etc then it's not such a great choice.

Because of the overhead of being transactionally safe, postgresql
actually writes everything twice, once to a write ahead log, and then
flushed out to the actual tables.  It is quite likely that at your
very high write rate you have a LOT of transactional logs.

>> If you back up the database & restore clean, what is the size comparison
>> of the database filed on the restored copy to the existing one?
>
>
> I don't take backup and restore.

That's not the question.  What David is wondering is if you have a lot
of table bloat, for instance from a lot of updates or deletes.
PostgreSQL uses an in-store MVCC system that can bloat your tables
with a lot of deletes / updates happening at once or really fast.  So
it's more of a troubleshooting suggestion.  I'm guessing that since
you don't backup your data it's not that important, so mysql with
myisam may be a better choice in some ways.

OTOH if you need to run complex reporting queries, MySQL's query
planner is dumb as a stump and will likely run very poorly or be
missing features postgresql has like CTEs and what not.  Trade off,
neither db is perfect for everything, but know that complex queries in
mysql can often take many orders of magnitude longer than in pgsql.

>>  Is there any period where you could try a full vacuum?
>
>
> Since my app only doing inserts and drops(no delete), I believe the vacuum
> will not give any advantage. So I have the below configuration in my
> database. Event the updates only performed in a very small table which has 5
> int + 1 small int + 1 real fields.

Ahhh but updates are the basically delete / inserts in disguise, so if
there's enough, then yes, vacuum full would make a difference.

Basically the difference you are seeing is the difference between a
database (postgresql) and a data store (mysql + myisam).  I wonder
what you'd see if you tried mysql with innodb tables, which are
transaction and crash safe like postgresql.  I'm guessing there would
be something a bit closer to parity there.

Re: High Disk write and space taken by PostgreSQL

From
Claudio Freire
Date:


On Thu, Aug 16, 2012 at 1:30 AM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
# To avoid freqent autovacuum
autovacuum_freeze_max_age = 2000000000
vacuum_freeze_min_age = 10000000
vacuum_freeze_table_age = 150000000

In general, I'm no expert, but I've heard, increasing freeze_max_age isn't wise. It's there to be decreased, and the risk is data corruption.

You should check PG's docs to be sure, but I think the default is usually safe and fast enough.

And, if you have updates (anywhere), avoiding autovacuum may not be a good idea either. Autovacuum won't bother you on tables you don't update, so I think you're optimizing prematurely here. If you're worrying about it, just increase its naptime.

You'll most definitely need to vacuum pg's catalog with that many (and regular) schema changes, and autovacuum also takes care of that.

You may also want to set asynchronous_commits, to better match MyISAM's characteristics. Or even, just for benchmarking, fsync=off (I wouldn't do it in production though).

Anyway, seeing the schema of at least one of the biggest growing tables would probably help figuring out why the disk usage growth. Index bloat comes to mind.


On Thu, Aug 16, 2012 at 1:30 AM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
What are your indexes?  Is the size in the indexes or the database tables?

The size I mentioned is the total folder size of the data directory. There is no difference in the database schema / index between MySQL and PostgreSQL.

You have a problem right there. Postgres and Mysql are completely different beasts, you *will* need to tailor indices specifically for each of them. You'll find, probably, many indices you needed in MySQL are no longer needed with postgres (because it has a much more sophisticated planner).

Re: High Disk write and space taken by PostgreSQL

From
Ondrej Ivanič
Date:
Hi,

On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
> As you said, MySQL with MyISAM is better choice for my app. Because I don't
> need transaction/backup. May be I'll try with InnoDB and find the disk
> write/space difference. Is there any similar methods available in postgresql
> like MyISAM engine ?

You can try unlogged tables:
http://www.postgresql.org/docs/9.1/static/sql-createtable.html

If specified, the table is created as an unlogged table. Data written
to unlogged tables is not written to the write-ahead log (see Chapter
29), which makes them considerably faster than ordinary tables.
However, they are not crash-safe: an unlogged table is automatically
truncated after a crash or unclean shutdown. The contents of an
unlogged table are also not replicated to standby servers. Any indexes
created on an unlogged table are automatically unlogged as well;
however, unlogged GiST indexes are currently not supported and cannot
be created on an unlogged table.

>
>>>> Ahhh but updates are the basically delete / inserts in disguise, so if
>>>> there's enough, then yes, vacuum full would make a difference.
>
> The table which get update has very less data ie, only has 900 rows. Out of
> 10500 tables, only one table is getting update frequently. Is there any way
> to vacuum a specific table instead of whole database ?

You can run "vacuum <table name>" but I doubt if that makes sense to
run it manually when you have 1500 tx / sec. Postgres has HOT updates
which have high change to reuse existing space:

From 8.3 release notes:
Heap-Only Tuples (HOT) accelerate space reuse for most UPDATEs and
DELETEs (Pavan Deolasee, with ideas from many others)
UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs.
Previously only VACUUM could reclaim space taken by dead tuples. With
HOT dead tuple space can be automatically reclaimed at the time of
INSERT or UPDATE if no changes are made to indexed columns. This
allows for more consistent performance. Also, HOT avoids adding
duplicate index entries.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)


Re: High Disk write and space taken by PostgreSQL

From
Claudio Freire
Date:
On Thu, Aug 16, 2012 at 2:40 AM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
>>>> Ahhh but updates are the basically delete / inserts in disguise, so if
>>>> there's enough, then yes, vacuum full would make a difference.
>
> The table which get update has very less data ie, only has 900 rows. Out of
> 10500 tables, only one table is getting update frequently. Is there any way
> to vacuum a specific table instead of whole database ?

Just let autovacuum figure it out. It's smart enough not to touch
insert-only tables last I checked, and you can set I/O limits to make
sure it doesn't interfere.

If you don't care about possible data corruption if the system
crashes, you can set fsync=off and get many of the performance
benefits. But you don't have ways to reduce disk usage other than
dropping indices (and you may have unused indices, do check their
statistics), and making sure autovacuum is running where it's needed.

A backup/restore or a vacuum full + reindex will get rid of all bloat.
If your DB size goes down considerably after that, you have bloat. If
not, you don't. You can even do that with a single (old) table to
check it out.


Re: High Disk write and space taken by PostgreSQL

From
Bruce Momjian
Date:
On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
> Hi,
>
> On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
> > As you said, MySQL with MyISAM is better choice for my app. Because I don't
> > need transaction/backup. May be I'll try with InnoDB and find the disk
> > write/space difference. Is there any similar methods available in postgresql
> > like MyISAM engine ?
>
> You can try unlogged tables:
> http://www.postgresql.org/docs/9.1/static/sql-createtable.html
>
> If specified, the table is created as an unlogged table. Data written
> to unlogged tables is not written to the write-ahead log (see Chapter
> 29), which makes them considerably faster than ordinary tables.
> However, they are not crash-safe: an unlogged table is automatically
> truncated after a crash or unclean shutdown. The contents of an
> unlogged table are also not replicated to standby servers. Any indexes
> created on an unlogged table are automatically unlogged as well;
> however, unlogged GiST indexes are currently not supported and cannot
> be created on an unlogged table.

I would set full_page_writes = off too.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: High Disk write and space taken by PostgreSQL

From
Bruce Momjian
Date:
On Thu, Aug 16, 2012 at 10:53:21AM -0400, Bruce Momjian wrote:
> On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
> > Hi,
> >
> > On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
> > > As you said, MySQL with MyISAM is better choice for my app. Because I don't
> > > need transaction/backup. May be I'll try with InnoDB and find the disk
> > > write/space difference. Is there any similar methods available in postgresql
> > > like MyISAM engine ?
> >
> > You can try unlogged tables:
> > http://www.postgresql.org/docs/9.1/static/sql-createtable.html
> >
> > If specified, the table is created as an unlogged table. Data written
> > to unlogged tables is not written to the write-ahead log (see Chapter
> > 29), which makes them considerably faster than ordinary tables.
> > However, they are not crash-safe: an unlogged table is automatically
> > truncated after a crash or unclean shutdown. The contents of an
> > unlogged table are also not replicated to standby servers. Any indexes
> > created on an unlogged table are automatically unlogged as well;
> > however, unlogged GiST indexes are currently not supported and cannot
> > be created on an unlogged table.
>
> I would set full_page_writes = off too.

Better yet, read our documentation about non-durable settting:

    http://www.postgresql.org/docs/9.1/static/non-durability.html

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: High Disk write and space taken by PostgreSQL

From
Scott Marlowe
Date:
On Wed, Aug 15, 2012 at 11:40 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
> Dear Scott Marlowe,
>
> Thanks for the details.
>
> As you said, MySQL with MyISAM is better choice for my app. Because I don't
> need transaction/backup.

That's not exactly what I said.  Remember that if you need to run
complex queries postgresql is still likely the better candidate.

> May be I'll try with InnoDB and find the disk
> write/space difference. Is there any similar methods available in postgresql
> like MyISAM engine ?

Unlogged tables as mentioned by others.


Re: High Disk write and space taken by PostgreSQL

From
"anarazel@anarazel.de"
Date:

Bruce Momjian <bruce@momjian.us> schrieb:

>On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
>> Hi,
>>
>> On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com>
>wrote:
>> > As you said, MySQL with MyISAM is better choice for my app. Because
>I don't
>> > need transaction/backup. May be I'll try with InnoDB and find the
>disk
>> > write/space difference. Is there any similar methods available in
>postgresql
>> > like MyISAM engine ?
>>
>> You can try unlogged tables:
>> http://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>
>> If specified, the table is created as an unlogged table. Data written
>> to unlogged tables is not written to the write-ahead log (see Chapter
>> 29), which makes them considerably faster than ordinary tables.
>> However, they are not crash-safe: an unlogged table is automatically
>> truncated after a crash or unclean shutdown. The contents of an
>> unlogged table are also not replicated to standby servers. Any
>indexes
>> created on an unlogged table are automatically unlogged as well;
>> however, unlogged GiST indexes are currently not supported and cannot
>> be created on an unlogged table.
>
>I would set full_page_writes = off too.
Why? There shouldn't be any such writes on unlogged tables.

Andres

Please excuse the brevity and formatting - I am writing this on my mobile phone.


Re: High Disk write and space taken by PostgreSQL

From
Bruce Momjian
Date:
On Thu, Aug 16, 2012 at 06:07:26PM +0200, anarazel@anarazel.de wrote:
>
>
> Bruce Momjian <bruce@momjian.us> schrieb:
>
> >On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
> >> Hi,
> >>
> >> On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com>
> >wrote:
> >> > As you said, MySQL with MyISAM is better choice for my app. Because
> >I don't
> >> > need transaction/backup. May be I'll try with InnoDB and find the
> >disk
> >> > write/space difference. Is there any similar methods available in
> >postgresql
> >> > like MyISAM engine ?
> >>
> >> You can try unlogged tables:
> >> http://www.postgresql.org/docs/9.1/static/sql-createtable.html
> >>
> >> If specified, the table is created as an unlogged table. Data written
> >> to unlogged tables is not written to the write-ahead log (see Chapter
> >> 29), which makes them considerably faster than ordinary tables.
> >> However, they are not crash-safe: an unlogged table is automatically
> >> truncated after a crash or unclean shutdown. The contents of an
> >> unlogged table are also not replicated to standby servers. Any
> >indexes
> >> created on an unlogged table are automatically unlogged as well;
> >> however, unlogged GiST indexes are currently not supported and cannot
> >> be created on an unlogged table.
> >
> >I would set full_page_writes = off too.
> Why? There shouldn't be any such writes on unlogged tables.

True.  I was thinking more of the logged tables, and the system tables.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: High Disk write and space taken by PostgreSQL

From
Merlin Moncure
Date:
On Wed, Aug 15, 2012 at 11:30 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
>
> Hi David Barton,
>
> Please find the information below.
>
>> Are you able to provide a table schema?
>
>
> There are 109 different types of table. I am maintaining some tables are
> daily tables and some tables are ID based. So totally we have created around
> 350 tables and dropped around 350 tables. I will drop the old table and I
> don't delete any records. I am maintaing only last 30 days tables. I dropped
> tables which are older than 30 days. All the tables are only have basic data
> types like int, smallint, bigint, varchar.
>
>
>>
>> Were you using MyISAM or InnoDB on MySQL?
>
>
> I am using MyISAM tables in MySQL.

You can't compare a non-MVCC system such as MyISAM with a MVCC one.
MVCC systems have to store extra accounting information in order to
manage transactions and multiple versions of the same record for SQL
updates.  MVCC isn't all bad: for example you get much better
performance in the face of highly concurrent activity.  MyISAM does
full table locks which are not scalable at all.  The penalty for MVCC
storage may in some cases seem quite high if your tables have very
narrow records.

BTW, I am suspicious that your claim that you 'don't need'
transactions is correct, especially in the long term.

Anyways, there are several techniques to try and mitigate data growth
in postgres -- arrays for example.

merlin