Thread: High Disk write and space taken by PostgreSQL
Hi,
My application has high data intensive operations (high number of inserts 1500 per sec.). I switched my application from MySQL to PostgreSQL. When I take performance comparison report between mysql and pgsql, I found that, there are huge difference in disk writes and disk space taken. Below stats shows the difference between MySQL and PostgreSQL.
MySQL | PostgreSQL | |
Inserts Per Second* | 1500 | 1500 |
Updates Per Second* | 6.5 | 6.5 |
Disk Write Per Second* | 0.9 MB | 6.2 MB |
Database Size Increased Per day* | 13 GB | 36 GB |
* approx values
Why this huge difference in disk writes and disk space utilization? How can I reduce the disk write and space ? Kindly help me. Please let me know, if you require any other information(such as postgres.conf).
Thanks,
Ramesh
Hi Ramesh,
Are you able to provide a table schema? Were you using MyISAM or InnoDB on MySQL?
If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one? It may be full of empty tuples. Is there any period where you could try a full vacuum?
What are your indexes? Is the size in the indexes or the database tables?
At the current rate of insertion, that table is going to get very large very quickly. Do you have anything deleting the rows afterwards? I have no experience with databases past 50M rows, so my questions are just so you can line up the right info for when the real experts get online :-)
Regards, David
Are you able to provide a table schema? Were you using MyISAM or InnoDB on MySQL?
If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one? It may be full of empty tuples. Is there any period where you could try a full vacuum?
What are your indexes? Is the size in the indexes or the database tables?
At the current rate of insertion, that table is going to get very large very quickly. Do you have anything deleting the rows afterwards? I have no experience with databases past 50M rows, so my questions are just so you can line up the right info for when the real experts get online :-)
Regards, David
On 16/08/12 11:23, J Ramesh Kumar wrote:
Hi,My application has high data intensive operations (high number of inserts 1500 per sec.). I switched my application from MySQL to PostgreSQL. When I take performance comparison report between mysql and pgsql, I found that, there are huge difference in disk writes and disk space taken. Below stats shows the difference between MySQL and PostgreSQL.
MySQL PostgreSQL Inserts Per Second* 1500 1500 Updates Per Second* 6.5 6.5 Disk Write Per Second* 0.9 MB 6.2 MB Database Size Increased Per day* 13 GB 36 GB * approx valuesWhy this huge difference in disk writes and disk space utilization? How can I reduce the disk write and space ? Kindly help me. Please let me know, if you require any other information(such as postgres.conf).Thanks,Ramesh
Hi David Barton,
Please find the information below.
Are you able to provide a table schema?
There are 109 different types of table. I am maintaining some tables are daily tables and some tables are ID based. So totally we have created around 350 tables and dropped around 350 tables. I will drop the old table and I don't delete any records. I am maintaing only last 30 days tables. I dropped tables which are older than 30 days. All the tables are only have basic data types like int, smallint, bigint, varchar.
Were you using MyISAM or InnoDB on MySQL?
What are your indexes? Is the size in the indexes or the database tables?
If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one?
I don't take backup and restore.
Is there any period where you could try a full vacuum?
Since my app only doing inserts and drops(no delete), I believe the vacuum will not give any advantage. So I have the below configuration in my database. Event the updates only performed in a very small table which has 5 int + 1 small int + 1 real fields.
# To avoid freqent autovacuum
autovacuum_freeze_max_age = 2000000000
vacuum_freeze_min_age = 10000000
vacuum_freeze_table_age = 150000000
Ramesh
On Thu, Aug 16, 2012 at 9:06 AM, David Barton <dave@oneit.com.au> wrote:
Hi Ramesh,
Are you able to provide a table schema? Were you using MyISAM or InnoDB on MySQL?
If you back up the database & restore clean, what is the size comparison of the database filed on the restored copy to the existing one? It may be full of empty tuples. Is there any period where you could try a full vacuum?
What are your indexes? Is the size in the indexes or the database tables?
At the current rate of insertion, that table is going to get very large very quickly. Do you have anything deleting the rows afterwards? I have no experience with databases past 50M rows, so my questions are just so you can line up the right info for when the real experts get online :-)
Regards, DavidOn 16/08/12 11:23, J Ramesh Kumar wrote:
Hi,My application has high data intensive operations (high number of inserts 1500 per sec.). I switched my application from MySQL to PostgreSQL. When I take performance comparison report between mysql and pgsql, I found that, there are huge difference in disk writes and disk space taken. Below stats shows the difference between MySQL and PostgreSQL.
MySQL PostgreSQL Inserts Per Second* 1500 1500 Updates Per Second* 6.5 6.5 Disk Write Per Second* 0.9 MB 6.2 MB Database Size Increased Per day* 13 GB 36 GB * approx valuesWhy this huge difference in disk writes and disk space utilization? How can I reduce the disk write and space ? Kindly help me. Please let me know, if you require any other information(such as postgres.conf).Thanks,Ramesh
Please use plain text on the list, some folks don't have mail readers that can handle html easily. On Wed, Aug 15, 2012 at 10:30 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote: > > Hi David Barton, > > Please find the information below. > >> Are you able to provide a table schema? > > > There are 109 different types of table. I am maintaining some tables are > daily tables and some tables are ID based. So totally we have created around > 350 tables and dropped around 350 tables. I will drop the old table and I > don't delete any records. I am maintaing only last 30 days tables. I dropped > tables which are older than 30 days. All the tables are only have basic data > types like int, smallint, bigint, varchar. > > >> >> Were you using MyISAM or InnoDB on MySQL? > > > I am using MyISAM tables in MySQL. Well that explains a lot. MyISAM is not transaction or crash safe. On a machine with decent hardware (i.e. it doesn't lie about fsync) you can pull the plugs out the back of your postgresql server and any committed transactions will still be there. Your myisam tables in mysql will be corrupted and data may or may not be there that you inserted. MyISAM is great if your data is easily reproduceable or not that important. If it's important etc then it's not such a great choice. Because of the overhead of being transactionally safe, postgresql actually writes everything twice, once to a write ahead log, and then flushed out to the actual tables. It is quite likely that at your very high write rate you have a LOT of transactional logs. >> If you back up the database & restore clean, what is the size comparison >> of the database filed on the restored copy to the existing one? > > > I don't take backup and restore. That's not the question. What David is wondering is if you have a lot of table bloat, for instance from a lot of updates or deletes. PostgreSQL uses an in-store MVCC system that can bloat your tables with a lot of deletes / updates happening at once or really fast. So it's more of a troubleshooting suggestion. I'm guessing that since you don't backup your data it's not that important, so mysql with myisam may be a better choice in some ways. OTOH if you need to run complex reporting queries, MySQL's query planner is dumb as a stump and will likely run very poorly or be missing features postgresql has like CTEs and what not. Trade off, neither db is perfect for everything, but know that complex queries in mysql can often take many orders of magnitude longer than in pgsql. >> Is there any period where you could try a full vacuum? > > > Since my app only doing inserts and drops(no delete), I believe the vacuum > will not give any advantage. So I have the below configuration in my > database. Event the updates only performed in a very small table which has 5 > int + 1 small int + 1 real fields. Ahhh but updates are the basically delete / inserts in disguise, so if there's enough, then yes, vacuum full would make a difference. Basically the difference you are seeing is the difference between a database (postgresql) and a data store (mysql + myisam). I wonder what you'd see if you tried mysql with innodb tables, which are transaction and crash safe like postgresql. I'm guessing there would be something a bit closer to parity there.
Dear Scott Marlowe,
Thanks for the details.
As you said, MySQL with MyISAM is better choice for my app. Because I don't need transaction/backup. May be I'll try with InnoDB and find the disk write/space difference. Is there any similar methods available in postgresql like MyISAM engine ?
>>> Ahhh but updates are the basically delete / inserts in disguise, so if there's enough, then yes, vacuum full would make a difference.
The table which get update has very less data ie, only has 900 rows. Out of 10500 tables, only one table is getting update frequently. Is there any way to vacuum a specific table instead of whole database ?
Thanks,
Ramesh
On Thu, Aug 16, 2012 at 10:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Please use plain text on the list, some folks don't have mail readers
that can handle html easily.Well that explains a lot. MyISAM is not transaction or crash safe.
On Wed, Aug 15, 2012 at 10:30 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
>
> Hi David Barton,
>
> Please find the information below.
>
>> Are you able to provide a table schema?
>
>
> There are 109 different types of table. I am maintaining some tables are
> daily tables and some tables are ID based. So totally we have created around
> 350 tables and dropped around 350 tables. I will drop the old table and I
> don't delete any records. I am maintaing only last 30 days tables. I dropped
> tables which are older than 30 days. All the tables are only have basic data
> types like int, smallint, bigint, varchar.
>
>
>>
>> Were you using MyISAM or InnoDB on MySQL?
>
>
> I am using MyISAM tables in MySQL.
On a machine with decent hardware (i.e. it doesn't lie about fsync)
you can pull the plugs out the back of your postgresql server and any
committed transactions will still be there. Your myisam tables in
mysql will be corrupted and data may or may not be there that you
inserted.
MyISAM is great if your data is easily reproduceable or not that
important. If it's important etc then it's not such a great choice.
Because of the overhead of being transactionally safe, postgresql
actually writes everything twice, once to a write ahead log, and then
flushed out to the actual tables. It is quite likely that at your
very high write rate you have a LOT of transactional logs.That's not the question. What David is wondering is if you have a lot
>> If you back up the database & restore clean, what is the size comparison
>> of the database filed on the restored copy to the existing one?
>
>
> I don't take backup and restore.
of table bloat, for instance from a lot of updates or deletes.
PostgreSQL uses an in-store MVCC system that can bloat your tables
with a lot of deletes / updates happening at once or really fast. So
it's more of a troubleshooting suggestion. I'm guessing that since
you don't backup your data it's not that important, so mysql with
myisam may be a better choice in some ways.
OTOH if you need to run complex reporting queries, MySQL's query
planner is dumb as a stump and will likely run very poorly or be
missing features postgresql has like CTEs and what not. Trade off,
neither db is perfect for everything, but know that complex queries in
mysql can often take many orders of magnitude longer than in pgsql.Ahhh but updates are the basically delete / inserts in disguise, so if
>> Is there any period where you could try a full vacuum?
>
>
> Since my app only doing inserts and drops(no delete), I believe the vacuum
> will not give any advantage. So I have the below configuration in my
> database. Event the updates only performed in a very small table which has 5
> int + 1 small int + 1 real fields.
there's enough, then yes, vacuum full would make a difference.
Basically the difference you are seeing is the difference between a
database (postgresql) and a data store (mysql + myisam). I wonder
what you'd see if you tried mysql with innodb tables, which are
transaction and crash safe like postgresql. I'm guessing there would
be something a bit closer to parity there.
On Thu, Aug 16, 2012 at 1:30 AM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
# To avoid freqent autovacuumautovacuum_freeze_max_age = 2000000000vacuum_freeze_min_age = 10000000vacuum_freeze_table_age = 150000000
In general, I'm no expert, but I've heard, increasing freeze_max_age isn't wise. It's there to be decreased, and the risk is data corruption.
You should check PG's docs to be sure, but I think the default is usually safe and fast enough.
And, if you have updates (anywhere), avoiding autovacuum may not be a good idea either. Autovacuum won't bother you on tables you don't update, so I think you're optimizing prematurely here. If you're worrying about it, just increase its naptime.
You'll most definitely need to vacuum pg's catalog with that many (and regular) schema changes, and autovacuum also takes care of that.
You may also want to set asynchronous_commits, to better match MyISAM's characteristics. Or even, just for benchmarking, fsync=off (I wouldn't do it in production though).
Anyway, seeing the schema of at least one of the biggest growing tables would probably help figuring out why the disk usage growth. Index bloat comes to mind.
On Thu, Aug 16, 2012 at 1:30 AM, J Ramesh Kumar <rameshj1977@gmail.com> wrote:
The size I mentioned is the total folder size of the data directory. There is no difference in the database schema / index between MySQL and PostgreSQL.What are your indexes? Is the size in the indexes or the database tables?
You have a problem right there. Postgres and Mysql are completely different beasts, you *will* need to tailor indices specifically for each of them. You'll find, probably, many indices you needed in MySQL are no longer needed with postgres (because it has a much more sophisticated planner).
Hi, On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> wrote: > As you said, MySQL with MyISAM is better choice for my app. Because I don't > need transaction/backup. May be I'll try with InnoDB and find the disk > write/space difference. Is there any similar methods available in postgresql > like MyISAM engine ? You can try unlogged tables: http://www.postgresql.org/docs/9.1/static/sql-createtable.html If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (see Chapter 29), which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well; however, unlogged GiST indexes are currently not supported and cannot be created on an unlogged table. > >>>> Ahhh but updates are the basically delete / inserts in disguise, so if >>>> there's enough, then yes, vacuum full would make a difference. > > The table which get update has very less data ie, only has 900 rows. Out of > 10500 tables, only one table is getting update frequently. Is there any way > to vacuum a specific table instead of whole database ? You can run "vacuum <table name>" but I doubt if that makes sense to run it manually when you have 1500 tx / sec. Postgres has HOT updates which have high change to reuse existing space: From 8.3 release notes: Heap-Only Tuples (HOT) accelerate space reuse for most UPDATEs and DELETEs (Pavan Deolasee, with ideas from many others) UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs. Previously only VACUUM could reclaim space taken by dead tuples. With HOT dead tuple space can be automatically reclaimed at the time of INSERT or UPDATE if no changes are made to indexed columns. This allows for more consistent performance. Also, HOT avoids adding duplicate index entries. -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
On Thu, Aug 16, 2012 at 2:40 AM, J Ramesh Kumar <rameshj1977@gmail.com> wrote: >>>> Ahhh but updates are the basically delete / inserts in disguise, so if >>>> there's enough, then yes, vacuum full would make a difference. > > The table which get update has very less data ie, only has 900 rows. Out of > 10500 tables, only one table is getting update frequently. Is there any way > to vacuum a specific table instead of whole database ? Just let autovacuum figure it out. It's smart enough not to touch insert-only tables last I checked, and you can set I/O limits to make sure it doesn't interfere. If you don't care about possible data corruption if the system crashes, you can set fsync=off and get many of the performance benefits. But you don't have ways to reduce disk usage other than dropping indices (and you may have unused indices, do check their statistics), and making sure autovacuum is running where it's needed. A backup/restore or a vacuum full + reindex will get rid of all bloat. If your DB size goes down considerably after that, you have bloat. If not, you don't. You can even do that with a single (old) table to check it out.
On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote: > Hi, > > On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> wrote: > > As you said, MySQL with MyISAM is better choice for my app. Because I don't > > need transaction/backup. May be I'll try with InnoDB and find the disk > > write/space difference. Is there any similar methods available in postgresql > > like MyISAM engine ? > > You can try unlogged tables: > http://www.postgresql.org/docs/9.1/static/sql-createtable.html > > If specified, the table is created as an unlogged table. Data written > to unlogged tables is not written to the write-ahead log (see Chapter > 29), which makes them considerably faster than ordinary tables. > However, they are not crash-safe: an unlogged table is automatically > truncated after a crash or unclean shutdown. The contents of an > unlogged table are also not replicated to standby servers. Any indexes > created on an unlogged table are automatically unlogged as well; > however, unlogged GiST indexes are currently not supported and cannot > be created on an unlogged table. I would set full_page_writes = off too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Aug 16, 2012 at 10:53:21AM -0400, Bruce Momjian wrote: > On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote: > > Hi, > > > > On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> wrote: > > > As you said, MySQL with MyISAM is better choice for my app. Because I don't > > > need transaction/backup. May be I'll try with InnoDB and find the disk > > > write/space difference. Is there any similar methods available in postgresql > > > like MyISAM engine ? > > > > You can try unlogged tables: > > http://www.postgresql.org/docs/9.1/static/sql-createtable.html > > > > If specified, the table is created as an unlogged table. Data written > > to unlogged tables is not written to the write-ahead log (see Chapter > > 29), which makes them considerably faster than ordinary tables. > > However, they are not crash-safe: an unlogged table is automatically > > truncated after a crash or unclean shutdown. The contents of an > > unlogged table are also not replicated to standby servers. Any indexes > > created on an unlogged table are automatically unlogged as well; > > however, unlogged GiST indexes are currently not supported and cannot > > be created on an unlogged table. > > I would set full_page_writes = off too. Better yet, read our documentation about non-durable settting: http://www.postgresql.org/docs/9.1/static/non-durability.html -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Aug 15, 2012 at 11:40 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote: > Dear Scott Marlowe, > > Thanks for the details. > > As you said, MySQL with MyISAM is better choice for my app. Because I don't > need transaction/backup. That's not exactly what I said. Remember that if you need to run complex queries postgresql is still likely the better candidate. > May be I'll try with InnoDB and find the disk > write/space difference. Is there any similar methods available in postgresql > like MyISAM engine ? Unlogged tables as mentioned by others.
Bruce Momjian <bruce@momjian.us> schrieb: >On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote: >> Hi, >> >> On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> >wrote: >> > As you said, MySQL with MyISAM is better choice for my app. Because >I don't >> > need transaction/backup. May be I'll try with InnoDB and find the >disk >> > write/space difference. Is there any similar methods available in >postgresql >> > like MyISAM engine ? >> >> You can try unlogged tables: >> http://www.postgresql.org/docs/9.1/static/sql-createtable.html >> >> If specified, the table is created as an unlogged table. Data written >> to unlogged tables is not written to the write-ahead log (see Chapter >> 29), which makes them considerably faster than ordinary tables. >> However, they are not crash-safe: an unlogged table is automatically >> truncated after a crash or unclean shutdown. The contents of an >> unlogged table are also not replicated to standby servers. Any >indexes >> created on an unlogged table are automatically unlogged as well; >> however, unlogged GiST indexes are currently not supported and cannot >> be created on an unlogged table. > >I would set full_page_writes = off too. Why? There shouldn't be any such writes on unlogged tables. Andres Please excuse the brevity and formatting - I am writing this on my mobile phone.
On Thu, Aug 16, 2012 at 06:07:26PM +0200, anarazel@anarazel.de wrote: > > > Bruce Momjian <bruce@momjian.us> schrieb: > > >On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote: > >> Hi, > >> > >> On 16 August 2012 15:40, J Ramesh Kumar <rameshj1977@gmail.com> > >wrote: > >> > As you said, MySQL with MyISAM is better choice for my app. Because > >I don't > >> > need transaction/backup. May be I'll try with InnoDB and find the > >disk > >> > write/space difference. Is there any similar methods available in > >postgresql > >> > like MyISAM engine ? > >> > >> You can try unlogged tables: > >> http://www.postgresql.org/docs/9.1/static/sql-createtable.html > >> > >> If specified, the table is created as an unlogged table. Data written > >> to unlogged tables is not written to the write-ahead log (see Chapter > >> 29), which makes them considerably faster than ordinary tables. > >> However, they are not crash-safe: an unlogged table is automatically > >> truncated after a crash or unclean shutdown. The contents of an > >> unlogged table are also not replicated to standby servers. Any > >indexes > >> created on an unlogged table are automatically unlogged as well; > >> however, unlogged GiST indexes are currently not supported and cannot > >> be created on an unlogged table. > > > >I would set full_page_writes = off too. > Why? There shouldn't be any such writes on unlogged tables. True. I was thinking more of the logged tables, and the system tables. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Aug 15, 2012 at 11:30 PM, J Ramesh Kumar <rameshj1977@gmail.com> wrote: > > Hi David Barton, > > Please find the information below. > >> Are you able to provide a table schema? > > > There are 109 different types of table. I am maintaining some tables are > daily tables and some tables are ID based. So totally we have created around > 350 tables and dropped around 350 tables. I will drop the old table and I > don't delete any records. I am maintaing only last 30 days tables. I dropped > tables which are older than 30 days. All the tables are only have basic data > types like int, smallint, bigint, varchar. > > >> >> Were you using MyISAM or InnoDB on MySQL? > > > I am using MyISAM tables in MySQL. You can't compare a non-MVCC system such as MyISAM with a MVCC one. MVCC systems have to store extra accounting information in order to manage transactions and multiple versions of the same record for SQL updates. MVCC isn't all bad: for example you get much better performance in the face of highly concurrent activity. MyISAM does full table locks which are not scalable at all. The penalty for MVCC storage may in some cases seem quite high if your tables have very narrow records. BTW, I am suspicious that your claim that you 'don't need' transactions is correct, especially in the long term. Anyways, there are several techniques to try and mitigate data growth in postgres -- arrays for example. merlin