Thread: perf problem with huge table
Hi all,

I am trying to move my app from MS SQL to PostgreSQL, but I need a bit of help :)

On MS SQL I had a set of tables laid out like this (sorry, pseudo code):

    contab_y    date, amt, uid
    contab_ym   date, amt, uid
    contab_ymd  date, amt, uid
    and so on...

These were used to "solidify" (aggregate, sorry for my terrible English) the data, so basically I get:

    contab_y     date = 2010        amt = 100   uid = 1
    contab_ym    date = 2010-01     amt = 10    uid = 1
                 date = 2010-02     amt = 90    uid = 1
    contab_ymd   date = 2010-01-01  amt = 1     uid = 1
                 ...

That way, when I need to run a query over a long range (e.g. one year) I just take the rows from contab_y; if I need a query for a couple of days I can go to contab_ymd; and if I need data for some other timeframe I can do an intersection between the different tables using some huge (but fast) queries.

The problem is that this design is hard to maintain and the tables are difficult to check.

What I have tried is a "normal" approach, using just one table that contains all the data, with proper indexing. The issue is that this table can easily contain 100M rows :) That's why the other guys did all this work to speed up queries by splitting the data across different tables and precalculating the sums.

I am here to ask the PostgreSQL experts for advice: what do you think I can do to better manage this situation? Are there other cases I can look at, some documentation, or some technique I don't know about? Any advice is really appreciated!
On 2/10/2010 5:13 PM, rama wrote:
> That way, when I need to run a query over a long range (e.g. one year) I just take the rows from contab_y; if I need a query for a couple of days I can go to contab_ymd; and if I need data for some other timeframe I can do an intersection between the different tables using some huge (but fast) queries.
>
> The problem is that this design is hard to maintain and the tables are difficult to check.
>
> What I have tried is a "normal" approach, using just one table that contains all the data, with proper indexing. The issue is that this table can easily contain 100M rows :) That's why the other guys did all this work to speed up queries by splitting the data across different tables and precalculating the sums.
>
> I am here to ask the PostgreSQL experts for advice: what do you think I can do to better manage this situation? Are there other cases I can look at, some documentation, or some technique I don't know about? Any advice is really appreciated!

Look into table partitioning:

http://www.postgresql.org/docs/current/static/ddl-partitioning.html

It is similar to what you are doing, but it simplifies the queries and the logic needed to access large data sets.
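To make that concrete, here is a minimal sketch of the inheritance-based setup those docs describe, reusing the column names from rama's pseudo code. The monthly split and the table names are just assumptions for illustration:

-- parent table; it holds no data itself
CREATE TABLE contab (
    date date    NOT NULL,
    amt  numeric NOT NULL,
    uid  integer NOT NULL
);

-- one child table per month, each with a CHECK constraint describing its range
CREATE TABLE contab_2010_01 (
    CHECK (date >= DATE '2010-01-01' AND date < DATE '2010-02-01')
) INHERITS (contab);

CREATE TABLE contab_2010_02 (
    CHECK (date >= DATE '2010-02-01' AND date < DATE '2010-03-01')
) INHERITS (contab);

CREATE INDEX contab_2010_01_date_idx ON contab_2010_01 (date);
CREATE INDEX contab_2010_02_date_idx ON contab_2010_02 (date);

-- with constraint exclusion enabled, this query only scans the January child
SET constraint_exclusion = on;
SELECT uid, sum(amt)
  FROM contab
 WHERE date >= DATE '2010-01-01' AND date < DATE '2010-02-01'
 GROUP BY uid;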
Hi Rama
I'm actually looking at going in the other direction ....
I have an app using PG with a single table that we just keep adding data to; I'm ending up with many millions of rows, and I'm finding that the single-table schema simply doesn't scale.
In PG, table partitioning is only handled by the database for reads; for insert/update you need to do quite a lot of DIY (setting up triggers, etc.), so I am planning to just use named tables and generate the necessary DDL / DML in vanilla SQL, the same way your older code does.
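For reference, the DIY part I mean is roughly a redirect trigger like the sketch below; it assumes monthly child tables such as the contab_2010_01 / contab_2010_02 ones sketched earlier in the thread:

-- redirect inserts on the parent table into the right child
CREATE OR REPLACE FUNCTION contab_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.date >= DATE '2010-01-01' AND NEW.date < DATE '2010-02-01' THEN
        INSERT INTO contab_2010_01 VALUES (NEW.*);
    ELSIF NEW.date >= DATE '2010-02-01' AND NEW.date < DATE '2010-03-01' THEN
        INSERT INTO contab_2010_02 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'contab: no partition for date %', NEW.date;
    END IF;
    RETURN NULL;  -- the row is already stored in a child, skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contab_insert
    BEFORE INSERT ON contab
    FOR EACH ROW EXECUTE PROCEDURE contab_insert_trigger();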
My experience is mostly with Oracle, which is not MVCC, so I've had to relearn some stuff:
- Oracle often answers simple queries (e.g. counts and max / min) using only the index, which is of course pre-sorted. PG has to go out and fetch the rows to see if they are still in scope, and if they are stored all over the place on disk it means an 8K random page fetch for each row. This means that adding an index to PG is not nearly the silver bullet that it can be with some non-MVCC databases.
- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut feel, I haven't been doing true comparisons ... however, for my app I have limited myself to only two indexes on that table, and each index is larger (in disk space) than the table itself ... I have 60GB of data and 140GB of indexes :-)
- There is a lot of row turnover in my big table (I age out data) .... a big delete (millions of rows) in PG seems a bit more expensive to process than in Oracle, however PG is not nearly as sensitive to transaction sizes as Oracle is, so you can cheerfully throw out one big "DELETE from FOO where ..." and let the database chew on it
I am interested to hear about your progress.
Cheers
Dave
On Wed, Feb 10, 2010 at 4:16 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> My experience is mostly with Oracle, which is not MVCC, so I've had to relearn some stuff:

Just a nit, but Oracle implements MVCC. 90% of the databases out there do.

> - There is a lot of row turnover in my big table (I age out data) .... a big delete (millions of rows) in PG seems a bit more expensive to process than in Oracle, however PG is not nearly as sensitive to transaction sizes as Oracle is, so you can cheerfully throw out one big "DELETE from FOO where ..." and let the database chew on it

I find partitioning pretty useful in this scenario if the data allows it. Aging out data just means dropping a partition rather than running a delete statement.
On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison <jlewison1@gmail.com> wrote:
> Just a nit, but Oracle implements MVCC. 90% of the databases out there do.
Sorry, I spoke imprecisely. What I meant was the difference in how the rows are stored internally .... in Oracle, the main tablespace contains only the newest version of a row, which is (where possible) updated in place - queries in a transaction that can still "see" an older version have to pull it from the UNDO tablespace (rollback segments in Oracle 8 and older).
In Postgres, all versions of all rows are in the main table, and have validity ranges associated with them ("this version of this row existed between transaction ids x and y"). Once a version goes out of scope, it has to be garbage collected by the vacuuming process so the space can be re-used.
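You can actually peek at those validity ranges from SQL via the xmin / xmax system columns; a throwaway example (table name made up):

CREATE TABLE t (id int, val text);
INSERT INTO t VALUES (1, 'first');
UPDATE t SET val = 'second' WHERE id = 1;

-- xmin is the transaction that created this row version,
-- xmax is the one that deleted or superseded it (0 while it is live)
SELECT xmin, xmax, ctid, * FROM t;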
In general, this means Oracle is faster *if* you're only doing lots of small transactions (consider how these different models handle an update to a single field in a single row), but it is more sensitive to the scale of transactions .... doing a really big transaction against a database with an OLTP workload can upset Oracle's digestion, as it causes a lot of UNDO lookups; PG's performance is a lot more predictable in this regard.
Both models have benefits and drawbacks ... when designing a schema for performance it's important to understand these differences.
> I find partitioning pretty useful in this scenario if the data allows it. Aging out data just means dropping a partition rather than running a delete statement.
Forgot to say this - yes, absolutely agree .... dropping a table is a lot cheaper than a transactional delete.
In general, I think partitioning is more important / beneficial with PG's style of MVCC than with Oracle or SQL-Server (which I think is closer to Oracle than PG).
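With the inheritance-style children sketched earlier in the thread, aging out a month is roughly just (child and parent names taken from that sketch):

-- detach the old child from the parent, then drop it:
-- no row-by-row delete, and nothing left over for vacuum to clean up
ALTER TABLE contab_2010_01 NO INHERIT contab;
DROP TABLE contab_2010_01;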
Cheers
Dave
On Wed, Feb 10, 2010 at 4:48 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> [...] doing a really big transaction against a database with an OLTP workload can upset Oracle's digestion, as it causes a lot of UNDO lookups; PG's performance is a lot more predictable in this regard.

Yes, absolutely. It's not unusual to see the UNDO tablespace increase in size by several gigs for a large bulk load.

Speaking of rollback segments, I'm assuming that since all storage for non-visible row versions is in the main table, PostgreSQL has no equivalent of ORA-01555.

- Jon
On Wed, Feb 10, 2010 at 6:09 PM, Jon Lewison <jlewison1@gmail.com> wrote:
> Speaking of rollback segments, I'm assuming that since all storage for non-visible row versions is in the main table, PostgreSQL has no equivalent of ORA-01555.

Actually, in a way it does .... "No space left on device" or similar ;-)
Cheers
Dave
P.S. For those not familiar with Oracle, ORA-01555 translates to "your query / transaction is kinda old and I've forgotten the data, so I'm just going to throw an error at you now". If you're reading, your SELECT randomly fails; if you're writing, it forces a rollback of your transaction.
rama wrote:
> That way, when I need to run a query over a long range (e.g. one year) I just take the rows from contab_y; if I need a query for a couple of days I can go to contab_ymd; and if I need data for some other timeframe I can do an intersection between the different tables using some huge (but fast) queries.
>
> The problem is that this design is hard to maintain and the tables are difficult to check.

You sound like you're trying to implement something like materialized views one at a time; have you considered adopting the more general techniques used to maintain those, so that you're not doing custom development each time for the design?

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
http://www.pgcon.org/2008/schedule/events/69.en.html

I think that sort of approach is more practical here than it would have been for you in MySQL, so maybe this wasn't on your list of possibilities before.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
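The core of the technique behind those links is a summary table kept current by a trigger; here is a minimal sketch (all names are invented, and a real version also has to deal with UPDATE/DELETE on the detail table and with concurrent inserts of the same key):

-- raw detail table
CREATE TABLE contab_detail (
    date date    NOT NULL,
    amt  numeric NOT NULL,
    uid  integer NOT NULL
);

-- summary table playing the role of contab_ymd
CREATE TABLE contab_day_summary (
    day date    NOT NULL,
    uid integer NOT NULL,
    amt numeric NOT NULL DEFAULT 0,
    PRIMARY KEY (day, uid)
);

-- keep the summary current on every insert into the detail table
CREATE OR REPLACE FUNCTION contab_summarize() RETURNS trigger AS $$
BEGIN
    UPDATE contab_day_summary
       SET amt = amt + NEW.amt
     WHERE day = NEW.date AND uid = NEW.uid;
    IF NOT FOUND THEN
        INSERT INTO contab_day_summary (day, uid, amt)
        VALUES (NEW.date, NEW.uid, NEW.amt);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contab_detail_summarize
    AFTER INSERT ON contab_detail
    FOR EACH ROW EXECUTE PROCEDURE contab_summarize();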
> Hi all,
>
> I am trying to move my app from MS SQL to PostgreSQL, but I need a bit of help :)

Apart from all the other good advice about partitioning the dataset and such, there is another aspect to keep in mind. When you have a large dataset and your queries become I/O-bound, the "tuple density" is going to hit you in two ways. Your dataset seems to have a natural clustering around time, which is also what you would use for the partitioning. That also means that if you more or less have that clustering of the data on disk, the tuples you need to satisfy a query will sit on the same page, or on pages close by.

The cost of checking visibility for a tuple is to some degree a function of the tuple size, so if you can do anything to increase the tuple density it will most likely benefit speed in two ways:

* You increase the likelihood that the next tuple is in the same page, so it doesn't cost a random I/O seek.
* You increase the total number of tuples sitting in your system cache in the same amount of pages (memory), so they don't cost a random I/O seek either.

So if you are carrying around columns you don't really need, throw them away (that could be columns that can be trivially computed from other columns), but you need to do your own testing here.

To stress the first point, here is a sample run on a fairly old desktop with one SATA drive. testtable has an "id integer" column and a "data" column holding 486 bytes of text; testtable2 has "id integer" and an integer "data" column. Both were filled with 10M tuples, and PG was restarted and the OS caches dropped beforehand to simulate a totally disk-bound system.

testdb=# select count(id) from testtable where id > 8000000 and id < 8500000;
 count
--------
 499999
(1 row)

Time: 7909.464 ms

testdb=# select count(id) from testtable2 where id > 8000000 and id < 8500000;
 count
--------
 499999
(1 row)

Time: 2149.509 ms

In this sample the narrow table is about four times faster, and the query does not even touch the "data" column. (On better hardware you'll most likely see better results.)

If the columns are needed, you can push the less frequently used ones out to a 1:1 relation. That gives you some administrative overhead, but you can then decide at query time whether you want to pay the extra random seeks to access that data. You get the same picture the other way around: if your queries access data sitting in TOAST, you'll be paying a double random I/O cost to get the tuple. So it is definitely a tradeoff that should be made with care.

I've monkeypatched my own PG using this patch to toy around with criteria for sending the less frequently used data to a TOAST table:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/135158/match=

Google "vertical partition" for more; this is basically what it is. (I believe this could benefit my own application, so I'm also trying to stir up some interest in the area.)

--
Jesper
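A bare-bones sketch of that 1:1 split (all names are invented):

-- hot table: only the columns the range queries actually need
CREATE TABLE events_hot (
    id   bigint  PRIMARY KEY,
    date date    NOT NULL,
    amt  numeric NOT NULL,
    uid  integer NOT NULL
);

-- cold table: the wide, rarely used payload, 1:1 with events_hot
CREATE TABLE events_cold (
    id      bigint PRIMARY KEY REFERENCES events_hot (id),
    payload text
);

-- fast path: aggregate over the narrow table only
SELECT uid, sum(amt)
  FROM events_hot
 WHERE date >= DATE '2010-01-01' AND date < DATE '2010-02-01'
 GROUP BY uid;

-- slow path: pay the extra seeks only when the payload is really needed
SELECT h.id, h.date, c.payload
  FROM events_hot h
  JOIN events_cold c ON c.id = h.id
 WHERE h.date = DATE '2010-01-15';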
Dave Crooke wrote:
> Forgot to say this - yes, absolutely agree .... dropping a table is a lot cheaper than a transactional delete.
>
> In general, I think partitioning is more important / beneficial with PG's style of MVCC than with Oracle or SQL-Server (which I think is closer to Oracle than PG).

I would like to disagree here a little bit.

Where Oracle's table partitioning comes in very handy is, for example, when you have to replace the data of a big (read-only) table on a regular basis (typically data replicated from another system). In this case, you create a partitioned table with exactly the same columns, indexes and so on as the data table. To load, you then load the data into the partitioned table, i.e.:

- truncate the partitioned table, disable constraints, drop indexes
- load the data into the partitioned table
- rebuild all indexes etc. on the partitioned table

During all this time (even if it takes hours) the application can still access the data in the data table without the bulk load interfering. Once you have prepared the data in the partitioned table, you exchange the partition with the data table, which is a dictionary operation. That means the application is (if at all) only blocked during this operation, which is in the sub-second range.

If you had to do this with conventional updates or deletes/inserts instead, it might not even be possible in the given timeframe.

Just as an example,
Leo

P.S. Just to make it a little bit clearer about the famous ORA-01555: Oracle is not "forgetting" the data, as the Oracle RDBMS is of course also ACID-compliant. The ORA-01555 can happen:

- when the rollback tablespace is really too small to hold all the data changed in the transaction (which I consider a configuration error)
- when a long-running (read) transaction needs a record which has already been updated AND COMMITTED by another transaction.
The key here is that a second transaction has changed a record which is also needed by the first transaction, and the second transaction has committed its work. Committing the change means the data in the rollback segment is no longer needed, as it can be read directly from the data block (after all, it is committed, which means it is valid and visible to other transactions). If the first transaction now tries to read the data from the rollback segment to see the unchanged state, it will still succeed (it is still there; nothing has happened to the rollback segment so far).

The ORA-01555 problem only shows up if a third transaction now needs space in the rollback segment. As the entry from the first/second transaction is marked committed (and therefore no longer needed), it is perfectly valid for transaction #3 to grab this rollback segment and store its own old values there. If THEN (and only then) transaction #1 comes back, asking for the old, unchanged value as of when its transaction started, THEN the famous ORA-01555 is raised, because that value has now been overwritten by transaction #3.

That's why in newer versions you have to set the retention time of the rollback blocks/segments to a value bigger than your expected longest transaction. This decreases the likelihood of an ORA-01555 drastically, but it is still not zero: you can easily construct an example where it still fails with ORA-01555, because a transaction can run longer than you expected, or the changed data is bigger than the whole rollback tablespace.
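Coming back to the partition exchange for a moment, the swap Leo describes looks roughly like this on the Oracle side. This is only a sketch, with all object names invented, and clauses like INCLUDING INDEXES depend on how the indexes are set up:

-- load and index the staging table while the application keeps reading big_table
TRUNCATE TABLE staging_load;
INSERT /*+ APPEND */ INTO staging_load SELECT * FROM replicated_source;
CREATE INDEX staging_load_date_idx ON staging_load (load_date);

-- the swap itself is the sub-second dictionary operation Leo mentions
ALTER TABLE big_table
  EXCHANGE PARTITION p_current WITH TABLE staging_load
  INCLUDING INDEXES WITHOUT VALIDATION;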