Thread: Transaction table
Dear All,

I have a query in PostgreSQL, if anyone can support.

A transaction table in a vehicle tracking application is inserted with the current position of each vehicle at regular intervals (seconds). This transaction table consists of 12 columns, which are of the type varchar, time, numeric, or double precision. A new transaction table is created every day. The total number of records at the end of the day is around 1 million. Data is only inserted into this table; there are no updates or deletes. This table is indexed using 2 columns.

Now, we are expecting this transaction table to grow by 10 times in the near future. In this regard, we would like to know if this same structure of the transaction table and the indexing would be sufficient for quick retrieval of data, or do we have to partition this table? If so, what kind of partition would be suitable?

Awaiting positive reply.

Regards,
Deepa.
On Sat, Mar 20, 2010 at 2:47 AM, Deepa Thulasidasan
<deepatulsidasan@yahoo.co.in> wrote:
> Dear All,
>
> I have a query in PostgreSQL, if anyone can support.
>
> A transaction table in a vehicle tracking application is inserted with the current position of each vehicle at regular intervals (seconds).
> This transaction table consists of 12 columns, which are of the type varchar, time, numeric, or double precision. A new transaction table is created every day. The total number of records at the end of the day is around 1 million. Data is only inserted into this table; there are no updates or deletes. This table is indexed using 2 columns. Now, we are expecting this transaction table to grow by 10 times in the near future. In this regard, we would like to know if this same structure of the transaction table and the indexing would be sufficient for quick retrieval of data, or do we have to partition this table? If so, what kind of partition would be suitable?

You generally wanna partition on the one (or maybe two) fields you'll regularly use to restrict your data set. For many systems like this that is a partition on date. Sometimes you can partition on two things, and if it makes sense to do so your current usage patterns will show it. Normally one axis of partitioning is fine.
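To make the date-based suggestion concrete, here is a minimal sketch using the inheritance-based partitioning PostgreSQL offered at the time (CHECK constraints plus constraint exclusion). All table and column names (vehicle_position, recorded_at, and so on) are invented for illustration, not taken from Deepa's schema:

-- Parent table; it holds no rows itself.
CREATE TABLE vehicle_position (
    vehicle_id   varchar(20)       NOT NULL,
    recorded_at  timestamp         NOT NULL,
    latitude     double precision  NOT NULL,
    longitude    double precision  NOT NULL
);

-- One child per day; the CHECK constraint lets the planner skip
-- non-matching days when constraint_exclusion is on.
CREATE TABLE vehicle_position_20100320 (
    CHECK (recorded_at >= DATE '2010-03-20'
       AND recorded_at <  DATE '2010-03-21')
) INHERITS (vehicle_position);

CREATE INDEX vehicle_position_20100320_idx
    ON vehicle_position_20100320 (vehicle_id, recorded_at);

-- Then, in postgresql.conf or per session:
SET constraint_exclusion = on;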
On Sat, Mar 20, 2010 at 5:26 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sat, Mar 20, 2010 at 2:47 AM, Deepa Thulasidasan
> <deepatulsidasan@yahoo.co.in> wrote:
>> Dear All,
>>
>> I have a query in PostgreSQL, if anyone can support.
>>
>> A transaction table in a vehicle tracking application is inserted with the current position of each vehicle at regular intervals (seconds).
>> This transaction table consists of 12 columns, which are of the type varchar, time, numeric, or double precision. A new transaction table is created every day. The total number of records at the end of the day is around 1 million. Data is only inserted into this table; there are no updates or deletes. This table is indexed using 2 columns. Now, we are expecting this transaction table to grow by 10 times in the near future. In this regard, we would like to know if this same structure of the transaction table and the indexing would be sufficient for quick retrieval of data, or do we have to partition this table? If so, what kind of partition would be suitable?
>
> You generally wanna partition on the one (or maybe two) fields you'll
> regularly use to restrict your data set. For many systems like this
> that is a partition on date. Sometimes you can partition on two
> things, and if it makes sense to do so your current usage patterns
> will show it. Normally one axis of partitioning is fine.
That'll help with reporting. How are you inserting the data now? Are you using individual inserts, or are you loading in batches? Typically, if you can buffer some of the inserts into a group on the application side and then load them together, you'll be better off than just doing a straight insert for every 'event'. Then you can combine that with partitioning to report over multiple days pretty easily. A sketch of the difference follows.
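Reusing the hypothetical vehicle_position table from above (batch sizes and values made up):

-- One statement per event: one round trip and one transaction each,
-- which is the slow path at high insert rates.
INSERT INTO vehicle_position VALUES ('V001', now(), 12.9716, 77.5946);

-- Buffered on the application side, then sent as one multi-row
-- statement (supported since 8.2):
INSERT INTO vehicle_position VALUES
    ('V001', now(), 12.9716, 77.5946),
    ('V002', now(), 13.0827, 80.2707),
    ('V003', now(), 12.2958, 76.6394);

-- For the largest batches, COPY is faster still (server-side path,
-- superuser; client libraries expose an equivalent COPY FROM STDIN):
COPY vehicle_position FROM '/path/to/batch.csv' WITH CSV;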
--Scott Mead
(Just realized that not only are there lots of Scotts on this list... we have multiple Scott M's! :)
On Sat, Mar 20, 2010 at 4:47 AM, Deepa Thulasidasan
<deepatulsidasan@yahoo.co.in> wrote:
> transaction table to grow by 10 times in near future. In this regard, we would like to know if this same structure of the transaction table and the indexing would be sufficient for quick retrieval of data or do we have to partition this table? If so what kind of partition would be suitable?

My experience has been that when tables approach the 100 million record mark, things tend to slow down. Running REINDEX and VACUUM on those tables also takes much longer, since you tend not to have enough memory to do those operations efficiently. I like to partition tables such that they end up having under 10 million records each.

I just (a few hours ago...) finished partitioning and migrating the data from a single table that had about 120 million records into 100 partitions of about 1.2 million rows each. For this particular case, I just partitioned on a mod 100 operation of one of the ID keys on which I do the bulk of my searches (see the sketch below).

Like the two Scott M's recommended, figure out your usage patterns and partition across those vectors to optimize those searches. I would not worry about optimizing the insert pattern.

You really *never* delete this data? I would suspect then that having a partitioning scheme where the number of partitions can grow over time is going to be important to you.
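For what it's worth, a mod-100 scheme looks roughly like this under 8.4-style inheritance partitioning (all names invented, not Vick's actual schema; the trigger uses EXECUTE ... USING, which needs 8.4 or later):

-- Parent plus one child per remainder class of the ID key.
CREATE TABLE positions (
    id       bigint NOT NULL,
    payload  text
);

CREATE TABLE positions_p00 (
    CHECK (id % 100 = 0)
) INHERITS (positions);
-- ...repeat for positions_p01 through positions_p99...

-- Route inserts on the parent to the matching child.
CREATE OR REPLACE FUNCTION positions_insert() RETURNS trigger AS $$
BEGIN
    EXECUTE 'INSERT INTO positions_p'
            || to_char(NEW.id % 100, 'FM00')
            || ' SELECT ($1).*' USING NEW;
    RETURN NULL;  -- keep the parent table itself empty
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER positions_insert_trg
    BEFORE INSERT ON positions
    FOR EACH ROW EXECUTE PROCEDURE positions_insert();

-- Caveat: constraint exclusion can't derive "id % 100 = 45" from
-- "id = 12345", so searches must repeat the modulo predicate:
--   SELECT * FROM positions WHERE id = 12345 AND id % 100 = 45;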
Vick Khera wrote:
> You really *never* delete this data? I would suspect then that having
> a partitioning scheme where the number of partitions can grow over
> time is going to be important to you.

He said a new table is created each day, but nothing about these daily tables being partitions in a larger view. I don't know if that means the old daily tables are eventually deleted or just kept forever.

He then said the daily table will be growing by 10X; I don't know if that means 10 times more vehicles or 10 times the number of daily trackpoints per vehicle.

He said this daily table has two indexes. I suspect these are vehicle number and time of track point, but I'm just guessing.

If it is by vehicle and by time, and the number of vehicles is multiplying, he could partition by vehicle if the daily table becomes too large, or he could partition by hour. Partitioning per vehicle would allow putting the partitions in different tablespaces, which could be on different disk drives and spread the write load out across more spindles, while hourly partitioning would concentrate all the writes on the same sub-table.
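Spreading per-vehicle partitions across spindles would then be a matter of tablespaces, along these lines (paths, tablespace names, and the vehicle_position parent are all hypothetical, carried over from the earlier sketch):

CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

CREATE TABLE vehicle_position_v001 (
    CHECK (vehicle_id = 'V001')
) INHERITS (vehicle_position) TABLESPACE disk1;

CREATE TABLE vehicle_position_v002 (
    CHECK (vehicle_id = 'V002')
) INHERITS (vehicle_position) TABLESPACE disk2;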
On Sun, Mar 21, 2010 at 1:30 PM, Vick Khera <vivek@khera.org> wrote:
> Like the two Scott M's recommended, figure out your usage patterns and
> partition across those vectors to optimize those searches. I would
> not worry about optimizing the insert pattern.

Note that once the partitions get small enough, on bigger hardware, there's often little need to index at that level anymore. If you're pulling all the records from a 50 meg db file, it'll be read in well under a second. Even if you hit a few partitions, it's still pretty fast since it's at worst a sequential scan, or more likely a read from OS-level cache.
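To check that a query really is being pruned down to a couple of small partitions and sequential-scanned, EXPLAIN on the parent is enough. The plan below is a sketch of the expected shape for the hypothetical daily scheme above, not output captured from a real system:

SET constraint_exclusion = on;

EXPLAIN SELECT *
FROM vehicle_position
WHERE recorded_at >= DATE '2010-03-20'
  AND recorded_at <  DATE '2010-03-21';

-- Expected shape: an Append over the (empty) parent and only the
-- matching child, each read with a plain Seq Scan:
--   Append
--     -> Seq Scan on vehicle_position
--     -> Seq Scan on vehicle_position_20100320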