Thread: reorder table data
Hi,

I have a 40GB database with a few tables containing approx 10 million rows.
Most of the data in these tables is inactive, and only a few rows get used
for our benchmark run each day. We cannot delete the inactive data, since it
might be required for a particular run some day, even though it is the
active data that is used mostly. The problem is that the update and select
queries are very slow because of this background data.

I am looking for some way to reorganize the data so that the active data
gets accessed much faster than the inactive data. I am using btree indices,
which assume that all rows have the same access probability. Is there an
index, or some other way to order this data, so that the active data can be
accessed most efficiently? (I want to reduce the effect of the background
data as much as possible.)

Also, what is the best way to find out which rows are getting accessed in a
table for a particular run?

Regards,
Vinita Bansal
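[On the last question, which rows get accessed during a run: one rough sketch, assuming PostgreSQL and superuser access, is to log every statement during the benchmark and inspect the logged WHERE clauses afterwards, plus the table-level statistics views for a coarser picture. The table names involved are whatever the benchmark touches.]

```sql
-- Log every statement this session runs; the server log will then show
-- exactly which queries (and hence which WHERE clauses) the run issued.
SET log_statement = 'all';

-- ... run the benchmark ...

-- Coarser, table-level (not row-level) counters from the stats collector:
SELECT relname, seq_scan, idx_scan, n_tup_upd
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;
```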
vinita bansal wrote:
> Hi,
>
> I have a 40GB database with a few tables containing approx 10 million
> rows. Most of the data in these tables is inactive and there is only a
> few rows which get used for our benchmark run each day. We cannot delete
> the inactive data since it might be required for a particular run some
> day while it is active data that is used mostly.

Is there some pattern to which data is being accessed? For example, in
an accounts system it might be rows with paid=false.

--
  Richard Huxton
  Archonet Ltd
Richard Huxton wrote:
> vinita bansal wrote:
>
>> Hi,
>>
>> I have a 40GB database with a few tables containing approx 10 million
>> rows. Most of the data in these tables is inactive and there is only
>> a few rows which get used for our benchmark run each day. We cannot
>> delete the inactive data since it might be required for a particular
>> run some day while it is active data that is used mostly.
>
> Is there some pattern to which data is being accessed. For example, in
> an accounts system it might be rows with paid=false.

If the above is the case, you could create an expression index
specifically for your clause.

You could also archive out the old information into another schema and
access it when required using UNIONs.

Sincerely,

Joshua D. Drake
Command Prompt, Inc.

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
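[To make the suggestion above concrete, a minimal sketch of an index restricted to the active rows. PostgreSQL calls an index with a WHERE condition a partial index; the table and column names here are hypothetical.]

```sql
-- Index only the small "active" subset of a large table.
-- Queries whose WHERE clause implies paid = false can use this
-- small index instead of one covering all 10 million rows.
CREATE INDEX accounts_unpaid_idx ON accounts (account_id)
    WHERE paid = false;

-- A query the planner can match against the partial index:
SELECT account_id, amount FROM accounts WHERE paid = false;
```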
Hi,

There is no particular pattern, but it is generally update queries of the
form "update tableName set colName='something'" that take a lot of time
when there is a lot of background data.

Also, I would not like to change my application to access data from
another schema when required. I want this handled at the database level,
where everything in the database itself is organised to make access
faster.

Regards,
Vinita Bansal

>From: "Joshua D. Drake" <jd@commandprompt.com>
>To: Richard Huxton <dev@archonet.com>
>CC: vinita bansal <sagivini@hotmail.com>, pgsql-general@postgresql.org
>Subject: Re: [GENERAL] reorder table data
>Date: Tue, 19 Apr 2005 11:25:06 -0700
>
>Richard Huxton wrote:
>>vinita bansal wrote:
>>
>>>Hi,
>>>
>>>I have a 40GB database with a few tables containing approx 10 million
>>>rows. Most of the data in these tables is inactive and there is only a
>>>few rows which get used for our benchmark run each day. We cannot delete
>>>the inactive data since it might be required for a particular run some
>>>day while it is active data that is used mostly.
>>
>>Is there some pattern to which data is being accessed. For example, in an
>>accounts system it might be rows with paid=false.
>
>If the above is the case you could create an expression index specifically
>for your clause.
>
>You could also archive out the old information into another schema and
>access it when required using UNIONS.
>
>Sincerely,
>
>Joshua D. Drake
>Command Prompt, Inc.
>
>--
>Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
>PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
>Managed Services, Shared and Dedication Hosting
>Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
vinita bansal wrote:
> Hi,
>
> There is no particular pattern but it is generally the update queries of
> the form "update tableName set colName='something'" that are taking a
> lot of time incase there is a lot of background data.

Well, that query would obviously update the entire table, but if you
can't predict which rows will be changed, all you can do is index the
appropriate column(s) you select against.

> Also, I would not
> like to change my application to access data from another schema when
> required. I want this to be handled at database level wherein everything
> in database itself is organised to make access faster.

Not without some pattern to work to - if you can't say which rows will be
accessed next, then how can your database know?

Can you provide an actual example of a query you find too slow, how long
it takes, and what its EXPLAIN ANALYSE output is?

--
  Richard Huxton
  Archonet Ltd
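[For reference, getting that output just means prefixing the slow statement; the table, column, and values below are made up.]

```sql
-- EXPLAIN ANALYZE executes the statement for real and reports the chosen
-- plan with actual row counts and timings for each plan node, so wrap a
-- destructive statement in a transaction and roll it back afterwards.
BEGIN;
EXPLAIN ANALYZE
UPDATE orders SET status = 'done' WHERE order_id = 42;
ROLLBACK;
```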
Oh, sorry!!

I gave a wrong example for the update. It includes a "where clause" as
well, which essentially means that not all the rows will be modified each
time. Suppose I already know which rows will be accessed (assuming that I
can find this out): is there a way to organize the table data such that
updates and selects become fast?

We cannot make changes to the application to handle such a situation (by
creating two different tables, one for active and another for inactive
data, so that it accesses active data mostly and reads inactive data from
the other table only when required), since the application is very complex
and changing it would require a lot of effort.

Regards,
Vinita

>From: Richard Huxton <dev@archonet.com>
>To: vinita bansal <sagivini@hotmail.com>
>CC: jd@commandprompt.com, pgsql-general@postgresql.org
>Subject: Re: [GENERAL] reorder table data
>Date: Wed, 20 Apr 2005 11:03:19 +0100
>
>vinita bansal wrote:
>>Hi,
>>
>>There is no particular pattern but it is generally the update queries of
>>the form "update tableName set colName='something'" that are taking a lot
>>of time incase there is a lot of background data.
>
>Well, that query will obviously update the entire table, but if you can't
>predict which rows will be changed all you can do is index the appropriate
>column(s) you select against.
>
>> Also, I would not
>>like to change my application to access data from another schema when
>>required. I want this to be handled at database level wherein everything
>>in database itself is organised to make access faster.
>
>Not without some pattern to work to - if you can't say which rows will be
>accessed next, then how can your database know?
>
>Can you provide an actual example of a query you find too slow, how long it
>takes and what it's EXPLAIN ANALYSE is?
>
>--
> Richard Huxton
> Archonet Ltd
vinita bansal wrote:
> Oh Sorry!!
> I gave a wrong example for update. It includes "where clause" as well
> which esentially mean that not all the rows will be modified each time.
> If suppose I already know (assuming that I can find out which rows will
> bw accessed) is there a way to organize the table data such that updates
> and selects become fast.
>
> We cannot make changes to the application to handle such a situation (by
> creating two different tables one for active and another for inactive
> data) wherein it will access active data mostly and access inactive data
> from other table only when required since it is very complex and
> changing it will require lots of effort.

OK - so show one or more typical queries and their EXPLAIN ANALYSE
outputs and we can see where the bottleneck is.

--
  Richard Huxton
  Archonet Ltd
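[One database-level option the thread title points at, offered here only as a sketch: PostgreSQL's CLUSTER command physically rewrites a table in the order of an index, so rows that are queried together end up on nearby disk pages. Whether it helps depends on the active rows sharing an indexable property; the table and index names below are hypothetical.]

```sql
-- CLUSTER needs a full (non-partial) index to order by:
CREATE INDEX accounts_paid_idx ON accounts (paid);

-- One-time physical rewrite of the table in index order; it takes an
-- exclusive lock while running, and the ordering is not maintained as
-- rows are later inserted or updated, so it must be re-run periodically.
CLUSTER accounts USING accounts_paid_idx;
-- (older PostgreSQL releases use the form: CLUSTER accounts_paid_idx ON accounts;)

-- Refresh planner statistics after the rewrite:
ANALYZE accounts;
```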