Thread: reorder table data
Hi,

I have a 40GB database with a few tables containing approx 10 million rows.
Most of the data in these tables is inactive, and only a few rows get used
for our benchmark run each day. We cannot delete the inactive data, since it
might be required for a particular run some day, even though it is the
active data that is used mostly. The problem is that the update and select
queries are very slow because of this background data.

I am looking for some way to reorganize the data so that the active data
gets accessed much faster than the inactive data. I am using btree indices,
which assume that all rows have the same access probability. Is there an
index, or some other way to order this data, so that the active data can be
accessed most efficiently? (I want to reduce the effect of the background
data as much as possible.)

Also, what is the best way to find out which rows are getting accessed in a
table for a particular run?

Regards,
Vinita Bansal
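[On the last question, which rows get accessed during a run: one rough sketch, assuming PostgreSQL and superuser access, is to log every statement during the benchmark and inspect the logged WHERE clauses afterwards, plus the table-level statistics views for a coarser picture. The table names involved are whatever the benchmark touches.]

```sql
-- Log every statement this session runs; the server log will then show
-- exactly which queries (and hence which WHERE clauses) the run issued.
SET log_statement = 'all';

-- ... run the benchmark ...

-- Coarser, table-level (not row-level) counters from the stats collector:
SELECT relname, seq_scan, idx_scan, n_tup_upd
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;
```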
vinita bansal wrote:
> Hi,
>
> I have a 40GB database with a few tables containing approx 10 million
> rows. Most of the data in these tables is inactive and there is only a
> few rows which get used for our benchmark run each day. We cannot delete
> the inactive data since it might be required for a particular run some
> day while it is active data that is used mostly.

Is there some pattern to which data is being accessed? For example, in
an accounts system it might be rows with paid=false.

--
  Richard Huxton
  Archonet Ltd
Richard Huxton wrote:
> vinita bansal wrote:
>
>> Hi,
>>
>> I have a 40GB database with a few tables containing approx 10 million
>> rows. Most of the data in these tables is inactive and there is only
>> a few rows which get used for our benchmark run each day. We cannot
>> delete the inactive data since it might be required for a particular
>> run some day while it is active data that is used mostly.
>
> Is there some pattern to which data is being accessed. For example, in
> an accounts system it might be rows with paid=false.

If the above is the case, you could create an expression index
specifically for your clause.

You could also archive out the old information into another schema and
access it when required using UNIONs.

Sincerely,

Joshua D. Drake
Command Prompt, Inc.

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
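[To make the suggestion above concrete, a minimal sketch of an index restricted to the active rows. PostgreSQL calls an index with a WHERE condition a partial index; the table and column names here are hypothetical.]

```sql
-- Index only the small "active" subset of a large table.
-- Queries whose WHERE clause implies paid = false can use this
-- small index instead of one covering all 10 million rows.
CREATE INDEX accounts_unpaid_idx ON accounts (account_id)
    WHERE paid = false;

-- A query the planner can match against the partial index:
SELECT account_id, amount FROM accounts WHERE paid = false;
```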
Hi,

There is no particular pattern, but it is generally update queries of the
form "update tableName set colName='something'" that take a lot of time
when there is a lot of background data.

Also, I would not like to change my application to access data from
another schema when required. I want this handled at the database level,
where everything in the database itself is organised to make access
faster.

Regards,
Vinita Bansal

>From: "Joshua D. Drake" <jd@commandprompt.com>
>To: Richard Huxton <dev@archonet.com>
>CC: vinita bansal <sagivini@hotmail.com>, pgsql-general@postgresql.org
>Subject: Re: [GENERAL] reorder table data
>Date: Tue, 19 Apr 2005 11:25:06 -0700
>
>Richard Huxton wrote:
>>vinita bansal wrote:
>>
>>>Hi,
>>>
>>>I have a 40GB database with a few tables containing approx 10 million
>>>rows. Most of the data in these tables is inactive and there is only a
>>>few rows which get used for our benchmark run each day. We cannot delete
>>>the inactive data since it might be required for a particular run some
>>>day while it is active data that is used mostly.
>>
>>Is there some pattern to which data is being accessed. For example, in an
>>accounts system it might be rows with paid=false.
>
>If the above is the case you could create an expression index specifically
>for your clause.
>
>You could also archive out the old information into another schema and
>access it when required using UNIONS.
>
>Sincerely,
>
>Joshua D. Drake
>Command Prompt, Inc.
>
>--
>Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
>PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
>Managed Services, Shared and Dedication Hosting
>Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
vinita bansal wrote:
> Hi,
>
> There is no particular pattern but it is generally the update queries of
> the form "update tableName set colName='something'" that are taking a
> lot of time incase there is a lot of background data.

Well, that query would obviously update the entire table, but if you
can't predict which rows will be changed, all you can do is index the
appropriate column(s) you select against.

> Also, I would not
> like to change my application to access data from another schema when
> required. I want this to be handled at database level wherein everything
> in database itself is organised to make access faster.

Not without some pattern to work to - if you can't say which rows will be
accessed next, then how can your database know?

Can you provide an actual example of a query you find too slow, how long
it takes, and what its EXPLAIN ANALYSE output is?

--
  Richard Huxton
  Archonet Ltd
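[For reference, getting that output just means prefixing the slow statement; the table, column, and values below are made up.]

```sql
-- EXPLAIN ANALYZE executes the statement for real and reports the chosen
-- plan with actual row counts and timings for each plan node, so wrap a
-- destructive statement in a transaction and roll it back afterwards.
BEGIN;
EXPLAIN ANALYZE
UPDATE orders SET status = 'done' WHERE order_id = 42;
ROLLBACK;
```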
Oh, sorry!!

I gave a wrong example for the update. It includes a "where clause" as
well, which essentially means that not all the rows will be modified each
time. Suppose I already know which rows will be accessed (assuming that I
can find this out): is there a way to organize the table data such that
updates and selects become fast?

We cannot make changes to the application to handle such a situation (by
creating two different tables, one for active and another for inactive
data, so that it accesses active data mostly and reads inactive data from
the other table only when required), since the application is very complex
and changing it would require a lot of effort.

Regards,
Vinita

>From: Richard Huxton <dev@archonet.com>
>To: vinita bansal <sagivini@hotmail.com>
>CC: jd@commandprompt.com, pgsql-general@postgresql.org
>Subject: Re: [GENERAL] reorder table data
>Date: Wed, 20 Apr 2005 11:03:19 +0100
>
>vinita bansal wrote:
>>Hi,
>>
>>There is no particular pattern but it is generally the update queries of
>>the form "update tableName set colName='something'" that are taking a lot
>>of time incase there is a lot of background data.
>
>Well, that query will obviously update the entire table, but if you can't
>predict which rows will be changed all you can do is index the appropriate
>column(s) you select against.
>
>> Also, I would not
>>like to change my application to access data from another schema when
>>required. I want this to be handled at database level wherein everything
>>in database itself is organised to make access faster.
>
>Not without some pattern to work to - if you can't say which rows will be
>accessed next, then how can your database know?
>
>Can you provide an actual example of a query you find too slow, how long it
>takes and what it's EXPLAIN ANALYSE is?
>
>--
> Richard Huxton
> Archonet Ltd
vinita bansal wrote:
> Oh Sorry!!
> I gave a wrong example for update. It includes "where clause" as well
> which esentially mean that not all the rows will be modified each time.
> If suppose I already know (assuming that I can find out which rows will
> bw accessed) is there a way to organize the table data such that updates
> and selects become fast.
>
> We cannot make changes to the application to handle such a situation (by
> creating two different tables one for active and another for inactive
> data) wherein it will access active data mostly and access inactive data
> from other table only when required since it is very complex and
> changing it will require lots of effort.

OK - so show one or more typical queries and their EXPLAIN ANALYSE
outputs and we can see where the bottleneck is.

--
  Richard Huxton
  Archonet Ltd
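[One database-level option the thread title points at, offered here only as a sketch: PostgreSQL's CLUSTER command physically rewrites a table in the order of an index, so rows that are queried together end up on nearby disk pages. Whether it helps depends on the active rows sharing an indexable property; the table and index names below are hypothetical.]

```sql
-- CLUSTER needs a full (non-partial) index to order by:
CREATE INDEX accounts_paid_idx ON accounts (paid);

-- One-time physical rewrite of the table in index order; it takes an
-- exclusive lock while running, and the ordering is not maintained as
-- rows are later inserted or updated, so it must be re-run periodically.
CLUSTER accounts USING accounts_paid_idx;
-- (older PostgreSQL releases use the form: CLUSTER accounts_paid_idx ON accounts;)

-- Refresh planner statistics after the rewrite:
ANALYZE accounts;
```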