Thread: One table or many tables for data set

One table or many tables for data set

From
"Castle, Lindsay"
Date:
Hi all,

I'm working on a project that has a data set of approximately 6million rows
with about 12,000 different elements, each element has 7 columns of data.

I'm wondering what would be faster from a scanning perspective (SELECT
statements with some calculations) for this type of set up;
    one table for all the data
    one table for each data element (12,000 tables)
    one table per subset of elements (eg all elements that start with
"a" in a table)

The data is static once its in the database, only new records are added on a
regular basis.

I'd like to run quite a few different formulated scans in the longer term so
having efficient scans is a high priority.

Can I do anything with Indexing to help with performance?  I suspect for the
majority of scans I will need to evaluate an outcome based on 4 or 5 of the
7 columns of data.

Thanks in advance :-)

Linz

Re: One table or many tables for data set

From
Rod Taylor
Date:
On Tue, 2003-07-22 at 20:34, Castle, Lindsay wrote:
> Hi all,
>
> I'm working on a project that has a data set of approximately 6million rows
> with about 12,000 different elements, each element has 7 columns of data.

Are these 7 columns the same for each element?

Attachment

Re: One table or many tables for data set

From
Joe Conway
Date:
Castle, Lindsay wrote:
> I'm working on a project that has a data set of approximately 6million rows
> with about 12,000 different elements, each element has 7 columns of data.
>
> I'm wondering what would be faster from a scanning perspective (SELECT
> statements with some calculations) for this type of set up;
>     one table for all the data
>     one table for each data element (12,000 tables)
>     one table per subset of elements (eg all elements that start with
> "a" in a table)
>

I, for one, am having difficulty understanding exactly what your data
looks like, so it's hard to give advice. Maybe some concrete examples of
what you are calling "rows", "elements", and "columns" would help.

Does each of 6 million rows have 12000 elements, each with 7 columns? Or
do you mean that out of 6 million rows, there are 12000 distinct kinds
of elements?

> Can I do anything with Indexing to help with performance?  I suspect for the
> majority of scans I will need to evaluate an outcome based on 4 or 5 of the
> 7 columns of data.
>

Again, this isn't clear to me -- but maybe I'm just being dense ;-)
Does this mean you expect 4 or 5 items in your WHERE clause?

Joe


Re: One table or many tables for data set

From
"Castle, Lindsay"
Date:
Apologies, let me clear this up a bit (hopefully) :-)

The data structure looks like this:
    element
    date
    num1
    num2
    num3
    num4
    units

There are approx 12,000 distinct elements for a total of about 6 million
rows of data.

The scanning technology I want to use may need a different number of rows
and different columns depending on the scan formula;
    eg scan1 may need num1, num2 and num3 from the last 200 rows for
element "x"
       scan2 may need num1, units from the last 10 rows for element "y"

I can either do the scans and calculate what i need within SQL or drag the
data out and process it outside of SQL, my preference is to go inside SQL as
I've assumed that would be faster and less development work.

If I went with the many tables design I would not expect to need to join
between tables, there is no relationship between the different elements that
I need to cater for.

Cheers,

Linz


Castle, Lindsay wrote and <snipped>:
> I'm working on a project that has a data set of approximately 6million
rows
> with about 12,000 different elements, each element has 7 columns of data.
>
> I'm wondering what would be faster from a scanning perspective (SELECT
> statements with some calculations) for this type of set up;
>     one table for all the data
>     one table for each data element (12,000 tables)
>     one table per subset of elements (eg all elements that start with
> "a" in a table)
>

I, for one, am having difficulty understanding exactly what your data
looks like, so it's hard to give advice. Maybe some concrete examples of
what you are calling "rows", "elements", and "columns" would help.

Does each of 6 million rows have 12000 elements, each with 7 columns? Or
do you mean that out of 6 million rows, there are 12000 distinct kinds
of elements?

> Can I do anything with Indexing to help with performance?  I suspect for
the
> majority of scans I will need to evaluate an outcome based on 4 or 5 of
the
> 7 columns of data.
>

Again, this isn't clear to me -- but maybe I'm just being dense ;-)
Does this mean you expect 4 or 5 items in your WHERE clause?

Re: One table or many tables for data set

From
Joe Conway
Date:
Castle, Lindsay wrote:
> The data structure looks like this:
>     element
>     date
>     num1
>     num2
>     num3
>     num4
>     units
>
> There are approx 12,000 distinct elements for a total of about 6 million
> rows of data.

Ahh, that helps! So are the elements evenly distributed, i.e. are there
approx 500 rows of each element? If so, it should be plenty quick to put
all the data in one table with an index on "element" (and maybe a
multicolumn key, depending on other factors).

> The scanning technology I want to use may need a different number of rows
> and different columns depending on the scan formula;
>     eg scan1 may need num1, num2 and num3 from the last 200 rows for
> element "x"
>        scan2 may need num1, units from the last 10 rows for element "y"

When you say "last X rows", do you mean sorted by "date"? If so, you
might want that index to be on (element, date). Then do:

SELECT num1, num2, num3 FROM mytable WHERE element = 'an_element' order
by date DESC LIMIT 20;

Replace num1, num2, num3 by whatever columns you want, and "LIMIT X" as
the number of rows you want.

HTH,

Joe


Re: One table or many tables for data set

From
Rod Taylor
Date:
Ok.. Unless I'm missing something, the data will be static (or near
static).  It also sounds as if the structure is common for elements, so
you probably only want 2 tables.

One with 6 million rows and any row information.  The other with 6
million * 12000 rows with the element data linking to the row
information line with an identifier, and have an 'element type' (I
assume there are 12000 types of elements -- or something of that
nature).

Unique constraint on (row_identifier, element_type)

The speed you achieve will be based on what indexes you create.

If you spend most of your time with one or a few (5% or less of the
structure) element types, create a partial index for those element types
only, and a partial index for all of the others.

If you have a standard mathematical operation on num1, num2, etc. you
may want to make use of functional indexes to index the result of the
calculation.

Be sure to create the tables WITHOUT OIDS and be prepared for the
dataload to take a while, and CLUSTER the table based on your most
commonly used index (once they've been setup).

To help with speed, we would need to see EXPLAIN ANALYZE results and the
query being performed.

On Tue, 2003-07-22 at 21:00, Castle, Lindsay wrote:
> All rows have the same structure, the data itself will be different for each
> row, the structure is something like this:
>
>     element
>     date
>     num1
>     num2
>     num3
>     num4
>     units
>
> Thanks,
>
>
> Lindsay Castle
> EDS Australia
> Midrange & Distributed Tools
> Infrastructure Tools AP
> Ph:   +61 (0)8 8464 7101
> Fax:  +61 (0)8 8464 2135
>
>
> -----Original Message-----
> From: Rod Taylor [mailto:rbt@rbt.ca]
> Sent: Wednesday, 23 July 2003 10:24 AM
> To: Castle, Lindsay
> Cc: Postgresql Performance
> Subject: Re: One table or many tables for data set
>
>
> On Tue, 2003-07-22 at 20:34, Castle, Lindsay wrote:
> > Hi all,
> >
> > I'm working on a project that has a data set of approximately 6million
> rows
> > with about 12,000 different elements, each element has 7 columns of data.
>
> Are these 7 columns the same for each element?
>

Attachment

Re: One table or many tables for data set

From
"Castle, Lindsay"
Date:
Thanks Joe,

This certainly helps me get going on the right path.


Lindsay Castle
EDS Australia
Midrange & Distributed Tools
Infrastructure Tools AP
Ph:   +61 (0)8 8464 7101
Fax:  +61 (0)8 8464 2135


-----Original Message-----
From: Joe Conway [mailto:mail@joeconway.com]
Sent: Wednesday, 23 July 2003 11:06 AM
To: Castle, Lindsay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] One table or many tables for data set


Castle, Lindsay wrote:
> The data structure looks like this:
>     element
>     date
>     num1
>     num2
>     num3
>     num4
>     units
>
> There are approx 12,000 distinct elements for a total of about 6 million
> rows of data.

Ahh, that helps! So are the elements evenly distributed, i.e. are there
approx 500 rows of each element? If so, it should be plenty quick to put
all the data in one table with an index on "element" (and maybe a
multicolumn key, depending on other factors).

> The scanning technology I want to use may need a different number of rows
> and different columns depending on the scan formula;
>     eg scan1 may need num1, num2 and num3 from the last 200 rows for
> element "x"
>        scan2 may need num1, units from the last 10 rows for element "y"

When you say "last X rows", do you mean sorted by "date"? If so, you
might want that index to be on (element, date). Then do:

SELECT num1, num2, num3 FROM mytable WHERE element = 'an_element' order
by date DESC LIMIT 20;

Replace num1, num2, num3 by whatever columns you want, and "LIMIT X" as
the number of rows you want.

HTH,

Joe

Re: One table or many tables for data set

From
Rod Taylor
Date:
On Tue, 2003-07-22 at 21:50, Rod Taylor wrote:
> Ok.. Unless I'm missing something, the data will be static (or near
> static).  It also sounds as if the structure is common for elements, so
> you probably only want 2 tables.

I misunderstood. Do what Joe suggested.

Attachment

Re: One table or many tables for data set

From
"Castle, Lindsay"
Date:
Thanks Rod

My explanations will be better next time. :-)


-----Original Message-----
From: Rod Taylor [mailto:rbt@rbt.ca]
Sent: Wednesday, 23 July 2003 11:41 AM
To: Castle, Lindsay
Cc: Postgresql Performance
Subject: Re: One table or many tables for data set


On Tue, 2003-07-22 at 21:50, Rod Taylor wrote:
> Ok.. Unless I'm missing something, the data will be static (or near
> static).  It also sounds as if the structure is common for elements, so
> you probably only want 2 tables.

I misunderstood. Do what Joe suggested.