Re: improvement suggestions for performance design - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: improvement suggestions for performance design
Date
Msg-id 468CEB32.30109@enterprisedb.com
Whole thread Raw
In response to improvement suggestions for performance design  (tfinneid@ifi.uio.no)
Responses Re: improvement suggestions for performance design  (tfinneid@ifi.uio.no)
List pgsql-performance
I would strongly suggest that you use a proper relational schema,
instead of storing everything in two tables. I don't know your
application, but a schema like that is called an Entity-Attribute-Value
(though your entity seems to be just posx and posy) and it should raise
a big red flag in the mind of any database designer. In particular,
constructing queries against an EAV schema is a major pain in the ass.
This has been discussed before on postgresql lists as well, you might
want to search and read the previous discussions.

Ignoring the EAV issue for a moment, it's hard to give advice without
knowing what kind of queries are going to executed. Are the lookups
always going to be by id? By posx/posy perhaps? By attribute?

tfinneid@ifi.uio.no wrote:
> Hi
>
> I have the following scenario for a database that I need to design, and
> would like some hints on what to improve or do differently to achieve the
> desired performance goal, disregarding hardware and postgres tuning.
>
> The premise is an attribute database that stores about 100 different
> attribute types as attribute values. Every X seconds, Y number of new
> attribute values are stored in the database. X is constant and currently
> between 6 and 20 seconds, depending on the setup. In the future X could
> become as low as 3 seconds. Y can, within the next 5-10 years, become as
> high as 200 000.
>
> That means that for example, every 6 seconds 100 000 attributes needs to
> be written to the database.
>
> At the same time, somewhere between 5-20 users needs to read parts of
> those newly written attributes, maybe in total 30 000 attributes.
>
> This continues for the duration of the field operation, which could be
> 18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.
>
> Now here is how I suggest doing this:
>
> 1- the tables
>
> table attribute_values:
>     id         int
>     attr_type     int  ( references attribute_types(id) )
>     posX        int
>     posY        int
>     data_type    int
>     value        varchar(50)
>
> table attribute_types:
>     id        int
>     name        varchar(200);
>
>
>
> 2- function
>
>    a function that receives an array of data and inserts each attribute.
>    perhaps one array per attribute data (type, posX, posY, data_type,
>    value) so five arrays as in parameters ot the function
>
> 3- java client
>
>    the client receives the data from a corba request, and splits it
>    into, say 4 equally sized blocks and executes 4 threads that insert
>    each block (this seems to be more efficient than just using one
>    thread.)
>
> Now I am wondering if this is the most efficient way of doing it?
>
> - I know that I could group the attributes so that each type of attribute
> gets its own table with all attributes in one row. But I am not sure if
> that is any more efficient than ont attribute per row since I pass
> everything to the function as an array.
> With the above design a change in attribute types only requires changing
> the data in a table instead of having to modify the client, the function
> and the tables.
>
> - I am also wondering if writing the client and function in C would create
> a more efficient solution.
>
> any comments?
>
> ps, I am currently running postgres 8.1, but could probably use 8.2 if it
> is needed for functionality or performance reasons. It will run on a sparc
> machine with solaris 10 and perhaps 4-6 processors, as many GB of RAM as
> necessary and SCSI disks ( perhaps in raid 0 ).


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

pgsql-performance by date:

Previous
From: tfinneid@ifi.uio.no
Date:
Subject: improvement suggestions for performance design
Next
From: tfinneid@ifi.uio.no
Date:
Subject: Re: improvement suggestions for performance design