Re: which is more scalable for the database? - Mailing list pgsql-general

From Timasmith
Subject Re: which is more scalable for the database?
Date
Msg-id 1173757150.216710.220780@30g2000cwc.googlegroups.com
Whole thread Raw
In response to Re: which is more scalable for the database?  (Shane Ambler <pgsql@Sheeky.Biz>)
List pgsql-general
On Mar 8, 2:26 pm, p...@Sheeky.Biz (Shane Ambler) wrote:
> Timasmithwrote:
> > Suppose I have a database table with 20 fields which are lookups to a
> > single table.
>
> > configtable(configtable_id, a_field, something_lookup_id,
> > another_lookup_id, ...)
> > lookup(lookup_id, value, description, ...)
>
> > what is going to be faster to map the rows to an Object which needs
> > the 'value' for every field ending in lookup_id
>
> How long is ball of string 'a' compared to ball of string 'b'?
>
> That will depend on a lot on how large the lookup table will be (100
> rows or 10 million rows?)

  100-500,000 rows

and on how long your descriptions are.

  10-60, perhaps averaging around 20 characters

Will
> you want all 20 descriptions for each query you run?

  Yes, it is the price I pay with Hibernate for vastly simplifying
data access

If your
> descriptions are 200 characters each then maybe you can reduce some data
> transfer by caching these at the client instead of transferring them
> each time you retrieve a result. But then how will you keep your local
> cache up to date?

  The data I would cache would change infrequently in the production
environment and there would be
  no expectation it would be 'immediate' on change.
  I would provide services to flush on demand, or on save (of
reference data), and/or through a scheduler etc.

>
> I would suggest the view for a couple of reasons - first it will
> simplify your queries as you won't need all the join details in every
> query you run. Second the select for the query can be parsed and stored
> whereas separate selects will be parsed each time.

  If I was caching on the client, I would select all data only once
and there would be no queries, it would only be
  select * from sometable
  and the client (actually server side bean), would populate the
object with the missing lookups.

>
> A lot of this will depend on what you are doing and what you are using
> to do it. If you are using php then you have less options for caching
> than say a client program written in c. php may run pretty fast but it
> won't outrun a compiled c program.

   Java hashtable.


>
>
>
>
>
> > a) View
>
> > select c.configtable_id, l1.value as something_lookup, l2.value as
> > another_lookup
> > from configtable c,
> >        lookup l1,
> >        lookup l2
> > where c.something_lookup_id = l1.lookup_id
> >     and c.another_lookup_id = l2.lookup_id
>
> > foreach row
> >    map values to object
> > end
>
> > b) Cache all lookup values and populate
>
> > select c.* from configtable
>
> > foreach row
> >    map values to object
> >    if lookup_id
> >        find value from hashtable and map value to object
> >    endif
> > end
>
> > It seems that the latter *might* be better to scale outward better,
> > as
> > you could add application servers to do the caching/mapping and you
> > only select from a single table?
>
> Maybe but then postgresql may do the lookups quicker than what you have
> available at the client end.

Right, but could it keep up, how much extra effort is it to do the
indexed lookups (almost all in memory), for the client.

>
> The only way you will really know is to load up some sample data and
> test each method yourself.

Its very hard to simulate though due to production hardware having 16
cpus, Gigs of memory, huge databases, hundereds of concurrent users -
I just dont have access to that kind of environment.

>
> How far are you expecting to scale? The simplest method may just work
> fast enough that all the extra caching and setup/maintenance of this
> will outweigh any benefit.

I would like to scale to 10,000s of power users on a complex OLTP
system.  Adding servers to expand out is generally feasible, but one
can only scale upward so far before reaching a hardware peak.

>
> --
>
> Shane Ambler
> p...@Sheeky.Biz
>
> Get Sheeky @http://Sheeky.Biz
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match- Hide quoted text -
>
> - Show quoted text -



pgsql-general by date:

Previous
From: Dave Thompson
Date:
Subject: plperl question: how to specifiy perl version?
Next
From: "Nathanial Mayweather"
Date:
Subject: query advice