Thread: which is more scalable for the database?

which is more scalable for the database?

From: "Timasmith"
Suppose I have a database table with 20 fields which are lookups to a
single table.

configtable(configtable_id, a_field, something_lookup_id,
another_lookup_id, ...)
lookup(lookup_id, value, description, ...)


Which is going to be faster to map the rows to an object that needs
the 'value' for every field ending in lookup_id?


a) View


select c.configtable_id, l1.value as something_lookup, l2.value as
another_lookup
from configtable c,
       lookup l1,
       lookup l2
where c.something_lookup_id = l1.lookup_id
    and c.another_lookup_id = l2.lookup_id


foreach row
   map values to object
end
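
Roughly, in Java/JDBC terms that mapping loop would look something like
this - the ConfigEntry class and the column names are only illustrative
stand-ins:

import java.sql.*;
import java.util.*;

// Sketch of option (a): let the database resolve the lookups in the join
// (or a view over it) and map each row straight onto the object.
public class ViewMapper {

    public static class ConfigEntry {        // illustrative stand-in for the mapped class
        public long id;
        public String somethingLookup;
        public String anotherLookup;
    }

    public List<ConfigEntry> load(Connection conn) throws SQLException {
        String sql =
              "select c.configtable_id, l1.value as something_lookup, l2.value as another_lookup "
            + "from configtable c, lookup l1, lookup l2 "
            + "where c.something_lookup_id = l1.lookup_id "
            + "  and c.another_lookup_id = l2.lookup_id";
        List<ConfigEntry> result = new ArrayList<ConfigEntry>();
        Statement st = conn.createStatement();
        try {
            ResultSet rs = st.executeQuery(sql);
            while (rs.next()) {
                ConfigEntry e = new ConfigEntry();
                e.id = rs.getLong("configtable_id");
                e.somethingLookup = rs.getString("something_lookup");  // resolved by the database
                e.anotherLookup = rs.getString("another_lookup");
                result.add(e);
            }
        } finally {
            st.close();
        }
        return result;
    }
}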


b) Cache all lookup values and populate


select c.* from configtable c


foreach row
   map values to object
   if lookup_id
       find value from hashtable and map value to object
   endif
end
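
And option (b) in the same terms - load the lookup table into a
hashtable once, then resolve the ids in application code (reusing the
illustrative ConfigEntry from the sketch above):

import java.sql.*;
import java.util.*;

// Sketch of option (b): cache the whole lookup table in memory once, then
// select only from configtable and resolve the *_lookup_id columns locally.
public class CacheMapper {

    private final Map<Long, String> lookupCache = new HashMap<Long, String>();

    public void loadCache(Connection conn) throws SQLException {
        lookupCache.clear();                  // a reload also drops deleted rows
        Statement st = conn.createStatement();
        try {
            ResultSet rs = st.executeQuery("select lookup_id, value from lookup");
            while (rs.next()) {
                lookupCache.put(Long.valueOf(rs.getLong("lookup_id")), rs.getString("value"));
            }
        } finally {
            st.close();
        }
    }

    public List<ViewMapper.ConfigEntry> load(Connection conn) throws SQLException {
        List<ViewMapper.ConfigEntry> result = new ArrayList<ViewMapper.ConfigEntry>();
        Statement st = conn.createStatement();
        try {
            ResultSet rs = st.executeQuery("select * from configtable");
            while (rs.next()) {
                ViewMapper.ConfigEntry e = new ViewMapper.ConfigEntry();
                e.id = rs.getLong("configtable_id");
                // resolve each *_lookup_id from the in-memory cache instead of a join
                e.somethingLookup = lookupCache.get(Long.valueOf(rs.getLong("something_lookup_id")));
                e.anotherLookup = lookupCache.get(Long.valueOf(rs.getLong("another_lookup_id")));
                result.add(e);
            }
        } finally {
            st.close();
        }
        return result;
    }
}

Here the database only scans configtable, and each lookup becomes an
O(1) hash get in the app server.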


It seems that the latter *might* scale outward better, as you could
add application servers to do the caching/mapping and you would only
select from a single table?


Re: which is more scalable for the database?

From: Richard Huxton
Timasmith wrote:
> Suppose I have a database table with 20 fields which are lookups to a
> single table.
>
> configtable(configtable_id, a_field, something_lookup_id,
> another_lookup_id, ...)
> lookup(lookup_id, value, description, ...)
>
>
> what is going to be faster to map the rows to an Object which needs
> the 'value' for every field ending in lookup_id
[snip]
> b) Cache all lookup values and populate

> It seems that the latter *might* be better to scale outward better,
> as
> you could add application servers to do the caching/mapping and you
> only select from a single table?

It *might* scale better and be faster. It might not. The only way to
know is to test it. When you do compare, don't forget to allow for
handling changes to the lookup table in the database.
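
For example, one way to pick changes up from the database itself (only
a sketch - it assumes a trigger on the lookup table issues NOTIFY
lookup_changed, and that "conn" is a plain PostgreSQL JDBC connection
rather than a pooled wrapper):

import java.sql.*;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

// Each app server LISTENs on the channel and flushes its cache whenever
// a notification arrives; run this in a background thread.
public class LookupChangeListener implements Runnable {

    private final Connection conn;
    private final Runnable flushCache;   // whatever rebuilds the application's lookup cache

    public LookupChangeListener(Connection conn, Runnable flushCache) {
        this.conn = conn;
        this.flushCache = flushCache;
    }

    public void run() {
        try {
            Statement st = conn.createStatement();
            st.execute("LISTEN lookup_changed");
            st.close();

            PGConnection pgConn = (PGConnection) conn;
            while (true) {
                // a trivial query makes the driver read any pending notifications
                Statement poll = conn.createStatement();
                poll.executeQuery("SELECT 1").close();
                poll.close();

                PGNotification[] notes = pgConn.getNotifications();
                if (notes != null && notes.length > 0) {
                    flushCache.run();
                }
                Thread.sleep(5000);   // check every few seconds
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}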

--
   Richard Huxton
   Archonet Ltd

Re: which is more scalable for the database?

From: Shane Ambler
Timasmith wrote:
> Suppose I have a database table with 20 fields which are lookups to a
> single table.
>
> configtable(configtable_id, a_field, something_lookup_id,
> another_lookup_id, ...)
> lookup(lookup_id, value, description, ...)
>
>
> what is going to be faster to map the rows to an Object which needs
> the 'value' for every field ending in lookup_id
>

How long is ball of string 'a' compared to ball of string 'b'?

That will depend a lot on how large the lookup table will be (100
rows or 10 million rows?) and on how long your descriptions are. Will
you want all 20 descriptions for each query you run? If your
descriptions are 200 characters each then maybe you can reduce some data
transfer by caching these at the client instead of transferring them
each time you retrieve a result. But then how will you keep your local
cache up to date?

I would suggest the view for a couple of reasons - first, it will
simplify your queries, as you won't need to repeat the join details in
every query you run. Second, the select behind the view can be parsed
and stored, whereas separate selects will be parsed each time.
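
For example (just a sketch built from the query in your post - the
names are illustrative), the view is created once and every query
after that stays a plain select:

import java.sql.*;

// Sketch of the view approach: the join details live in one place on the
// server, and the application queries the view like a simple table.
public class ConfigView {

    public void createView(Connection conn) throws SQLException {
        Statement st = conn.createStatement();
        st.execute(
              "create or replace view configtable_v as "
            + "select c.configtable_id, l1.value as something_lookup, l2.value as another_lookup "
            + "from configtable c, lookup l1, lookup l2 "
            + "where c.something_lookup_id = l1.lookup_id "
            + "  and c.another_lookup_id = l2.lookup_id");
        st.close();
    }

    public String findSomethingLookup(Connection conn, long id) throws SQLException {
        // the application query no longer needs to repeat the joins
        PreparedStatement ps = conn.prepareStatement(
            "select something_lookup from configtable_v where configtable_id = ?");
        try {
            ps.setLong(1, id);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getString(1) : null;
        } finally {
            ps.close();
        }
    }
}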

A lot of this will depend on what you are doing and what you are using
to do it. If you are using PHP then you have fewer options for caching
than, say, a client program written in C. PHP may run pretty fast, but
it won't outrun a compiled C program.

> a) View
>
>
> select c.configtable_id, l1.value as something_lookup, l2.value as
> another_lookup
> from configtable c,
>        lookup l1,
>        lookup l2
> where c.something_lookup_id = l1.lookup_id
>     and c.another_lookup_id = l2.lookup_id
>
>
> foreach row
>    map values to object
> end
>
>
> b) Cache all lookup values and populate
>
>
> select c.* from configtable c
>
>
> foreach row
>    map values to object
>    if lookup_id
>        find value from hashtable and map value to object
>    endif
> end
>
>
> It seems that the latter *might* be better to scale outward better,
> as
> you could add application servers to do the caching/mapping and you
> only select from a single table?
>

Maybe, but then PostgreSQL may do the lookups quicker than whatever you
have available at the client end.

The only way you will really know is to load up some sample data and
test each method yourself.
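
Even something as crude as this gives you a first number to compare
(only a sketch - Mapper just stands for "run the query and map the
rows" in whichever of the two ways you are testing):

import java.sql.*;

// A minimal harness for the "load sample data and test it" advice: time
// each approach over a few repetitions against the same connection.
public class MapperBenchmark {

    public interface Mapper {
        void mapAll(Connection conn) throws SQLException;
    }

    public static long time(Mapper mapper, Connection conn, int repetitions) throws SQLException {
        long start = System.currentTimeMillis();
        for (int i = 0; i < repetitions; i++) {
            mapper.mapAll(conn);
        }
        return System.currentTimeMillis() - start;   // total elapsed milliseconds
    }
}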

How far are you expecting to scale? The simplest method may just work
fast enough that the cost of all the extra caching and setup/maintenance
will outweigh any benefit.



--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

Re: which is more scalable for the database?

From: "Timasmith"
On Mar 8, 2:26 pm, p...@Sheeky.Biz (Shane Ambler) wrote:
> Timasmith wrote:
> > Suppose I have a database table with 20 fields which are lookups to a
> > single table.
>
> > configtable(configtable_id, a_field, something_lookup_id,
> > another_lookup_id, ...)
> > lookup(lookup_id, value, description, ...)
>
> > what is going to be faster to map the rows to an Object which needs
> > the 'value' for every field ending in lookup_id
>
> How long is ball of string 'a' compared to ball of string 'b'?
>
> That will depend on a lot on how large the lookup table will be (100
> rows or 10 million rows?)

  100-500,000 rows

> and on how long your descriptions are.

  10-60, perhaps averaging around 20 characters

> Will
> you want all 20 descriptions for each query you run?

  Yes, it is the price I pay with Hibernate for vastly simplifying
data access

> If your
> descriptions are 200 characters each then maybe you can reduce some data
> transfer by caching these at the client instead of transferring them
> each time you retrieve a result. But then how will you keep your local
> cache up to date?

  The data I would cache would change infrequently in the production
environment, and there would be no expectation that it would be
'immediate' on change. I would provide services to flush on demand, on
save (of reference data), and/or through a scheduler, etc.
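
  Something along these lines is what I have in mind (just a sketch -
LookupDao and the 30-minute interval are placeholders):

import java.util.*;
import java.util.concurrent.*;

// Sketch of the refresh strategy described above: a lookup cache that can
// be flushed on demand (e.g. after saving reference data) and is also
// refreshed by a scheduler.  LookupDao is a hypothetical data-access
// object that returns lookup_id -> value.
public class LookupCache {

    private volatile Map<Long, String> values = new HashMap<Long, String>();
    private final LookupDao dao;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public LookupCache(LookupDao dao) {
        this.dao = dao;
        flush();                                            // initial load
        scheduler.scheduleWithFixedDelay(new Runnable() {   // periodic refresh
            public void run() { flush(); }
        }, 30, 30, TimeUnit.MINUTES);
    }

    // called on demand, e.g. from a "flush caches" service or after a save
    public void flush() {
        Map<Long, String> fresh = new HashMap<Long, String>(dao.loadAll());
        values = fresh;   // swap in the new map; readers never see a half-built cache
    }

    public String valueFor(long lookupId) {
        return values.get(Long.valueOf(lookupId));
    }

    public interface LookupDao {
        Map<Long, String> loadAll();   // select lookup_id, value from lookup
    }
}

Swapping in a freshly built map means readers never block on, or see,
a half-rebuilt cache.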

>
> I would suggest the view for a couple of reasons - first it will
> simplify your queries as you won't need all the join details in every
> query you run. Second the select for the query can be parsed and stored
> whereas separate selects will be parsed each time.

  If I were caching on the client, I would select all the lookup data
only once, and after that the only query would be

  select * from sometable

and the client (actually a server-side bean) would populate the object
with the missing lookups.

>
> A lot of this will depend on what you are doing and what you are using
> to do it. If you are using php then you have less options for caching
> than say a client program written in c. php may run pretty fast but it
> won't outrun a compiled c program.

   Java hashtable.


>
> > a) View
>
> > select c.configtable_id, l1.value as something_lookup, l2.value as
> > another_lookup
> > from configtable c,
> >        lookup l1,
> >        lookup l2
> > where c.something_lookup_id = l1.lookup_id
> >     and c.another_lookup_id = l2.lookup_id
>
> > foreach row
> >    map values to object
> > end
>
> > b) Cache all lookup values and populate
>
> > select c.* from configtable c
>
> > foreach row
> >    map values to object
> >    if lookup_id
> >        find value from hashtable and map value to object
> >    endif
> > end
>
> > It seems that the latter *might* be better to scale outward better,
> > as
> > you could add application servers to do the caching/mapping and you
> > only select from a single table?
>
> Maybe but then postgresql may do the lookups quicker than what you have
> available at the client end.

Right, but could it keep up? How much extra effort is it for the
database to do the indexed lookups (almost all in memory) compared
with doing them at the client?

>
> The only way you will really know is to load up some sample data and
> test each method yourself.

It's very hard to simulate, though, since the production hardware has
16 CPUs, gigs of memory, huge databases, and hundreds of concurrent
users - I just don't have access to that kind of environment.

>
> How far are you expecting to scale? The simplest method may just work
> fast enough that all the extra caching and setup/maintenance of this
> will outweigh any benefit.

I would like to scale to 10,000s of power users on a complex OLTP
system.  Adding servers to expand out is generally feasible, but one
can only scale upward so far before reaching a hardware peak.

>
> --
>
> Shane Ambler
> p...@Sheeky.Biz
>
> Get Sheeky @ http://Sheeky.Biz