Thread: which is more scalable for the database?
Suppose I have a database table with 20 fields which are lookups to a single table. configtable(configtable_id, a_field, something_lookup_id, another_lookup_id, ...) lookup(lookup_id, value, description, ...) what is going to be faster to map the rows to an Object which needs the 'value' for every field ending in lookup_id a) View select c.configtable_id, l1.value as something_lookup, l2.value as another_lookup from configtable c, lookup l1, lookup l2 where c.something_lookup_id = l1.lookup_id and c.another_lookup_id = l2.lookup_id foreach row map values to object end b) Cache all lookup values and populate select c.* from configtable foreach row map values to object if lookup_id find value from hashtable and map value to object endif end It seems that the latter *might* be better to scale outward better, as you could add application servers to do the caching/mapping and you only select from a single table?
Timasmith wrote: > Suppose I have a database table with 20 fields which are lookups to a > single table. > > configtable(configtable_id, a_field, something_lookup_id, > another_lookup_id, ...) > lookup(lookup_id, value, description, ...) > > > what is going to be faster to map the rows to an Object which needs > the 'value' for every field ending in lookup_id [snip] > b) Cache all lookup values and populate > It seems that the latter *might* be better to scale outward better, > as > you could add application servers to do the caching/mapping and you > only select from a single table? It *might* scale better and be faster. It might not. The only way to know is to test it. When you do compare, don't forget to allow for handling changes to the lookup table in the database. -- Richard Huxton Archonet Ltd
Timasmith wrote: > Suppose I have a database table with 20 fields which are lookups to a > single table. > > configtable(configtable_id, a_field, something_lookup_id, > another_lookup_id, ...) > lookup(lookup_id, value, description, ...) > > > what is going to be faster to map the rows to an Object which needs > the 'value' for every field ending in lookup_id > How long is ball of string 'a' compared to ball of string 'b'? That will depend on a lot on how large the lookup table will be (100 rows or 10 million rows?) and on how long your descriptions are. Will you want all 20 descriptions for each query you run? If your descriptions are 200 characters each then maybe you can reduce some data transfer by caching these at the client instead of transferring them each time you retrieve a result. But then how will you keep your local cache up to date? I would suggest the view for a couple of reasons - first it will simplify your queries as you won't need all the join details in every query you run. Second the select for the query can be parsed and stored whereas separate selects will be parsed each time. A lot of this will depend on what you are doing and what you are using to do it. If you are using php then you have less options for caching than say a client program written in c. php may run pretty fast but it won't outrun a compiled c program. > a) View > > > select c.configtable_id, l1.value as something_lookup, l2.value as > another_lookup > from configtable c, > lookup l1, > lookup l2 > where c.something_lookup_id = l1.lookup_id > and c.another_lookup_id = l2.lookup_id > > > foreach row > map values to object > end > > > b) Cache all lookup values and populate > > > select c.* from configtable > > > foreach row > map values to object > if lookup_id > find value from hashtable and map value to object > endif > end > > > It seems that the latter *might* be better to scale outward better, > as > you could add application servers to do the caching/mapping and you > only select from a single table? > Maybe but then postgresql may do the lookups quicker than what you have available at the client end. The only way you will really know is to load up some sample data and test each method yourself. How far are you expecting to scale? The simplest method may just work fast enough that all the extra caching and setup/maintenance of this will outweigh any benefit. -- Shane Ambler pgSQL@Sheeky.Biz Get Sheeky @ http://Sheeky.Biz
On Mar 8, 2:26 pm, p...@Sheeky.Biz (Shane Ambler) wrote: > Timasmithwrote: > > Suppose I have a database table with 20 fields which are lookups to a > > single table. > > > configtable(configtable_id, a_field, something_lookup_id, > > another_lookup_id, ...) > > lookup(lookup_id, value, description, ...) > > > what is going to be faster to map the rows to an Object which needs > > the 'value' for every field ending in lookup_id > > How long is ball of string 'a' compared to ball of string 'b'? > > That will depend on a lot on how large the lookup table will be (100 > rows or 10 million rows?) 100-500,000 rows and on how long your descriptions are. 10-60, perhaps averaging around 20 characters Will > you want all 20 descriptions for each query you run? Yes, it is the price I pay with Hibernate for vastly simplifying data access If your > descriptions are 200 characters each then maybe you can reduce some data > transfer by caching these at the client instead of transferring them > each time you retrieve a result. But then how will you keep your local > cache up to date? The data I would cache would change infrequently in the production environment and there would be no expectation it would be 'immediate' on change. I would provide services to flush on demand, or on save (of reference data), and/or through a scheduler etc. > > I would suggest the view for a couple of reasons - first it will > simplify your queries as you won't need all the join details in every > query you run. Second the select for the query can be parsed and stored > whereas separate selects will be parsed each time. If I was caching on the client, I would select all data only once and there would be no queries, it would only be select * from sometable and the client (actually server side bean), would populate the object with the missing lookups. > > A lot of this will depend on what you are doing and what you are using > to do it. If you are using php then you have less options for caching > than say a client program written in c. php may run pretty fast but it > won't outrun a compiled c program. Java hashtable. > > > > > > > a) View > > > select c.configtable_id, l1.value as something_lookup, l2.value as > > another_lookup > > from configtable c, > > lookup l1, > > lookup l2 > > where c.something_lookup_id = l1.lookup_id > > and c.another_lookup_id = l2.lookup_id > > > foreach row > > map values to object > > end > > > b) Cache all lookup values and populate > > > select c.* from configtable > > > foreach row > > map values to object > > if lookup_id > > find value from hashtable and map value to object > > endif > > end > > > It seems that the latter *might* be better to scale outward better, > > as > > you could add application servers to do the caching/mapping and you > > only select from a single table? > > Maybe but then postgresql may do the lookups quicker than what you have > available at the client end. Right, but could it keep up, how much extra effort is it to do the indexed lookups (almost all in memory), for the client. > > The only way you will really know is to load up some sample data and > test each method yourself. Its very hard to simulate though due to production hardware having 16 cpus, Gigs of memory, huge databases, hundereds of concurrent users - I just dont have access to that kind of environment. > > How far are you expecting to scale? The simplest method may just work > fast enough that all the extra caching and setup/maintenance of this > will outweigh any benefit. I would like to scale to 10,000s of power users on a complex OLTP system. Adding servers to expand out is generally feasible, but one can only scale upward so far before reaching a hardware peak. > > -- > > Shane Ambler > p...@Sheeky.Biz > > Get Sheeky @http://Sheeky.Biz > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match- Hide quoted text - > > - Show quoted text -