Thread: What to index to speed up my UNION views?
I had a bright idea that has not worked quite as well as I thought. We have a web sites for selling cars that we are trying to expand to vans, bikes etc. We get a datafeed containing prices and technical data updated nightly (for cars it's about 2.3GB, others are smaller). This comes into SQL Server as one database per vehicle type, and we export it as CSV data, which is COPY'd into Postgres. The current version of the application uses Hibernate. Now, as the database for each vehicle type has an idential schema I thought it would be possible to use views to access the underlying data so we don't have to duplicate the mappings for each data type. So for example, there is a table "capmod" which stores vehicle models. Unfortunately, the primary key column for the table is not unique across all dataset databases, so a model id used to identify a car model in the car database may also identiffy a van model in the van database. So, I created a view like this: CREATE OR REPLACE VIEW capmod AS SELECT 'cap_car'::character varying::character varying(10) AS "vehicle_type", car_capmod.cmod_code, ... FROM cap_car.car_capmod UNION SELECT 'cap_lcv'::character varying::character varying(10) AS "vehicle_type", lcv_capmod.cmod_code, ... FROM cap_lcv.lcv_capmod; I've removed all the actual data columns. Effectively this makes the primary key for the view composite based on vehicle_type and cmod_code. The problem is that performance has taken a massive hit. Maybe the answer to this is simple - I just need to make sure that the index on the underlying car_capmod, lcv_capmod, XXX_capmod tables is hit in a query such as "SELECT * FROM capmod where vehicle_type = 'cap_car' and cmod_code=1234". Failing that I will have to include a vehicle_type column in each of the underlying tables, but I want to avoid anything that complicates the import procedure (which is already very slow). I'd be very grateful for any advice Cheers Ashley Moran
On Mon, Mar 27, 2006 at 11:44:00AM +0100, Ashley Moran wrote: > I had a bright idea that has not worked quite as well as I thought. > > We have a web sites for selling cars that we are trying to expand to vans, > bikes etc. We get a datafeed containing prices and technical data updated > nightly (for cars it's about 2.3GB, others are smaller). This comes into SQL > Server as one database per vehicle type, and we export it as CSV data, which > is COPY'd into Postgres. <snip> Just to make sure it's not something obvious, but: > So, I created a view like this: > > CREATE OR REPLACE VIEW capmod AS > SELECT 'cap_car'::character varying::character varying(10) AS "vehicle_type", > car_capmod.cmod_code, ... > FROM cap_car.car_capmod > UNION > SELECT 'cap_lcv'::character varying::character varying(10) AS "vehicle_type", > lcv_capmod.cmod_code, ... > FROM cap_lcv.lcv_capmod; You're using UNION rather than UNION ALL where. There's a big difference and I imagine you actually want the latter. It also makes a big difference in query optimisation. If that doesn't fix it, come back with the EXPLAIN ANALYZE output of your query. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Mon, Mar 27, 2006 at 11:44:00AM +0100, Ashley Moran wrote: > I had a bright idea that has not worked quite as well as I thought. > > We have a web sites for selling cars that we are trying to expand to vans, > bikes etc. We get a datafeed containing prices and technical data updated > nightly (for cars it's about 2.3GB, others are smaller). This comes into SQL > Server as one database per vehicle type, and we export it as CSV data, which > is COPY'd into Postgres. > > The current version of the application uses Hibernate. Now, as the database > for each vehicle type has an idential schema I thought it would be possible > to use views to access the underlying data so we don't have to duplicate the > mappings for each data type. So for example, there is a table "capmod" which > stores vehicle models. Unfortunately, the primary key column for the table > is not unique across all dataset databases, so a model id used to identify a > car model in the car database may also identiffy a van model in the van > database. BTW, you might also find inheritance to be of use: http://www.postgresql.org/docs/8.1/interactive/ddl-inherit.html -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Monday 27 March 2006 13:57, Jim C. Nasby wrote: > BTW, you might also find inheritance to be of use: > http://www.postgresql.org/docs/8.1/interactive/ddl-inherit.html I'm not sure inheritance will help here or not. I need to aggregate all the sub tables and add a field whose value depends on which table the data came from. I think views are probably easier but I've never actually used table inheritance. Ashley
On Monday 27 March 2006 12:13, Martijn van Oosterhout wrote: > You're using UNION rather than UNION ALL where. There's a big > difference and I imagine you actually want the latter. It also makes a > big difference in query optimisation. > > If that doesn't fix it, come back with the EXPLAIN ANALYZE output of > your query. > > Have a nice day, Thanks Martin It turned out that the query being used was wrong anyway (pulling through too much data). When it was fixed it sped up greatly. I've searched on the Postgres docs and can't find an explanation of UNION ALL. How does it differ from UNION? Ashley
On Tue, Mar 28, 2006 at 09:45:02AM +0100, Ashley Moran wrote: > On Monday 27 March 2006 12:13, Martijn van Oosterhout wrote: > > You're using UNION rather than UNION ALL where. There's a big > > difference and I imagine you actually want the latter. It also makes a > > big difference in query optimisation. > > > > If that doesn't fix it, come back with the EXPLAIN ANALYZE output of > > your query. > > > > Have a nice day, > > Thanks Martin > > It turned out that the query being used was wrong anyway (pulling through too > much data). When it was fixed it sped up greatly. I've searched on the > Postgres docs and can't find an explanation of UNION ALL. How does it differ > from UNION? That's because it's decribed in the SQL standard. UNION ALL just joins the results of the two queries. UNION removes duplicates which usually means sorting and comparing the tuples. UNION ALL is faster and usually what you want anyway... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Tuesday 28 March 2006 10:05, Martijn van Oosterhout wrote: > That's because it's decribed in the SQL standard. UNION ALL just joins > the results of the two queries. UNION removes duplicates which usually > means sorting and comparing the tuples. UNION ALL is faster and usually > what you want anyway... Yes that's exactly what I want. Thanks for pointing out the difference