Thread: What to index to speed up my UNION views?

What to index to speed up my UNION views?

From

Ashley Moran

Date:

27 March 2006, 07:00:43

I had a bright idea that has not worked quite as well as I thought.

We have a web sites for selling cars that we are trying to expand to vans,
bikes etc.  We get a datafeed containing prices and technical data updated
nightly (for cars it's about 2.3GB, others are smaller).  This comes into SQL
Server as one database per vehicle type, and we export it as CSV data, which
is COPY'd into Postgres.

The current version of the application uses Hibernate.  Now, as the database
for each vehicle type has an idential schema I thought it would be possible
to use views to access the underlying data so we don't have to duplicate the
mappings for each data type.  So for example, there is a table "capmod" which
stores vehicle models.  Unfortunately, the primary key column for the table
is not unique across all dataset databases, so a model id used to identify a
car model in the car database may also identiffy a van model in the van
database.

So, I created a view like this:

CREATE OR REPLACE VIEW capmod AS
 SELECT 'cap_car'::character varying::character varying(10) AS "vehicle_type",
car_capmod.cmod_code, ...
   FROM cap_car.car_capmod
UNION
 SELECT 'cap_lcv'::character varying::character varying(10) AS "vehicle_type",
lcv_capmod.cmod_code, ...
   FROM cap_lcv.lcv_capmod;

I've removed all the actual data columns.

Effectively this makes the primary key for the view composite based on
vehicle_type and cmod_code.

The problem is that performance has taken a massive hit.  Maybe the answer to
this is simple - I just need to make sure that the index on the underlying
car_capmod, lcv_capmod, XXX_capmod tables is hit in a query such as "SELECT *
FROM capmod where vehicle_type = 'cap_car' and cmod_code=1234".  Failing that
I will have to include a vehicle_type column in each of the underlying
tables, but I want to avoid anything that complicates the import procedure
(which is already very slow).

I'd be very grateful for any advice

Cheers
Ashley Moran

Re: What to index to speed up my UNION views?

From

Martijn van Oosterhout

Date:

27 March 2006, 07:13:26

On Mon, Mar 27, 2006 at 11:44:00AM +0100, Ashley Moran wrote:
> I had a bright idea that has not worked quite as well as I thought.
>
> We have a web sites for selling cars that we are trying to expand to vans,
> bikes etc.  We get a datafeed containing prices and technical data updated
> nightly (for cars it's about 2.3GB, others are smaller).  This comes into SQL
> Server as one database per vehicle type, and we export it as CSV data, which
> is COPY'd into Postgres.

<snip>

Just to make sure it's not something obvious, but:

> So, I created a view like this:
>
> CREATE OR REPLACE VIEW capmod AS
>  SELECT 'cap_car'::character varying::character varying(10) AS "vehicle_type",
> car_capmod.cmod_code, ...
>    FROM cap_car.car_capmod
> UNION
>  SELECT 'cap_lcv'::character varying::character varying(10) AS "vehicle_type",
> lcv_capmod.cmod_code, ...
>    FROM cap_lcv.lcv_capmod;

You're using UNION rather than UNION ALL where. There's a big
difference and I imagine you actually want the latter. It also makes a
big difference in query optimisation.

If that doesn't fix it, come back with the EXPLAIN ANALYZE output of
your query.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

signature.asc

Re: What to index to speed up my UNION views?

From

"Jim C. Nasby"

Date:

27 March 2006, 08:57:05

On Mon, Mar 27, 2006 at 11:44:00AM +0100, Ashley Moran wrote:
> I had a bright idea that has not worked quite as well as I thought.
>
> We have a web sites for selling cars that we are trying to expand to vans,
> bikes etc.  We get a datafeed containing prices and technical data updated
> nightly (for cars it's about 2.3GB, others are smaller).  This comes into SQL
> Server as one database per vehicle type, and we export it as CSV data, which
> is COPY'd into Postgres.
>
> The current version of the application uses Hibernate.  Now, as the database
> for each vehicle type has an idential schema I thought it would be possible
> to use views to access the underlying data so we don't have to duplicate the
> mappings for each data type.  So for example, there is a table "capmod" which
> stores vehicle models.  Unfortunately, the primary key column for the table
> is not unique across all dataset databases, so a model id used to identify a
> car model in the car database may also identiffy a van model in the van
> database.

BTW, you might also find inheritance to be of use:
http://www.postgresql.org/docs/8.1/interactive/ddl-inherit.html
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: What to index to speed up my UNION views?

From

Ashley Moran

Date:

28 March 2006, 04:43:08

On Monday 27 March 2006 13:57, Jim C. Nasby wrote:
> BTW, you might also find inheritance to be of use:
> http://www.postgresql.org/docs/8.1/interactive/ddl-inherit.html

I'm not sure inheritance will help here or not.  I need to aggregate all the
sub tables and add a field whose value depends on which table the data came
from.  I think views are probably easier but I've never actually used table
inheritance.

Ashley

Re: What to index to speed up my UNION views?

From

Ashley Moran

Date:

28 March 2006, 04:45:02

On Monday 27 March 2006 12:13, Martijn van Oosterhout wrote:
> You're using UNION rather than UNION ALL where. There's a big
> difference and I imagine you actually want the latter. It also makes a
> big difference in query optimisation.
>
> If that doesn't fix it, come back with the EXPLAIN ANALYZE output of
> your query.
>
> Have a nice day,

Thanks Martin

It turned out that the query being used was wrong anyway (pulling through too
much data).  When it was fixed it sped up greatly. I've searched on the
Postgres docs and can't find an explanation of UNION ALL.  How does it differ
from UNION?

Ashley

Re: What to index to speed up my UNION views?

From

Martijn van Oosterhout

Date:

28 March 2006, 05:05:20

On Tue, Mar 28, 2006 at 09:45:02AM +0100, Ashley Moran wrote:
> On Monday 27 March 2006 12:13, Martijn van Oosterhout wrote:
> > You're using UNION rather than UNION ALL where. There's a big
> > difference and I imagine you actually want the latter. It also makes a
> > big difference in query optimisation.
> >
> > If that doesn't fix it, come back with the EXPLAIN ANALYZE output of
> > your query.
> >
> > Have a nice day,
>
> Thanks Martin
>
> It turned out that the query being used was wrong anyway (pulling through too
> much data).  When it was fixed it sped up greatly. I've searched on the
> Postgres docs and can't find an explanation of UNION ALL.  How does it differ
> from UNION?

That's because it's decribed in the SQL standard. UNION ALL just joins
the results of the two queries. UNION removes duplicates which usually
means sorting and comparing the tuples. UNION ALL is faster and usually
what you want anyway...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

signature.asc

Re: What to index to speed up my UNION views?

From

Ashley Moran

Date:

28 March 2006, 05:15:07

On Tuesday 28 March 2006 10:05, Martijn van Oosterhout wrote:
> That's because it's decribed in the SQL standard. UNION ALL just joins
> the results of the two queries. UNION removes duplicates which usually
> means sorting and comparing the tuples. UNION ALL is faster and usually
> what you want anyway...

Yes that's exactly what I want.  Thanks for pointing out the difference