Re: improving performance of UNION and ORDER BY - Mailing list pgsql-general

From Chris Gamache
Subject Re: improving performance of UNION and ORDER BY
Date
Msg-id 20020307135628.59688.qmail@web13804.mail.yahoo.com
Whole thread Raw
In response to improving performance of UNION and ORDER BY  (Chris Gamache <cgg007@yahoo.com>)
List pgsql-general
I think I've found an acceptable compromise. The optimizer will use the index
if I limit each of the subqueries to no more than 6200 rows. I think a
composite total ~12000 is enough data, understanding that the users of this
query would be silly to use it for paging through more than a week's data. I
think it'd be silly to go beyond 24 hours... Anyway, if they're looking for
something more specific, I take out the limits to allow it search through the
entire database, and unless they're not using specific enough search criteria
the query will still move rather quickly. A decent compromise.

Thanks for helping me nail this one down!

CG

--- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> On Wed, 6 Mar 2002, Chris Gamache wrote:
>
> > UNION ALL was an excellent idea! It didn't cut much time off, but at least
> no
> > resources are devoted to eliminating the nonexistant duplicate rows.
> >
> > I've had days to think about this. It seems as though the ORDER BY part of
> the
> > first query is the culprit. When I run this query by itself, I can see that
> it
> > would comprise the bulk of the UNION query time.
> >
> > select
> >    a.username as "User",
> >    a.trans_date as "Date",
> >    tl.longtype as "Type",
> >    a.trans_data as "Query Data",
> >    a.trans_charge as "Charged",
> >    a.user_reference_id as "Reference ID"
> >  from a_trans_log a
> >  join addtypelong tl on a.trans_type = tl.shorttype
> >  where a.trans_date >= '12/31/01'::TIMESTAMP
> >  order by a.trans_date desc, a.trans_data limit 20;
> >
> > By removing the ORDER BY a.trans_data, it cut the query down to the "almost
> > instant" level... EXPLAIN shows me that it uses the indeces! I guess I need
> to
> > drop that part of the ORDER BY, or make an index for it to use... Bah.
>
> Yeah, although since you're doing different kinds of scans on the two
> columns (desc and asc), I'm not sure what'd be necessary to get the index
> used.  Normally you'd just make an index on trans_date,trans_data, but
> that won't work here.  I think Tom Lane may have discussed a way to get
> the index scan for that case sometime in the past couple of months on the
> mailing list.
>
> > Alas...
> > Unless someone knows different, I don't believe that I can use the LIMIT
> > statement in each of the subqueries. The app needs to page through the
> > UNIONized table. As the app pages through LIMIT 20,20 LIMIT 20,40 LIMIT
> 20,60,
> > etc. It needs to look at the whole sorted UNIONized table. (am I making ANY
> > sense?)
>
> That makes sense.  You could cheat a little if you knew how large the
> results sets for the parts were if you're using union all since I believe
> it will do them in order and just append the sets, but that doesn't seem
> like a good idea in general.
>
> > Even by dropping the order on column 4, it still takes 6 seconds to
> assemble
> > the data, sort it and limit it... Could I create a cross-table index
> > specifically for this query? I doubt its a) possible, b) (even if possible)
> a
> > good idea.
>
> I don't think so.
>
> One question is whether some of this could be done by keeping another
> table around that you do your limits through.  This would at least cut
> down some of the cost (or for that matter if you could make a table with
> this using triggers and such).
>
>


__________________________________________________
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

pgsql-general by date:

Previous
From: Jon Hassen
Date:
Subject: index item size 4496 exceeds maximum 2713
Next
From: "montfort"
Date:
Subject: error