Re: Help with tuning this query - Mailing list pgsql-performance
From | John Arbash Meinel |
---|---|
Subject | Re: Help with tuning this query |
Date | |
Msg-id | 4225FA10.6070302@arbash-meinel.com Whole thread Raw |
In response to | Re: Help with tuning this query ("Ken Egervari" <ken@upfactor.com>) |
List | pgsql-performance |
Ken Egervari wrote: >> First, what version of postgres, and have you run VACUUM ANALYZE >> recently? >> Also, please attach the result of running EXPLAIN ANALYZE. >> (eg, explain analyze select s.* from shipment ...) > > > I'm using postgres 8.0. I wish I could paste explain analyze, but I > won't be at work for a few days. I was hoping some Postgres/SQL > experts here would be able to simply look at the query and make > recommendations because it's not a very difficult or unique query. > That's the problem. Without explain analyze, it's hard to say why it is performing weird, because it *does* look like a straightforward query. >> It's very possible that you don't have up-to-date statistics, which >> causes postgres to make a bad estimate of what the fastest plan is. > > > I run VACUUM ANALYZE religiously. I even dumped the production > database and used it as my test database after a full vacuum analyze. > It's really as fresh as it can be. > Good. Again, this is just the first precaution, as not everyone is as careful as you. And without the explain analyze, you can't tell what the planner estimates are. >> I don't know how to make Hibernate do what you want, but if you change >> the query to using subselects (not all databases support this, so >> hibernate might not let you), you can see a performance improvement. > > > Yes, Hibernate supports sub-selects. In fact, I can even drop down to > JDBC explicitly, so whatever SQL tricks out there I can use will work > on Hibernate. In what way will sub-selects improve this query? > When doing massive joins across multiple tables (as you are doing) it is frequently faster to do a couple of small joins where you only need a couple of rows as input to the rest. Something like: select * from shipment s where s.carrier_code_id in (select cc.id from carrier_code cc join carrier c on cc.carrier_id = c.id) and s.current_status_id in (select cs.id from shipment_status cs where ...) Again it's something that you can try. I have found quite a few of my queries performed much better with subselects. I'm guessing it's because with big queries it has a harder time figuring out how to refactor (the decision tree becomes big). But I'm not really sure. I just know it can work. >> Also sometimes using explicit joins can be worse than just letting the >> query manager figure it out. So something like >> select s.* from shipment s, carrier_code cc, carrier c, ... >> where s.carrier_code_id = cc.id and c.id = cc.carrier_id and .... > > > I think I can avoid using joins in Hibernate, but it makes the query > harder to maintain. How much of a performance benefit are we talking > with this change? Since hibernate is an object language, you don't > actually have to specify many joins. You can use the "dot" notation. > I'm not saying this *will* improve performance. It is just something to try. It very easily could not be worth the overhead. > Query query = session.createQuery( > "select shipment " + > "from Shipment shipment " + > " inner join > shipment.cargoControlNumber.carrierCode.carrier.persons person " + > " inner join shipment.currentStatus currentStatus " + > " inner join currentStatus.releaseCode releaseCode " + > " left join fetch shipment.currentStatus " + > "where " + > " person.id = :personId and " + > " shipment.isPurged = false and " + > " releaseCode.number = '9' and " + > " currentStatus is not null and " + > " currentStatus.date >= current_date - 31 " + > "order by currentStatus.date desc" > ); > > query.setParameter( "personId", personId ); > > query.setFirstResult( firstResult ); > query.setMaxResults( maxResults ); > > return query.list(); > > As you can see, it's fairly elegant language and maps to SQL quite well. > >> But again, since this is generated from another program (Hibernate), I >> really don't know how you tell it how to tune the SQL. Probably the >> biggest "non-bug" performance improvements are from tuning the SQL. > > > I agree, but the ones I've tried aren't good enough. I have made > these indexes that apply to this query as well as others in my from > looking at my SQL scripts. Many of my queries have really sped up to > 14 milliseconds from these indexes. But I can't make this query run > any faster. > > CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person > USING btree (person_id); > CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person > USING btree (carrier_id); > CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree > (carrier_id); > CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree > (carrier_code_id); > CREATE INDEX current_status_date_idx ON shipment_status USING btree > (date); > CREATE INDEX shipment_current_status_id_idx ON shipment USING btree > (current_status_id); > CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING > btree (shipment_id); > > Thanks for your responses everyone. I'll try and get you that explain > analyze. I'm just not at work at the moment but this is a problem > that I'm simply puzzled and worried about. I'm getting all of this > from CVS on my work server. > > Ken There is also the possibility that you are having problems with cross-column correlation, or poor distribution of a column. Postgres doesn't keep cross-column statistics, so if 2 columns are correlated, then it mis-estimates selectivity, and might pick the wrong plan. In general your query looks decent, we just need to figure out what is going on. John =:->
Attachment
pgsql-performance by date: