Thread: How to read query plan
Hi all,

I am new to PostgreSQL and query optimizations. We have recently moved our project from MySQL to PostgreSQL and we are having a performance problem with one of our most often used queries. On MySQL the speed was sufficient but PostgreSQL chooses a time-expensive query plan. I would like to optimize it somehow but the query plan from EXPLAIN ANALYZE is a little bit cryptic to me.

So the first thing I would like is to understand the query plan. I have read "performance tips" and the FAQ but they didn't move me much further.

I would appreciate it if someone could help me understand the query plan and what the possible general options are that I can test. I think at this moment the most expensive part is the "Sort". Am I right? If so, how could I generally avoid it (turning something on or off, using parentheses for JOINs etc.) to force some more efficient query plan?

Thank you for any suggestions.

QUERY PLAN
Merge Right Join (cost=9868.84..9997.74 rows=6364 width=815) (actual time=9982.022..10801.216 rows=6364 loops=1)
  Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)
  -> Index Scan using cadastralunits_pkey on cadastralunits (cost=0.00..314.72 rows=13027 width=31) (actual time=0.457..0.552 rows=63 loops=1)
  -> Sort (cost=9868.84..9884.75 rows=6364 width=788) (actual time=9981.405..10013.708 rows=6364 loops=1)
    Sort Key: addevicessites.cadastralunitidfk
    -> Hash Left Join (cost=5615.03..7816.51 rows=6364 width=788) (actual time=3898.603..9884.248 rows=6364 loops=1)
      Hash Cond: ("outer".addevicessitepartnerstickeridfk = "inner".idpk)
      -> Hash Left Join (cost=5612.27..7718.29 rows=6364 width=762) (actual time=3898.243..9104.791 rows=6364 loops=1)
        Hash Cond: ("outer".addevicessitepartnermaintaineridfk = "inner".idpk)
        -> Hash Left Join (cost=5609.51..7620.06 rows=6364 width=736) (actual time=3897.996..8341.965 rows=6364 loops=1)
          Hash Cond: ("outer".addevicessitepartnerelectricitysupplieridfk = "inner".idpk)
          -> Hash Left Join (cost=5606.74..7521.84 rows=6364 width=710) (actual time=3897.736..7572.182 rows=6364 loops=1)
            Hash Cond: ("outer".addevicessitepartneridentificationoperatoridfk = "inner".idpk)
            -> Nested Loop Left Join (cost=5603.98..7423.62 rows=6364 width=684) (actual time=3897.436..6821.713 rows=6364 loops=1)
              Join Filter: ("outer".addevicessitestatustypeidfk = "inner".idpk)
              -> Nested Loop Left Join (cost=5602.93..6706.61 rows=6364 width=657) (actual time=3897.294..6038.976 rows=6364 loops=1)
                Join Filter: ("outer".addevicessitepositionidfk = "inner".idpk)
                -> Nested Loop Left Join (cost=5601.89..6276.01 rows=6364 width=634) (actual time=3897.158..5303.575 rows=6364 loops=1)
                  Join Filter: ("outer".addevicessitevisibilityidfk = "inner".idpk)
                  -> Merge Right Join (cost=5600.85..5702.21 rows=6364 width=602) (actual time=3896.963..4583.749 rows=6364 loops=1)
                    Merge Cond: ("outer".idpk = "inner".addevicessitesizeidfk)
                    -> Index Scan using addevicessitesizes_pkey on addevicessitesizes (cost=0.00..5.62 rows=110 width=14) (actual time=0.059..0.492 rows=110 loops=1)
                    -> Sort (cost=5600.85..5616.76 rows=6364 width=592) (actual time=3896.754..3915.022 rows=6364 loops=1)
                      Sort Key: addevicessites.addevicessitesizeidfk
                      -> Hash Left Join (cost=2546.59..4066.81 rows=6364 width=592) (actual time=646.162..3792.310 rows=6364 loops=1)
                        Hash Cond: ("outer".addevicessitedistrictidfk = "inner".idpk)
                        -> Hash Left Join (cost=2539.29..3964.05 rows=6364 width=579) (actual time=645.296..3142.128 rows=6364 loops=1)
                          Hash Cond: ("outer".addevicessitestreetdescriptionidfk = "inner".idpk)
                          -> Hash Left Join (cost=2389.98..2724.64 rows=6364 width=544) (actual time=632.806..2466.030 rows=6364 loops=1)
                            Hash Cond: ("outer".addevicessitestreetidfk = "inner".idpk)
                            -> Hash Left Join (cost=2324.25..2515.72 rows=6364 width=518) (actual time=626.081..1822.137 rows=6364 loops=1)
                              Hash Cond: ("outer".addevicessitecityidfk = "inner".idpk)
                              -> Merge Right Join (cost=2321.70..2417.71 rows=6364 width=505) (actual time=625.598..1220.967 rows=6364 loops=1)
                                Merge Cond: ("outer".idpk = "inner".addevicessitecountyidfk)
                                -> Sort (cost=5.83..6.10 rows=110 width=17) (actual time=0.348..0.391 rows=110 loops=1)
                                  Sort Key: addevicessitecounties.idpk
                                  -> Seq Scan on addevicessitecounties (cost=0.00..2.10 rows=110 width=17) (actual time=0.007..0.145 rows=110 loops=1)
                                -> Sort (cost=2315.87..2331.78 rows=6364 width=492) (actual time=625.108..640.325 rows=6364 loops=1)
                                  Sort Key: addevicessites.addevicessitecountyidfk
                                  -> Merge Right Join (cost=0.00..1006.90 rows=6364 width=492) (actual time=0.145..543.043 rows=6364 loops=1)
                                    Merge Cond: ("outer".idpk = "inner".addevicessiteregionidfk)
                                    -> Index Scan using addevicessiteregions_pkey on addevicessiteregions (cost=0.00..3.17 rows=15 width=23) (actual time=0.011..0.031 rows=15 loops=1)
                                    -> Index Scan using addevicessites_addevicessiteregionidfk on addevicessites (cost=0.00..924.14 rows=6364 width=473) (actual time=0.010..9.825 rows=6364 loops=1)
                              -> Hash (cost=2.24..2.24 rows=124 width=17) (actual time=0.238..0.238 rows=0 loops=1)
                                -> Seq Scan on addevicessitecities (cost=0.00..2.24 rows=124 width=17) (actual time=0.009..0.145 rows=124 loops=1)
                            -> Hash (cost=58.58..58.58 rows=2858 width=34) (actual time=6.532..6.532 rows=0 loops=1)
                              -> Seq Scan on addevicessitestreets (cost=0.00..58.58 rows=2858 width=34) (actual time=0.040..4.129 rows=2858 loops=1)
                          -> Hash (cost=96.85..96.85 rows=4585 width=43) (actual time=11.786..11.786 rows=0 loops=1)
                            -> Seq Scan on addevicessitestreetdescriptions (cost=0.00..96.85 rows=4585 width=43) (actual time=0.036..7.290 rows=4585 loops=1)
                        -> Hash (cost=6.44..6.44 rows=344 width=21) (actual time=0.730..0.730 rows=0 loops=1)
                          -> Seq Scan on addevicessitedistricts (cost=0.00..6.44 rows=344 width=21) (actual time=0.027..0.478 rows=344 loops=1)
                  -> Materialize (cost=1.04..1.08 rows=4 width=36) (actual time=0.000..0.002 rows=4 loops=6364)
                    -> Seq Scan on addevicessitevisibilities (cost=0.00..1.04 rows=4 width=36) (actual time=0.036..0.050 rows=4 loops=1)
                -> Materialize (cost=1.03..1.06 rows=3 width=27) (actual time=0.001..0.002 rows=3 loops=6364)
                  -> Seq Scan on addevicessitepositions (cost=0.00..1.03 rows=3 width=27) (actual time=0.013..0.017 rows=3 loops=1)
              -> Materialize (cost=1.05..1.10 rows=5 width=31) (actual time=0.000..0.002 rows=5 loops=6364)
                -> Seq Scan on addevicessitestatustypes (cost=0.00..1.05 rows=5 width=31) (actual time=0.012..0.019 rows=5 loops=1)
            -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.171..0.171 rows=0 loops=1)
              -> Seq Scan on partneridentifications partneridentificationsoperator (cost=0.00..2.61 rows=61 width=34) (actual time=0.027..0.126 rows=61 loops=1)
          -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.130..0.130 rows=0 loops=1)
            -> Seq Scan on partners partnerselectricitysupplier (cost=0.00..2.61 rows=61 width=34) (actual time=0.003..0.076 rows=61 loops=1)
        -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.118..0.118 rows=0 loops=1)
          -> Seq Scan on partners partnersmaintainer (cost=0.00..2.61 rows=61 width=34) (actual time=0.003..0.075 rows=61 loops=1)
      -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.171..0.171 rows=0 loops=1)
        -> Seq Scan on partners partnerssticker (cost=0.00..2.61 rows=61 width=34) (actual time=0.029..0.120 rows=61 loops=1)
Total runtime: 10811.567 ms

--
Miroslav Šulc
Miroslav Šulc wrote:
> Hi all,
>
> I am new to PostgreSQL and query optimizations. We have recently moved our project from MySQL to PostgreSQL and we are having a performance problem with one of our most often used queries. On MySQL the speed was sufficient but PostgreSQL chooses a time-expensive query plan. I would like to optimize it somehow but the query plan from EXPLAIN ANALYZE is a little bit cryptic to me.
>
> So the first thing I would like is to understand the query plan. I have read "performance tips" and the FAQ but they didn't move me much further.
>
> I would appreciate it if someone could help me understand the query plan and what the possible general options are that I can test. I think at this moment the most expensive part is the "Sort". Am I right? If so, how could I generally avoid it (turning something on or off, using parentheses for JOINs etc.) to force some more efficient query plan?
>
> Thank you for any suggestions.

You really need to post the original query, so we can see *why* postgres thinks it needs to run the plan this way.

Also, the final sort actually isn't that expensive. When you have the numbers (cost=xxx..yyy) the xxx is the time when the step can start, and the yyy is the time when the step can finish. For a lot of steps, it can start running while the sub-steps are still feeding back more data; for others, it has to wait for the sub-steps to finish.

The first thing to look for is to make sure the estimated number of rows is close to the actual number of rows. If they are off, then postgres may be mis-estimating the optimal plan. (If postgres thinks it is going to only need 10 rows, it may use an index scan, but when 1000 rows are returned, a seq scan might have been faster.)

You seem to be doing a lot of outer joins. Is that necessary? I don't really know what you are looking for, but you are joining against enough tables that I think this query is always going to be slow.

From what I can tell, you have 1 table which has 6364 rows, and you are grabbing all of those rows, and then outer joining it with about 11 other tables.

I would actually guess that the most expensive parts of the plan are the NESTED LOOPS, which when they go to materialize have to do a sequential scan, and they get executed 6364 times. It looks like the other tables are small (only 3-5 rows), so it takes about 0.05 ms for each seqscan; the problem is that because you are doing it 6k times, it ends up taking about 300 ms of your time.

You could try setting "set enable_nestloop to off". I don't know that it will be faster, but it could be.

In general, though, it seems like you should be asking a different question, rather than trying to optimize the query that you have.

Can you post the original SQL statement, and maybe describe what you are trying to do?

John
=:->
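[A minimal sketch of trying the enable_nestloop suggestion for a single session; the SELECT here just stands for the full query that is posted later in the thread:]

SET enable_nestloop TO off;
EXPLAIN ANALYZE SELECT ... ;    -- the original query
RESET enable_nestloop;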
On Sun, 2005-03-13 at 16:32 +0100, Miroslav Šulc wrote:
> Hi all,
>
> I am new to PostgreSQL and query optimizations. We have recently moved our project from MySQL to PostgreSQL and we are having a performance problem with one of our most often used queries. On MySQL the speed was sufficient but PostgreSQL chooses a time-expensive query plan. I would like to optimize it somehow but the query plan from EXPLAIN ANALYZE is a little bit cryptic to me.

[snip output of EXPLAIN ANALYZE]

for those of us who have not yet reached the level where one can infer it from the query plan, how about showing us the actual query too?

but as an example of what to look for, consider the first few lines (reformatted):

> Merge Right Join (cost=9868.84..9997.74 rows=6364 width=815)
>   (actual time=9982.022..10801.216 rows=6364 loops=1)
>   Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)
>   -> Index Scan using cadastralunits_pkey on cadastralunits
>      (cost=0.00..314.72 rows=13027 width=31)
>      (actual time=0.457..0.552 rows=63 loops=1)
>   -> Sort (cost=9868.84..9884.75 rows=6364 width=788)
>      (actual time=9981.405..10013.708 rows=6364 loops=1)

notice that the index scan is expected to return 13027 rows, but actually returns 63. this might influence the choice of plan.

gnari
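[For what it's worth, a quick way to see how selective that join actually is, using the table and column names visible in the plan above:]

SELECT count(DISTINCT CadastralUnitIDFK) FROM AdDevicesSites;
SELECT count(*) FROM CadastralUnits;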
Hi John, thank you for your response. John Arbash Meinel wrote: > You really need to post the original query, so we can see *why* postgres > thinks it needs to run the plan this way. Here it is: SELECT AdDevicesSites.IDPK, AdDevicesSites.AdDevicesSiteSizeIDFK, AdDevicesSites.AdDevicesSiteRegionIDFK, AdDevicesSites.AdDevicesSiteCountyIDFK, AdDevicesSites.AdDevicesSiteCityIDFK, AdDevicesSites.AdDevicesSiteDistrictIDFK, AdDevicesSites.AdDevicesSiteStreetIDFK, AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK, AdDevicesSites.AdDevicesSitePositionIDFK, AdDevicesSites.AdDevicesSiteVisibilityIDFK, AdDevicesSites.AdDevicesSiteStatusTypeIDFK, AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK, AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK, AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK, AdDevicesSites.AdDevicesSitePartnerStickerIDFK, AdDevicesSites.CadastralUnitIDFK, AdDevicesSites.MediaType, AdDevicesSites.Mark, AdDevicesSites.Amount, AdDevicesSites.Distance, AdDevicesSites.OwnLightening, AdDevicesSites.LocationDownTown, AdDevicesSites.LocationSuburb, AdDevicesSites.LocationBusinessDistrict, AdDevicesSites.LocationResidentialDistrict, AdDevicesSites.LocationIndustrialDistrict, AdDevicesSites.LocationNoBuildings, AdDevicesSites.ParkWayHighWay, AdDevicesSites.ParkWayFirstClassRoad, AdDevicesSites.ParkWayOtherRoad, AdDevicesSites.ParkWayStreet, AdDevicesSites.ParkWayAccess, AdDevicesSites.ParkWayExit, AdDevicesSites.ParkWayParkingPlace, AdDevicesSites.ParkWayPassangersOnly, AdDevicesSites.ParkWayCrossRoad, AdDevicesSites.PositionStandAlone, AdDevicesSites.NeighbourhoodPublicTransportation, AdDevicesSites.NeighbourhoodInterCityTransportation, AdDevicesSites.NeighbourhoodPostOffice, AdDevicesSites.NeighbourhoodNewsStand, AdDevicesSites.NeighbourhoodAmenities, AdDevicesSites.NeighbourhoodSportsSpot, AdDevicesSites.NeighbourhoodHealthServiceSpot, AdDevicesSites.NeighbourhoodShops, AdDevicesSites.NeighbourhoodShoppingCenter, AdDevicesSites.NeighbourhoodSuperMarket, AdDevicesSites.NeighbourhoodPetrolStation, AdDevicesSites.NeighbourhoodSchool, AdDevicesSites.NeighbourhoodBank, AdDevicesSites.NeighbourhoodRestaurant, AdDevicesSites.NeighbourhoodHotel, AdDevicesSites.RestrictionCigarettes, AdDevicesSites.RestrictionPolitics, AdDevicesSites.RestrictionSpirits, AdDevicesSites.RestrictionSex, AdDevicesSites.RestrictionOther, AdDevicesSites.RestrictionNote, AdDevicesSites.SpotMapFile, AdDevicesSites.SpotPhotoFile, AdDevicesSites.SourcePhotoTimeStamp, AdDevicesSites.SourceMapTimeStamp, AdDevicesSites.Price, AdDevicesSites.WebPrice, AdDevicesSites.CadastralUnitCode, AdDevicesSites.BuildingNumber, AdDevicesSites.ParcelNumber, AdDevicesSites.GPSLatitude, AdDevicesSites.GPSLongitude, AdDevicesSites.GPSHeight, AdDevicesSites.MechanicalOpticalCoordinates, AdDevicesSites.Deleted, AdDevicesSites.Protected, AdDevicesSites.DateCreated, AdDevicesSites.DateLastModified, AdDevicesSites.DateDeleted, AdDevicesSites.CreatedByUserIDFK, AdDevicesSites.LastModifiedByUserIDFK, AdDevicesSites.DeletedByUserIDFK, AdDevicesSites.PhotoLastModificationDate, AdDevicesSites.MapLastModificationDate, AdDevicesSites.DateLastImported, AdDevicesSiteRegions.Name AS AdDevicesSiteRegionName, AdDevicesSiteCounties.Name AS AdDevicesSiteCountyName, AdDevicesSiteCities.Name AS AdDevicesSiteCityName, AdDevicesSiteStreets.Name AS AdDevicesSiteStreetName, AdDevicesSiteDistricts.Name AS AdDevicesSiteDistrictName, AdDevicesSiteStreetDescriptions.Name_cs AS AdDevicesSiteStreetDescriptionName_cs, 
AdDevicesSiteStreetDescriptions.Name_en AS AdDevicesSiteStreetDescriptionName_en,
AdDevicesSiteSizes.Name AS AdDevicesSiteSizeName,
SUBSTRING(AdDevicesSiteVisibilities.Name_cs, 3) AS AdDevicesSiteVisibilityName_cs,
SUBSTRING(AdDevicesSiteVisibilities.Name_en, 3) AS AdDevicesSiteVisibilityName_en,
AdDevicesSitePositions.Name_cs AS AdDevicesSitePositionName_cs,
AdDevicesSitePositions.Name_en AS AdDevicesSitePositionName_en,
AdDevicesSiteStatusTypes.Name_cs AS AdDevicesSiteStatusTypeName_cs,
AdDevicesSiteStatusTypes.Name_en AS AdDevicesSiteStatusTypeName_en,
PartnerIdentificationsOperator.Name AS PartnerIdentificationOperatorName,
PartnersElectricitySupplier.Name AS PartnerElectricitySupplierName,
PartnersMaintainer.Name AS PartnerMaintainerName,
PartnersSticker.Name AS PartnerStickerName,
CadastralUnits.Code AS CadastralUnitCodeNative,
CadastralUnits.Name AS CadastralUnitName
FROM AdDevicesSites
LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK = AdDevicesSiteRegions.IDPK
LEFT JOIN AdDevicesSiteCounties ON AdDevicesSites.AdDevicesSiteCountyIDFK = AdDevicesSiteCounties.IDPK
LEFT JOIN AdDevicesSiteCities ON AdDevicesSites.AdDevicesSiteCityIDFK = AdDevicesSiteCities.IDPK
LEFT JOIN AdDevicesSiteStreets ON AdDevicesSites.AdDevicesSiteStreetIDFK = AdDevicesSiteStreets.IDPK
LEFT JOIN AdDevicesSiteStreetDescriptions ON AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK = AdDevicesSiteStreetDescriptions.IDPK
LEFT JOIN AdDevicesSiteDistricts ON AdDevicesSites.AdDevicesSiteDistrictIDFK = AdDevicesSiteDistricts.IDPK
LEFT JOIN AdDevicesSiteSizes ON AdDevicesSites.AdDevicesSiteSizeIDFK = AdDevicesSiteSizes.IDPK
LEFT JOIN AdDevicesSiteVisibilities ON AdDevicesSites.AdDevicesSiteVisibilityIDFK = AdDevicesSiteVisibilities.IDPK
LEFT JOIN AdDevicesSitePositions ON AdDevicesSites.AdDevicesSitePositionIDFK = AdDevicesSitePositions.IDPK
LEFT JOIN AdDevicesSiteStatusTypes ON AdDevicesSites.AdDevicesSiteStatusTypeIDFK = AdDevicesSiteStatusTypes.IDPK
LEFT JOIN PartnerIdentifications AS PartnerIdentificationsOperator ON AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK = PartnerIdentificationsOperator.IDPK
LEFT JOIN Partners AS PartnersElectricitySupplier ON AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK = PartnersElectricitySupplier.IDPK
LEFT JOIN Partners AS PartnersMaintainer ON AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK = PartnersMaintainer.IDPK
LEFT JOIN Partners AS PartnersSticker ON AdDevicesSites.AdDevicesSitePartnerStickerIDFK = PartnersSticker.IDPK
LEFT JOIN CadastralUnits ON AdDevicesSites.CadastralUnitIDFK = CadastralUnits.IDPK

> Also, the final sort actually isn't that expensive.
>
> When you have the numbers (cost=xxx..yyy) the xxx is the time when the step can start, and the yyy is the time when the step can finish. For a lot of steps, it can start running while the sub-steps are still feeding back more data; for others, it has to wait for the sub-steps to finish.

This is the bit of information I didn't find in the documentation and was looking for. Thank you for the enlightenment :-) With this knowledge I can see that the JOINs are the bottleneck.

> The first thing to look for is to make sure the estimated number of rows is close to the actual number of rows. If they are off, then postgres may be mis-estimating the optimal plan. (If postgres thinks it is going to only need 10 rows, it may use an index scan, but when 1000 rows are returned, a seq scan might have been faster.)
The "row=" numbers are equal to those of the total count of items in that tables (generated by VACUUM ANALYZE). > You seem to be doing a lot of outer joins. Is that necessary? These external tables contain information that are a unique parameter of the AdDevice (like Position, Region, County, City etc.), in some containing localized description of the property attribute. Some of them could be moved into the main table but that would create a redundancy, some of them cannot be moved into the main table (like information about Partners which is definitely another object with respect to AdDevices). I think the names of the tables are self-explanatory so it should be clear what each table stores. Is this design incorrect? In fact, we only need about 30 records at a time but LIMIT can speed-up the query only when looking for the first 30 records. Setting OFFSET slows the query down. > I don't > really know what you are looking for, but you are joining against enough > tables, that I think this query is always going to be slow. In MySQL the query was not so slow and I don't see any reason why there should be large differences in SELECT speed. But if the design of the tables is incorrect, we will correct it. > From what I can tell, you have 1 table which has 6364 rows, and you are > grabbing all of those rows, and then outer joining it with about 11 > other tables. Here are the exact numbers: AdDevicesSites - 6364 AdDevicesSiteRegions - 15 AdDevicesSiteCounties - 110 AdDevicesSiteCities - 124 AdDevicesSiteStreets - 2858 AdDevicesSiteStreetDescriptions - 4585 AdDevicesSiteDistricts - 344 AdDevicesSiteSizes - 110 AdDevicesSiteVisibilities - 4 AdDevicesSitePositions - 3 AdDevicesSiteStatusTypes - 5 PartnerIdentifications - 61 Partners - 61 CadastralUnits - 13027 > I would actually guess that the most expensive parts of the plan are the > NESTED LOOPS which when they go to materialize have to do a sequential > scan, and they get executed 6364 times. It looks like the other tables > are small (only 3-5 rows), so it takes about 0.05 ms for each seqscan, > the problem is that because you are doing it 6k times, it ends up taking > about 300ms of your time. > > You could try setting "set enable_nestloop to off". > I don't know that it will be faster, but it could be. I have tried that and it resulted in about 2 sec slowdown :-( > In general, though, it seems like you should be asking a different > question, rather than trying to optimize the query that you have. You mean "how should I improve the design to make the query faster"? > Can you post the original SQL statement, and maybe describe what you are > trying to do? I hope the explanation above is clear and sufficient :-) > > John > =:-> >
Hi Ragnar,

Ragnar Hafstað wrote:
> [snip output of EXPLAIN ANALYZE]
>
> for those of us who have not yet reached the level where one can infer it from the query plan, how about showing us the actual query too?

I thought it would be sufficient to show me where the main bottleneck is. And in fact, the query is rather lengthy. But I have included it in the response to John. So sorry for the incompleteness.

> but as an example of what to look for, consider the first few lines (reformatted):
>
>> Merge Right Join (cost=9868.84..9997.74 rows=6364 width=815)
>>   (actual time=9982.022..10801.216 rows=6364 loops=1)
>>   Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)
>>   -> Index Scan using cadastralunits_pkey on cadastralunits
>>      (cost=0.00..314.72 rows=13027 width=31)
>>      (actual time=0.457..0.552 rows=63 loops=1)
>>   -> Sort (cost=9868.84..9884.75 rows=6364 width=788)
>>      (actual time=9981.405..10013.708 rows=6364 loops=1)
>
> notice that the index scan is expected to return 13027 rows, but actually returns 63. this might influence the choice of plan.

Yes, the situation in this scenario is that the CadastralUnits table contains all units from the country, but the AdDevices in this case come from only 63 CadastralUnits. So the result - 63 rows - is just this little subset. On top of that, not all AdDevices have CadastralUnitIDFK set to an IDPK that exists in CadastralUnits; some have it set to zero (= no CadastralUnit set).

> gnari

Miroslav Šulc
Miroslav Šulc wrote:
> Hi John,
>
> thank you for your response.

How about a quick side track. Have you played around with your shared_buffers, maintenance_work_mem, and work_mem settings?

What version of postgres are you using? The above names changed in 8.0, and 8.0 also has some performance improvements over the 7.4 series.

What is your hardware? Are you testing this while there is load on the system, or under no load?

Are you re-running the query multiple times and reporting the later speeds, or just the first time? (If nothing is loaded into memory, the first run is easily 10x slower than later ones.)

Just some background info. If you have set these to reasonable values, we probably don't need to spend much time here, but it's just one of those things to check.

John
=:->
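[A quick way to check these from psql; the parameter names are the 8.0 ones mentioned above, and the values shown will of course be installation-specific:]

SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
SELECT version();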
Miroslav Šulc wrote:
> Hi John,
>
> thank you for your response. I will comment on things separately.
>
> John Arbash Meinel wrote:
> ...
> These external tables contain information that is a unique parameter of the AdDevice (like Position, Region, County, City etc.), some of them containing a localized description of the property attribute. Some of them could be moved into the main table but that would create redundancy; some of them cannot be moved into the main table (like information about Partners, which is definitely another object with respect to AdDevices). I think the names of the tables are self-explanatory so it should be clear what each table stores. Is this design incorrect?

It's actually more of a question as to why you are doing left outer joins, rather than simple joins. Are the tables not fully populated? If so, why not?

How are you using this information? Why is it useful to get back rows that don't have all of their information filled out?

Why is it useful to have so many columns returned? It seems like in most cases you are only going to be able to use *some* of the information, so why not create more queries that are specialized, rather than one "get everything" query?

> In fact, we only need about 30 records at a time but LIMIT can speed up the query only when looking for the first 30 records. Setting OFFSET slows the query down.

Have you thought about using a cursor instead of using limit + offset? This may not help the overall time, but it might let you split up when the time is spent.

BEGIN;
DECLARE <cursor_name> CURSOR FOR SELECT ... FROM ...;
FETCH FORWARD 30 FROM <cursor_name>;
FETCH FORWARD 30 FROM <cursor_name>;
...
END;

>> I don't really know what you are looking for, but you are joining against enough tables that I think this query is always going to be slow.
>
> In MySQL the query was not so slow and I don't see any reason why there should be large differences in SELECT speed. But if the design of the tables is incorrect, we will correct it.

In the other post I asked about your postgres settings. The defaults are pretty stingy, so that *might* be an issue.

>> From what I can tell, you have 1 table which has 6364 rows, and you are grabbing all of those rows, and then outer joining it with about 11 other tables.
>
> Here are the exact numbers:
>
> AdDevicesSites - 6364
> AdDevicesSiteRegions - 15
> AdDevicesSiteCounties - 110
> AdDevicesSiteCities - 124
> AdDevicesSiteStreets - 2858
> AdDevicesSiteStreetDescriptions - 4585
> AdDevicesSiteDistricts - 344
> AdDevicesSiteSizes - 110
> AdDevicesSiteVisibilities - 4
> AdDevicesSitePositions - 3
> AdDevicesSiteStatusTypes - 5
> PartnerIdentifications - 61
> Partners - 61
> CadastralUnits - 13027

And if I understand correctly, you consider all of these to be outer joins. Meaning you want *all* of AdDevicesSites, and whatever info goes along with it, but there are no restrictions as to what rows you want. You want everything you can get.

Do you actually need *everything*? You mention only needing 30, what for?

>> I would actually guess that the most expensive parts of the plan are the NESTED LOOPS, which when they go to materialize have to do a sequential scan, and they get executed 6364 times. It looks like the other tables are small (only 3-5 rows), so it takes about 0.05 ms for each seqscan; the problem is that because you are doing it 6k times, it ends up taking about 300 ms of your time.
>>
>> You could try setting "set enable_nestloop to off".
>> I don't know that it will be faster, but it could be.
>
> I have tried that and it resulted in about a 2 sec slowdown :-(

Generally, the optimizer *does* select the best query plan. As long as it has accurate statistics, which it seems to in this case.

>> In general, though, it seems like you should be asking a different question, rather than trying to optimize the query that you have.
>
> You mean "how should I improve the design to make the query faster"?

There is one possibility if we don't find anything nicer. Which is to create a lazy materialized view. Basically, you run this query, and store it in a table. Then when you want to do the SELECT, you just do that against the unrolled table. You can then create triggers, etc. to keep the data up to date. Here is a good documentation of it:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

It is basically a way that you can un-normalize data, in a safe way.

Also, another thing that you can do, is instead of using a cursor, you can create a temporary table with the results of the query, and create a primary key which is just a simple counter. Then instead of doing limit + offset, you can select * where id > 0 and id < 30; ... select * where id > 30 and id < 60; etc.

It still requires the original query to be run, though, so it is not necessarily optimal for you.

>> Can you post the original SQL statement, and maybe describe what you are trying to do?
>
> I hope the explanation above is clear and sufficient :-)
>
>> John
>> =:->

Unfortunately, I don't really see any obvious problems with your query in the way that you are using it. The problem is that you are not applying any selectivity, so postgres has to go to all the tables, and get all the rows, and then try to logically merge them together. It is doing a hash merge, which is generally one of the faster ones and it seems to be doing the right thing.

I would be curious to see how mysql was handling this query, to see if there was something different it was trying to do. I'm also curious how much of a difference there was.

John
=:->
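[A rough sketch of the temporary-table idea described above. The cache table name and the sort key "Mark" are made up for illustration, the inner SELECT would really be the full query with all its LEFT JOINs, and the counter order relies on the serial values being assigned in insert order, which holds when a single backend does the insert:]

BEGIN;
-- cache the ordered primary keys together with a row counter
CREATE TEMPORARY TABLE adsites_page_cache (id serial, idpk integer);
INSERT INTO adsites_page_cache (idpk)
    SELECT IDPK FROM AdDevicesSites ORDER BY Mark;
-- page 2 = ordered rows 31..60; join back to pick up the full rows
SELECT s.*
    FROM adsites_page_cache c
    JOIN AdDevicesSites s ON s.IDPK = c.idpk
    WHERE c.id > 30 AND c.id <= 60
    ORDER BY c.id;
COMMIT;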
John Arbash Meinel wrote:
> How about a quick side track. Have you played around with your shared_buffers, maintenance_work_mem, and work_mem settings?

I have tried to set shared_buffers to 48000 now but got no speedup (11,098.813 ms on the third try). The others are still at the defaults. I'll check the documentation and play with the other parameters.

> What version of postgres are you using?

8.0.1

> The above names changed in 8.0, and 8.0 also has some performance improvements over the 7.4 series.
>
> What is your hardware?

My dev notebook, an Acer TravelMate 292LMi.

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 9
model name      : Intel(R) Pentium(R) M processor 1500MHz
stepping        : 5
cpu MHz         : 1495.485
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 tm pbe est tm2
bogomips        : 2957.31

$ cat /proc/meminfo
MemTotal:       516136 kB
MemFree:         18024 kB
Buffers:         21156 kB
Cached:         188868 kB
SwapCached:         24 kB
Active:         345596 kB
Inactive:       119344 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       516136 kB
LowFree:         18024 kB
SwapTotal:     1004020 kB
SwapFree:      1003996 kB
Dirty:               4 kB
Writeback:           0 kB
Mapped:         343676 kB
Slab:            18148 kB
CommitLimit:   1262088 kB
Committed_AS:   951536 kB
PageTables:       2376 kB
VmallocTotal:   516056 kB
VmallocUsed:     90528 kB
VmallocChunk:   424912 kB

IDE disc:

# hdparm -Tt /dev/hda
/dev/hda:
 Timing cached reads:   1740 MB in 2.00 seconds = 870.13 MB/sec
 Timing buffered disk reads:   40 MB in 3.30 seconds = 12.10 MB/sec

> Are you testing this while there is load on the system, or under no load?

The load is low. This is a few seconds after I have run the EXPLAIN ANALYZE.

# cat /proc/loadavg
0.31 0.51 0.33 1/112 6909

> Are you re-running the query multiple times and reporting the later speeds, or just the first time? (If nothing is loaded into memory, the first run is easily 10x slower than later ones.)

The times change only a little. The first run was about 13 sec, the second about 10 sec, the third about 11 sec, etc.

> Just some background info. If you have set these to reasonable values, we probably don't need to spend much time here, but it's just one of those things to check.

Sure, you are right. I'll try the other parameters.

> John
> =:->

Miroslav
John Arbash Meinel wrote:
> It's actually more of a question as to why you are doing left outer joins, rather than simple joins. Are the tables not fully populated? If so, why not?

Some records do not contain full information (they are collected from different sources which use different approaches to data collection), so using INNER JOIN would cause some records not to be displayed, which is unacceptable.

> How are you using this information? Why is it useful to get back rows that don't have all of their information filled out?

Each row contains the main information, which is important. The other information is also important but may be missing. The information is displayed on lists of 30 rows or on a card. When using a filter the query is much faster, but the case without a filter gives these results.

> Why is it useful to have so many columns returned? It seems like in most cases you are only going to be able to use *some* of the information, so why not create more queries that are specialized, rather than one "get everything" query?

Many of the columns are just varchar(1) (because of the migration from the MySQL enum field type) so the record is not as long as it could seem. These fields are just switches (Y(es) or N(o)). The problem is users can define their own templates and in different scenarios different information might be displayed, so reducing the number of fields would mean that in some cases it wouldn't work as expected. But if we can't speed the query up, we will try to improve it another way. Is there any serious reason not to use so many fields, apart from memory usage? It seems to me that it shouldn't have a great impact on the speed in this case.

> Have you thought about using a cursor instead of using limit + offset? This may not help the overall time, but it might let you split up when the time is spent.
> ...

No. I come from the MySQL world where these things are not common (at least when using MyISAM databases). The other reason (if I understand it well) is that the retrieval of the packages of 30 records is not sequential. Our app is web based and we use paging. The user can select page 1 and then page 10, then go backward to page 9 etc.

> And if I understand correctly, you consider all of these to be outer joins. Meaning you want *all* of AdDevicesSites, and whatever info goes along with it, but there are no restrictions as to what rows you want. You want everything you can get.
>
> Do you actually need *everything*? You mention only needing 30, what for?

For display of a single page consisting of 30 rows. The reason I query all rows is that this is one of the filters users can use. The user can display just bigboards or billboards (or specify more advanced filters) but he/she can also display AdDevices without any filter (page by page). Before I select the 30 rows, I need to order them by a key and only after that select the records, so this is also the reason why I ask for all rows. The key for sorting might be different for each run.

> There is one possibility if we don't find anything nicer. Which is to create a lazy materialized view. Basically, you run this query, and store it in a table. Then when you want to do the SELECT, you just do that against the unrolled table. You can then create triggers, etc. to keep the data up to date. Here is a good documentation of it:
> http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
>
> It is basically a way that you can un-normalize data, in a safe way.
>
> Also, another thing that you can do, is instead of using a cursor, you can create a temporary table with the results of the query, and create a primary key which is just a simple counter. Then instead of doing limit + offset, you can select * where id > 0 and id < 30; ... select * where id > 30 and id < 60; etc.
>
> It still requires the original query to be run, though, so it is not necessarily optimal for you.

These might be the other steps in case we cannot speed up the query. I would prefer to speed the query up :-)

> Unfortunately, I don't really see any obvious problems with your query in the way that you are using it. The problem is that you are not applying any selectivity, so postgres has to go to all the tables, and get all the rows, and then try to logically merge them together. It is doing a hash merge, which is generally one of the faster ones and it seems to be doing the right thing.
>
> I would be curious to see how mysql was handling this query, to see if there was something different it was trying to do. I'm also curious how much of a difference there was.

In fact, on MySQL I didn't see any slow reactions so I didn't measure and inspect it. But I can try it if I figure out how to copy the database from PostgreSQL to MySQL.

> John
> =:->

Thank you for your time and help.

Miroslav
John Arbash Meinel <john@arbash-meinel.com> writes:
> How about a quick side track. Have you played around with your shared_buffers, maintenance_work_mem, and work_mem settings?

Indeed. The hash joins seem unreasonably slow considering how little data they are processing (unless this is being run on some ancient toaster...). One thought that comes to mind is that work_mem may be set so small that the hashes are forced into multiple batches.

Another question worth asking is what are the data types of the columns being joined on. If they are character types, what locale and encoding is the database using?

> Are you re-running the query multiple times, and reporting the later speeds, or just the first time? (If nothing is loaded into memory, the first run is easily 10x slower than later ones.)

That cost would be paid during the bottom-level scans though. The thing that strikes me here is that nearly all of the cost is being spent joining.

> What version of postgres are you using?

And what's the platform (hardware and OS)?

regards, tom lane
Miroslav Šulc wrote:
> John Arbash Meinel wrote: ...
> Many of the columns are just varchar(1) (because of the migration from the MySQL enum field type) so the record is not as long as it could seem. These fields are just switches (Y(es) or N(o)). The problem is users can define their own templates and in different scenarios different information might be displayed, so reducing the number of fields would mean that in some cases it wouldn't work as expected. But if we can't speed the query up, we will try to improve it another way.
> Is there any serious reason not to use so many fields, apart from memory usage? It seems to me that it shouldn't have a great impact on the speed in this case.

Is there a reason to use varchar(1) instead of char(1)? There probably is 0 performance difference, I'm just curious.

>> Have you thought about using a cursor instead of using limit + offset? This may not help the overall time, but it might let you split up when the time is spent.
>> ...
>
> No. I come from the MySQL world where these things are not common (at least when using MyISAM databases). The other reason (if I understand it well) is that the retrieval of the packages of 30 records is not sequential. Our app is web based and we use paging. The user can select page 1 and then page 10, then go backward to page 9 etc.

Well, with cursors you can also do "FETCH ABSOLUTE 1 FROM <cursor_name>", which sets the cursor position, and then you can "FETCH FORWARD 30". I honestly don't know how the performance will be, but it is something that you could try.

>> And if I understand correctly, you consider all of these to be outer joins. Meaning you want *all* of AdDevicesSites, and whatever info goes along with it, but there are no restrictions as to what rows you want. You want everything you can get.
>>
>> Do you actually need *everything*? You mention only needing 30, what for?
>
> For display of a single page consisting of 30 rows. The reason I query all rows is that this is one of the filters users can use. The user can display just bigboards or billboards (or specify more advanced filters) but he/she can also display AdDevices without any filter (page by page). Before I select the 30 rows, I need to order them by a key and only after that select the records, so this is also the reason why I ask for all rows. The key for sorting might be different for each run.

How are you caching the information in the background in order to support paging? Since you aren't using limit/offset, and you don't seem to be creating a temporary table, I assume you have a layer in between the web server and the database (or possibly inside the webserver) which keeps track of current session information. Is that true?

> These might be the other steps in case we cannot speed up the query. I would prefer to speed the query up :-)

Naturally a fast query comes first. I just have the feeling it is either a postgres configuration problem, or an intrinsic problem to postgres. Given your constraints, there's not much that we can change about the query itself.

> In fact, on MySQL I didn't see any slow reactions so I didn't measure and inspect it. But I can try it if I figure out how to copy the database from PostgreSQL to MySQL.

I figured you still had a copy of the MySQL around to compare to. You probably don't need to spend too much time on it yet.

John
=:->
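[A minimal sketch of the cursor-based paging described above; the cursor name, the sort key "Mark" and the page position are only illustrative, and the SELECT stands for the full query:]

BEGIN;
DECLARE adsites_cur SCROLL CURSOR FOR
    SELECT ... FROM AdDevicesSites LEFT JOIN ... ORDER BY Mark;
-- jump to page 201 (rows 6001..6030)
MOVE ABSOLUTE 6000 IN adsites_cur;
FETCH FORWARD 30 FROM adsites_cur;
CLOSE adsites_cur;
COMMIT;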
Tom Lane wrote:
> John Arbash Meinel <john@arbash-meinel.com> writes:
>> How about a quick side track. Have you played around with your shared_buffers, maintenance_work_mem, and work_mem settings?
>
> Indeed. The hash joins seem unreasonably slow considering how little data they are processing (unless this is being run on some ancient toaster...). One thought that comes to mind is that work_mem may be set so small that the hashes are forced into multiple batches.

I've just tried to uncomment the settings for these parameters, with no impact on the query speed.

shared_buffers = 48000          # min 16, at least max_connections*2, 8KB each
work_mem = 1024                 # min 64, size in KB
maintenance_work_mem = 16384    # min 1024, size in KB
max_stack_depth = 2048          # min 100, size in KB

> Another question worth asking is what are the data types of the columns being joined on. If they are character types, what locale and encoding is the database using?

I have checked this and there are some JOINs of smallint against integer. Is that a problem? I would use smallint for the IDPKs of some smaller tables, but the lack of SMALLSERIAL and my laziness made me use SERIAL instead, which is integer.

> That cost would be paid during the bottom-level scans though. The thing that strikes me here is that nearly all of the cost is being spent joining.
>
>> What version of postgres are you using?
>
> And what's the platform (hardware and OS)?

I've already posted the hardware info. The OS is Linux (Gentoo) with kernel 2.6.11.

> regards, tom lane

Miroslav
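[If the mismatched join types ever need to be aligned, 8.0's ALTER TABLE can widen a column in place; the table/column below are only an illustrative guess at one of the smallint FK columns referencing an integer IDPK:]

ALTER TABLE AdDevicesSites
    ALTER COLUMN AdDevicesSiteRegionIDFK TYPE integer;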
John Arbash Meinel wrote:
> Is there a reason to use varchar(1) instead of char(1)? There probably is 0 performance difference, I'm just curious.

No, not at all. I'm just not used to char().

> Well, with cursors you can also do "FETCH ABSOLUTE 1 FROM <cursor_name>", which sets the cursor position, and then you can "FETCH FORWARD 30". I honestly don't know how the performance will be, but it is something that you could try.

This is science for me at this moment :-)

>> For display of a single page consisting of 30 rows. The reason I query all rows is that this is one of the filters users can use. The user can display just bigboards or billboards (or specify more advanced filters) but he/she can also display AdDevices without any filter (page by page). Before I select the 30 rows, I need to order them by a key and only after that select the records, so this is also the reason why I ask for all rows. The key for sorting might be different for each run.
>
> How are you caching the information in the background in order to support paging? Since you aren't using limit/offset, and you don't seem to be creating a temporary table, I assume you have a layer in between the web server and the database (or possibly inside the webserver) which keeps track of current session information. Is that true?

I just need three pieces of information:
1) the filter used (stored in the session, identified by a filter index in the query string)
2) the page length (static, predefined)
3) what page to display (in the query string)

>> In fact, on MySQL I didn't see any slow reactions so I didn't measure and inspect it. But I can try it if I figure out how to copy the database from PostgreSQL to MySQL.
>
> I figured you still had a copy of the MySQL around to compare to. You probably don't need to spend too much time on it yet.

It's not so simple because there are some differences between MySQL and PostgreSQL in how they handle case sensitivity etc. The database table structures are not the same either, because of differences in data type and data value support.

> John
> =:->

Miroslav
Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
> I've just tried to uncomment the settings for these parameters, with no impact on the query speed.
> shared_buffers = 48000          # min 16, at least max_connections*2, 8KB each
> work_mem = 1024                 # min 64, size in KB
> maintenance_work_mem = 16384    # min 1024, size in KB
> max_stack_depth = 2048          # min 100, size in KB

Hmm. Given the small size of the auxiliary tables, you'd think they'd fit in 1MB work_mem no problem. But try bumping work_mem up to 10MB just to see if it makes a difference. (BTW, you do know that altering the .conf file doesn't in itself do anything? You have to SIGHUP the postmaster to make it notice the change ... and for certain parameters such as shared_buffers, you actually have to stop and restart the postmaster. You can use the SHOW command to verify whether a change has taken effect.)

> I have checked this and there are some JOINs of smallint against integer. Is that a problem?

That probably explains why some of the joins are merges instead of hashes --- hash join doesn't work across datatypes. Doesn't seem like it should be a huge problem though. I was more concerned about the possibility of slow locale-dependent string comparisons.

regards, tom lane
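[A sketch of trying the work_mem bump for just one session, without touching the .conf file; the value is in KB in 8.0 and 10240 is only the example size suggested above:]

SET work_mem = 10240;           -- 10 MB, this session only
SHOW work_mem;                  -- verify the change took effect
EXPLAIN ANALYZE SELECT ... ;    -- re-run the problem query
RESET work_mem;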
Tom Lane wrote:
> Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
>> shared_buffers = 48000          # min 16, at least max_connections*2, 8KB each
>> work_mem = 1024                 # min 64, size in KB
>> maintenance_work_mem = 16384    # min 1024, size in KB
>> max_stack_depth = 2048          # min 100, size in KB
>
> Hmm. Given the small size of the auxiliary tables, you'd think they'd fit in 1MB work_mem no problem. But try bumping work_mem up to 10MB just to see if it makes a difference. (BTW, you do know that altering the .conf file doesn't in itself do anything? You have to SIGHUP the postmaster to make it notice the change ... and for certain parameters such as shared_buffers, you actually have to stop and restart the postmaster. You can use the SHOW command to verify whether a change has taken effect.)

I've tried setting work_mem to 10240, restarted the postmaster and tried the EXPLAIN ANALYZE, but there is only about a 200 ms speedup.

>> I have checked this and there are some JOINs of smallint against integer. Is that a problem?
>
> That probably explains why some of the joins are merges instead of hashes --- hash join doesn't work across datatypes. Doesn't seem like it should be a huge problem though. I was more concerned about the possibility of slow locale-dependent string comparisons.

The JOINs are only number against number. I've tried changing one of the fields from smallint to integer but there was no speedup.

> regards, tom lane

Miroslav
Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
> The JOINs are only number against number.

Hmph. There's no reason I can see that hash joins should be as slow as they seem to be in your test. Is the data confidential? If you'd be willing to send me a pg_dump off-list, I'd like to replicate this test and try to see where the time is going.

regards, tom lane
Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
>> Is the data confidential? If you'd be willing to send me a pg_dump off-list, I'd like to replicate this test and try to see where the time is going.
>
> Thank you very much for your offer. The data are partially confidential so I hashed some of the text information and changed some values (not the values for the JOINs) so I could send it to you. I've checked the EXPLAIN ANALYZE to see if anything changed and the result is nearly the same (maybe about 1 sec slower - maybe because the hashing caused the text data to be longer).

No problem; thank you for supplying the test case. What I find is rather surprising: most of the runtime can be blamed on disassembling and reassembling tuples during the join steps. Here are the hot spots according to gprof:

-----------------------------------------------
                1.27    7.38    8277/103737     ExecScan [16]
                2.93   17.02   19092/103737     ExecNestLoop <cycle 2> [14]
                3.91   22.70   25456/103737     ExecMergeJoin <cycle 2> [13]
                7.81   45.40   50912/103737     ExecHashJoin <cycle 2> [12]
[9]     86.3   15.92   92.50  103737            ExecProject [9]
                7.65   76.45 8809835/9143692    ExecEvalVar [10]
                3.42    4.57  103737/103775     heap_formtuple [17]
                0.03    0.24   12726/143737     ExecMakeFunctionResultNoSets [24]
                0.02    0.12  103737/290777     ExecStoreTuple [44]
                0.01    0.00       2/2          ExecEvalFunc [372]
                0.00    0.00       2/22         ExecMakeFunctionResult [166]
-----------------------------------------------
                0.00    0.00      42/9143692    ExecEvalFuncArgs [555]
                0.05    0.51   59067/9143692    ExecHashGetHashValue [32]
                0.24    2.38  274748/9143692    ExecMakeFunctionResultNoSets [24]
                7.65   76.45 8809835/9143692    ExecProject [9]
[10]    69.5    7.94   79.34 9143692            ExecEvalVar [10]
               79.34    0.00 8750101/9175517    nocachegetattr [11]
-----------------------------------------------

I think the reason this is popping to the top of the runtime is that the joins are so wide (an average of ~85 columns in a join tuple according to the numbers above). Because there are lots of variable-width columns involved, most of the time the fast path for field access doesn't apply and we end up going to nocachegetattr --- which itself is going to be slow because it has to scan over so many columns. So the cost is roughly O(N^2) in the number of columns.

As a short-term hack, you might be able to improve matters if you can reorder your LEFT JOINs to have the minimum number of columns propagating up from the earlier join steps. In other words, make the later joins add more columns than the earlier ones, as much as you can.

This is actually good news, because before 8.0 we had much worse problems than this with extremely wide tuples --- there were O(N^2) behaviors all over the place due to the old List algorithms. Neil Conway's rewrite of the List code got rid of those problems, and now we can see the places that are left to optimize. The fact that there seems to be just one is very nice indeed.

Since ExecProject operations within a nest of joins are going to be dealing entirely with Vars, I wonder if we couldn't speed matters up by having a short-circuit case for a projection that is only Vars. Essentially it would be a lot like execJunk.c, except able to cope with two input tuples. Using heap_deformtuple instead of retail extraction of fields would eliminate the O(N^2) penalty for wide tuples.

regards, tom lane
I wrote:
> Since ExecProject operations within a nest of joins are going to be dealing entirely with Vars, I wonder if we couldn't speed matters up by having a short-circuit case for a projection that is only Vars. Essentially it would be a lot like execJunk.c, except able to cope with two input tuples. Using heap_deformtuple instead of retail extraction of fields would eliminate the O(N^2) penalty for wide tuples.

Actually, we already had a pending patch (from Atsushi Ogawa) that eliminates that particular O(N^2) behavior in another way. After applying it, I get about a factor-of-4 reduction in the runtime for Miroslav's example.

ExecEvalVar and associated routines are still a pretty good fraction of the runtime, so it might still be worth doing something like the above, but it'd probably be just a marginal win instead of a big win.

regards, tom lane
Tom Lane wrote:
> ...
> I think the reason this is popping to the top of the runtime is that the joins are so wide (an average of ~85 columns in a join tuple according to the numbers above). Because there are lots of variable-width columns involved, most of the time the fast path for field access doesn't apply and we end up going to nocachegetattr --- which itself is going to be slow because it has to scan over so many columns. So the cost is roughly O(N^2) in the number of columns.

As there are a lot of varchar(1) columns in the AdDevicesSites table, wouldn't it be helpful to change them to char(1)? Would it solve the variable-width problem at least for some fields and speed the query up?

> As a short-term hack, you might be able to improve matters if you can reorder your LEFT JOINs to have the minimum number of columns propagating up from the earlier join steps. In other words, make the later joins add more columns than the earlier ones, as much as you can.

That will be hard, as the main table, which contains most of the fields, is LEFT JOINed with the others. I'll look at it and see if I find some way to improve it.

I'm not sure whether I understand the process of performing the plan, but I imagine that the data from AdDevicesSites are retrieved only once, when they are loaded, and maybe stored in memory. Are the columns stored in the order they are in the SQL command? If so, wouldn't it be useful to move all varchar fields to the end of the SELECT query? I'm just guessing because I don't know at all how a database server is implemented and what it really does.

> ...
> regards, tom lane

Miroslav
Tom Lane wrote:
> I wrote:
>> Since ExecProject operations within a nest of joins are going to be dealing entirely with Vars, I wonder if we couldn't speed matters up by having a short-circuit case for a projection that is only Vars. Essentially it would be a lot like execJunk.c, except able to cope with two input tuples. Using heap_deformtuple instead of retail extraction of fields would eliminate the O(N^2) penalty for wide tuples.
>
> Actually, we already had a pending patch (from Atsushi Ogawa) that eliminates that particular O(N^2) behavior in another way. After applying it, I get about a factor-of-4 reduction in the runtime for Miroslav's example.

Is there a chance we will see this patch in the 8.0.2 release? And when can we expect this release?

> ExecEvalVar and associated routines are still a pretty good fraction of the runtime, so it might still be worth doing something like the above, but it'd probably be just a marginal win instead of a big win.
>
> regards, tom lane

Miroslav
John Arbash Meinel wrote:
>> In fact, on MySQL I didn't see any slow reactions so I didn't measure and inspect it. But I can try it if I figure out how to copy the database from PostgreSQL to MySQL.
>
> I figured you still had a copy of the MySQL around to compare to. You probably don't need to spend too much time on it yet.

So I have some results. I have tested the query on both PostgreSQL 8.0.1 and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL result is 11,667.916 ms, MySQL result is 448.4 ms.

Both databases are running on the same machine (my laptop) and contain the same data. However there are some differences in the data table definitions:
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not of the same type (smallint <-> integer (SERIAL)), in MySQL I use the same types

> John
> =:->

Miroslav
> 1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I use 'enum'
> 2) in PostgreSQL in some cases I use connection fields that are not of the same type (smallint <-> integer (SERIAL)), in MySQL I use the same types

Well, both those things will make PostgreSQL slower...

Chris
Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL or an integer.

Your query seems to be of the form:

SELECT FROM main_table LEFT JOIN a lot of tables ORDER BY sort_key LIMIT N OFFSET M;

I would suggest rewriting it in a simpler way: instead of generating the whole result set, sorting it, and then grabbing a slice, generate only the row ids, grab a slice, and then generate the full rows from that.

- If you order by a field which is in main_table:

SELECT FROM main_table LEFT JOIN a lot of tables
WHERE main_table.id IN (SELECT id FROM main_table ORDER BY sort_key LIMIT N OFFSET M)
ORDER BY sort_key LIMIT N OFFSET M;

- If you order by a field in one of the child tables, I guess you only want to display the rows in the main table which have this field, i.e. not-null in the LEFT JOIN. You can also use the principle above.

- You can use a straight join instead of an IN.

On Mon, 14 Mar 2005 09:58:49 +0100, Miroslav Šulc <miroslav.sulc@startnet.cz> wrote:
> John Arbash Meinel wrote:
>>> In fact, on MySQL I didn't see any slow reactions so I didn't measure and inspect it. But I can try it if I figure out how to copy the database from PostgreSQL to MySQL.
>>
>> I figured you still had a copy of the MySQL around to compare to. You probably don't need to spend too much time on it yet.
>
> So I have some results. I have tested the query on both PostgreSQL 8.0.1 and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL result is 11,667.916 ms, MySQL result is 448.4 ms.
>
> Both databases are running on the same machine (my laptop) and contain the same data. However there are some differences in the data table definitions:
> 1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I use 'enum'
> 2) in PostgreSQL in some cases I use connection fields that are not of the same type (smallint <-> integer (SERIAL)), in MySQL I use the same types
>
>> John
>> =:->
>
> Miroslav
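[Applied to this thread's query, the first variant above might look roughly like the sketch below. The select list and joins are abbreviated, "Mark" is only a stand-in for whatever sort key lives in the main table, and the outer OFFSET is deliberately dropped since the subquery already picks the 30 wanted ids:]

SELECT AdDevicesSites.*, AdDevicesSiteRegions.Name AS AdDevicesSiteRegionName
       -- ... the rest of the select list ...
FROM AdDevicesSites
LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK = AdDevicesSiteRegions.IDPK
-- ... the other LEFT JOINs ...
WHERE AdDevicesSites.IDPK IN (
    SELECT IDPK FROM AdDevicesSites ORDER BY Mark LIMIT 30 OFFSET 6000)
ORDER BY AdDevicesSites.Mark;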
Christopher Kings-Lynne wrote: >> 1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL >> I use 'enum' >> 2) in PostgreSQL in some cases I use connection fields that are not >> of the same type (smallint <-> integer (SERIAL)), in MySQL I use the >> same types > > > Well both those things will make PostgreSQL slower... I think so. I'll change the varchar(1) fields to char(1) where possible and will think about what to do with the smallint <-> integer JOINs. Something like SMALLSERIAL would be pleasant :-) I thought I would wait for Tom Lane's reaction to the improvement suggestions I posted in another mail, but maybe it is the middle of the night in his time zone. > > Chris Miroslav
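[Editor's note: for the smallint <-> integer joins mentioned above, one low-impact fix is to widen the referencing columns so both sides of each join share a type. This is only a sketch; whether AdDevicesSiteSizeIDFK is really one of the smallint columns is an assumption, so substitute the actual mismatched FK columns.]

-- Assumption: AdDevicesSiteSizeIDFK is declared smallint but joins against
-- the integer SERIAL column AdDevicesSiteSizes.IDPK. Widening it removes
-- the cross-type comparison. ALTER COLUMN ... TYPE is available in 8.0.
ALTER TABLE AdDevicesSites
    ALTER COLUMN AdDevicesSiteSizeIDFK TYPE integer;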
PFC wrote: > > Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL > or an integer. Sure I could. The problem is our project still supports both MySQL and PostgreSQL. We used enum('Y','N') in MySQL so there would be a lot of changes in the code if we would change to the BOOL data type. > Your query seems of the form : > > SELECT FROM main_table LEFT JOIN a lot of tables ORDER BY sort_key > LIMIT N OFFSET M; > > I would suggest to rewrite it in a simpler way : instead of > generating the whole result set, sorting it, and then grabbing a > slice, generate only the ror id's, grab a slice, and then generate > the full rows from that. > > - If you order by a field which is in main_table : > SELECT FROM main_table LEFT JOIN a lot of tables WHERE > main_table.id IN (SELECT id FROM main_table ORDER BY sort_key LIMIT N > OFFSET M > ) ORDER BY sort_key LIMIT N OFFSET M; > > - If you order by a field in one of the child tables, I guess you > only want to display the rows in the main table which have this > field, ie. not-null in the LEFT JOIN. You can also use the principle > above. > > - You can use a straight join instead of an IN. Do you mean something like this? SELECT Table.IDPK, Table2.varchar1, Table2.varchar2, ... FROM Table LEFT JOIN many tables INNER JOIN Table AS Table2 Miroslav
Miroslav Šulc <miroslav.sulc@startnet.cz> writes: > As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't > be helpful to change them to char(1)? Would it solve the variable-width > problem at least for some fields and speed the query up? No, because char(1) isn't physically fixed-width (consider multibyte encodings). There's really no advantage to char(N) in Postgres. I don't know what you're doing with those fields, but if they are effectively booleans or small codes you might be able to convert them to bool or int fields. There is also the "char" datatype (not to be confused with char(1)) which can hold single ASCII characters, but is nonstandard and a bit impoverished as to functionality. However, I doubt this is worth pursuing. One of the things I tested yesterday was a quick hack to organize the storage of intermediate join tuples with fixed-width fields first and non-fixed ones later. It really didn't help much at all :-(. I think the trouble with your example is that in the existing code, the really fast path applies only when the tuple contains no nulls --- and since you're doing all that left joining, there's frequently at least one null lurking. regards, tom lane
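[Editor's note: if some of those varchar(1) columns really are just 'Y'/'N' flags, the conversion Tom describes could look like the sketch below. The column name is taken from Miroslav's query, but whether it actually holds 'Y'/'N' is an assumption.]

-- Assumption: OwnLightening currently stores 'Y' or 'N' in a varchar(1).
-- The USING clause (new in 8.0) rewrites the existing values during the
-- conversion; NULLs stay NULL.
ALTER TABLE AdDevicesSites
    ALTER COLUMN OwnLightening TYPE boolean
    USING (OwnLightening = 'Y');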
Miroslav Šulc <miroslav.sulc@startnet.cz> writes: > Tom Lane wrote: >> Actually, we already had a pending patch (from Atsushi Ogawa) that >> eliminates that particular O(N^2) behavior in another way. After >> applying it, I get about a factor-of-4 reduction in the runtime for >> Miroslav's example. >> > Is there a chance we will see this patch in the 8.0.2 release? No. We are not in the habit of making non-bug-fix changes in stable branches. Ogawa's patch is in CVS for 8.1. regards, tom lane
In article <423556B8.4020500@startnet.cz>, Miroslav Šulc <miroslav.sulc@startnet.cz> writes: >> Instead of a varchar(1) containing 'y' or 'n' you could use a >> BOOL or an integer. > Sure I could. The problem is our project still supports both MySQL and > PostgreSQL. We used enum('Y','N') in MySQL so there would be a lot of > changes in the code if we would change to the BOOL data type. Since BOOL is exactly what you want to express and since MySQL also supports BOOL (*), you should make that change anyway. (*) MySQL recognizes BOOL as a column type and silently uses TINYINT(1) instead.
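[Editor's note: since the project has to keep running on both databases, it may help that the same column definition is accepted by both; a tiny illustration with a made-up table and column name, relying on the version note Miroslav gives further down (BOOL in MySQL from 4.1.0).]

-- Both PostgreSQL and MySQL (4.1 and later) accept BOOL in a column
-- definition; MySQL silently treats it as TINYINT(1), as Harald notes.
CREATE TABLE portability_demo (
    own_lightening BOOL NOT NULL
);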
Tom Lane wrote: >Miroslav Šulc <miroslav.sulc@startnet.cz> writes: > > >>As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't >>be helpful to change them to char(1)? Would it solve the variable-width >>problem at least for some fields and speed the query up? >> >> > >No, because char(1) isn't physically fixed-width (consider multibyte >encodings). There's really no advantage to char(N) in Postgres. > > I was aware of that :-( >I don't know what you're doing with those fields, but if they are >effectively booleans or small codes you might be able to convert them to >bool or int fields. There is also the "char" datatype (not to be >confused with char(1)) which can hold single ASCII characters, but is >nonstandard and a bit impoverished as to functionality. > > The problem lies in the migration from MySQL to PostgreSQL. In MySQL we (badly) chose enum for yes/no switches (there is nothing like a boolean field type in MySQL as far as I know, but we could have used tinyint). It will be very time consuming to rewrite all such enums and check whether the code still works. >However, I doubt this is worth pursuing. One of the things I tested >yesterday was a quick hack to organize the storage of intermediate join >tuples with fixed-width fields first and non-fixed ones later. It >really didn't help much at all :-(. I think the trouble with your >example is that in the existing code, the really fast path applies only >when the tuple contains no nulls --- and since you're doing all that >left joining, there's frequently at least one null lurking. > > Unfortunately I don't see any other way than LEFT JOINing in this case. > regards, tom lane > > Miroslav
Harald Fuchs wrote: >>Sure I could. The problem is our project still supports both MySQL and >>PostgreSQL. We used enum('Y','N') in MySQL so there would be a lot of >>changes in the code if we would change to the BOOL data type. >> >> > >Since BOOL is exactly what you want to express and since MySQL also >supports BOOL (*), you should make that change anyway. > > I know that. The time will have to come. >(*) MySQL recognizes BOOL as a column type and silently uses >TINYINT(1) instead. > > I've checked that and you are right, but BOOL has only been in MySQL since version 4.1.0; we could have used tinyint instead of enum - our bad choice. Miroslav
Miroslav Šulc wrote: > Tom Lane wrote: > >> ... >> I think the reason this is popping to the top of the runtime is that the >> joins are so wide (an average of ~85 columns in a join tuple according >> to the numbers above). Because there are lots of variable-width columns >> involved, most of the time the fast path for field access doesn't apply >> and we end up going to nocachegetattr --- which itself is going to be >> slow because it has to scan over so many columns. So the cost is >> roughly O(N^2) in the number of columns. >> >> > As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't > be helpful to change them to char(1)? Would it solve the > variable-width problem at least for some fields and speed the query up? > I'm guessing there really wouldn't be a difference. I think varchar() and char() are stored the same way, just one always has space padding. I believe they are both varlena types, so they are still "variable" length. >> As a short-term hack, you might be able to improve matters if you can >> reorder your LEFT JOINs to have the minimum number of columns >> propagating up from the earlier join steps. In other words make the >> later joins add more columns than the earlier, as much as you can. >> >> > That will be hard as the main table which contains most of the fields > is LEFT JOINed with the others. I'll look at it if I find some way to > improve it. One thing that you could try, is to select just the primary keys from the main table, and then later on, join back to that table to get the rest of the columns. It is a little bit hackish, but if it makes your query faster, you might want to try it. > > I'm not sure whether I understand the process of performing the plan > but I imagine that the data from AdDevicesSites are retrieved only > once when they are loaded and maybe stored in memory. Are the columns > stored in the order they are in the SQL command? If so, wouldn't it be > useful to move all varchar fields at the end of the SELECT query? I'm > just guessing because I don't know at all how a database server is > implemented and what it really does. > I don't think they are stored in the order of the SELECT <> portion. I'm guessing they are loaded and saved as you go. But that the order of the LEFT JOIN at the end is probably important. >> .. >> regards, tom lane >> >> > Miroslav John =:->
Miroslav Šulc <miroslav.sulc@startnet.cz> writes: > I think so. I'll change the varchar(1) fields to char(1) where possible char isn't faster than varchar on postgres. If anything it may be slightly slower because every comparison first needs to pad both sides with spaces. -- greg
Miroslav Šulc <miroslav.sulc@startnet.cz> writes: > PFC wrote: >> Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL >> or an integer. > Sure I could. The problem is our project still supports both MySQL and > PostgreSQL. We used enum('Y','N') in MySQL so there would be a lot of > changes in the code if we would change to the BOOL data type. Just FYI, I did a quick search-and-replace on your dump to replace varchar(1) by "char", which makes the column fixed-width without any change in the visible data. This made hardly any difference in the join speed though :-(. So that is looking like a dead end. John's idea about re-joining to the main table to pick up the bulk of its fields only after joining to the sub-tables might work. regards, tom lane
Tom Lane wrote: >Just FYI, I did a quick search-and-replace on your dump to replace >varchar(1) by "char", which makes the column fixed-width without any >change in the visible data. This made hardly any difference in the >join speed though :-(. So that is looking like a dead end. > > I'll try to change the data type to bool but the problem I stand in front of is that the code expects that SELECTs return 'Y' or 'N' but what I have found out till now is that PostgreSQL returns 't' or 'f' for bool data. I think about some solution but they use CPU :-( >John's idea about re-joining to the main table to pick up the bulk of >its fields only after joining to the sub-tables might work. > > I'll try that. It seems it could work. > regards, tom lane > > Miroslav
Miroslav Šulc wrote: > PFC wrote: > >> Your query seems of the form : >> >> SELECT FROM main_table LEFT JOIN a lot of tables ORDER BY >> sort_key LIMIT N OFFSET M; >> >> I would suggest to rewrite it in a simpler way : instead of >> generating the whole result set, sorting it, and then grabbing a >> slice, generate only the ror id's, grab a slice, and then generate >> the full rows from that. >> >> - If you order by a field which is in main_table : >> SELECT FROM main_table LEFT JOIN a lot of tables WHERE >> main_table.id IN (SELECT id FROM main_table ORDER BY sort_key LIMIT >> N OFFSET M >> ) ORDER BY sort_key LIMIT N OFFSET M; >> >> - If you order by a field in one of the child tables, I guess you >> only want to display the rows in the main table which have this >> field, ie. not-null in the LEFT JOIN. You can also use the principle >> above. >> >> - You can use a straight join instead of an IN. > > > Do you mean something like this? > > SELECT Table.IDPK, Table2.varchar1, Table2.varchar2, ... > FROM Table > LEFT JOIN many tables > INNER JOIN Table AS Table2 > > Miroslav I would also recommend using the subselect format. Where any columns that you are going to need to sort on show up in the subselect. So you would have: SELECT ... FROM main_table LEFT JOIN tablea ON ... LEFT JOIN tableb ON ... ... JOIN other_table ON ... WHERE main_table.idpk IN (SELECT idpk FROM main_table JOIN other_table ON main_table.idpk = other_table.<main_idpk> WHERE ... ORDER BY other_table.abcd LIMIT n OFFSET m) ; I think the final LIMIT + OFFSET would give you the wrong results, since you have already filtered out the important rows. I also think you don't need the final order by, since the results should already be in sorted order. Now this also assumes that if someone is sorting on a row, then they don't want null entries. If they do, then you can change the subselect into a left join. But with appropriate selectivity and indexes, an inner join can filter out a lot of rows, and give you better performance. The inner subselect gives you selectivity on the main table, so that you don't have to deal with all the columns in the search, and then you don't have to deal with all the rows later on. I think you can also do this: SELECT ... FROM (SELECT main_table.idpk, other_table.<columns> FROM main_table JOIN other_table ....) as p LEFT JOIN ... JOIN main_table ON main_table.idpk = p.idpk; In that case instead of selecting out the id and putting that into the where, you put it in the from, and then join against it. I don't really know which is better. John =:->
Hi,

I have an idea about your problem. Will it be difficult not to change the entire code but only the queries? You can change type in the Postgres to bool. Then, when select data you can use a CASE..WHEN to return 'Y' or 'N' or even write a little function which accepts bool and returns 'Y' or 'N'. In this case in all your queries you will have to replace the select of bool field with select form the function.

Kaloyan

Miroslav Šulc wrote:
> Tom Lane wrote:
>> Just FYI, I did a quick search-and-replace on your dump to replace
>> varchar(1) by "char", which makes the column fixed-width without any
>> change in the visible data. This made hardly any difference in the
>> join speed though :-(. So that is looking like a dead end.
>
> I'll try to change the data type to bool but the problem I stand in front of is that the code expects that SELECTs return 'Y' or 'N' but what I have found out till now is that PostgreSQL returns 't' or 'f' for bool data. I think about some solution but they use CPU :-(
>
>> John's idea about re-joining to the main table to pick up the bulk of
>> its fields only after joining to the sub-tables might work.
>
> I'll try that. It seems it could work.
>
>> regards, tom lane
>
> Miroslav
Kaloyan Iliev Iliev wrote: > Hi, > > I have an idea about your problem. Will it be difficult not to change > the entire code but only the queries? You can change type in the > Postgres to bool. Then, when select data you can use a CASE..WHEN to > return 'Y' or 'N' or even write a little function which accepts bool > and returns 'Y' or 'N'. In this case in all your queries you will have > to replace the select of bool field with select form the function. Thank you for your suggestion. I had a private message exchange with Harald Fuchs who suggested the same (except the function). Here is what we "exchanged": Harald Fuchs wrote: > If you can control the SELECTs, just use > > SELECT CASE col WHEN true THEN 'Y' ELSE 'N' END > > instead of > > SELECT col > > Thus you wouldn't need to change your application code. > > I use a single SELECT for both PostgreSQL and MySQL. I could use your solution. It would just require some tagging of bool fields in SELECTs so I could add the CASE statement when I use the PostgreSQL backend. Miroslav
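[Editor's note: for reference, here is what the two suggestions (Harald's inline CASE and Kaloyan's little function) might look like side by side. This is a sketch only: yn_demo, its column, and the function name bool_to_yn are placeholders, not anything from Miroslav's schema.]

-- Placeholder table standing in for one converted y/n column.
CREATE TEMP TABLE yn_demo (flag boolean);
INSERT INTO yn_demo VALUES (true), (false);

-- Hypothetical helper: map boolean back to the 'Y'/'N' the application
-- expects. A NULL input yields NULL because the function is STRICT.
CREATE OR REPLACE FUNCTION bool_to_yn(boolean) RETURNS text AS '
    SELECT CASE WHEN $1 THEN ''Y'' ELSE ''N'' END;
' LANGUAGE SQL STRICT IMMUTABLE;

-- Harald's inline form (note: a NULL flag falls through to 'N' here):
SELECT CASE flag WHEN true THEN 'Y' ELSE 'N' END AS flag FROM yn_demo;

-- Kaloyan's function form:
SELECT bool_to_yn(flag) AS flag FROM yn_demo;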
=?ISO-8859-2?Q?Miroslav_=A9ulc?= <miroslav.sulc@startnet.cz> writes: > [ concerning a deeply nested LEFT JOIN to get data from a star schema ] > So I have some results. I have tested the query on both PostgreSQL 8.0.1 > and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL > result is 11,667.916 ms, MySQL result is 448.4 ms. That's a fairly impressive discrepancy :-(, and even the slot_getattr() patch that Atsushi Ogawa provided isn't going to close the gap. (I got about a 4x speedup on Miroslav's example in my testing, which leaves us still maybe 6x slower than MySQL.) Looking at the post-patch profile for the test case, there is still quite a lot of cycles going into tuple assembly and disassembly: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 24.47 4.49 4.49 _mcount 8.01 5.96 1.47 9143692 0.00 0.00 ExecEvalVar 6.92 7.23 1.27 6614373 0.00 0.00 slot_deformtuple 6.54 8.43 1.20 9143692 0.00 0.00 slot_getattr 6.21 9.57 1.14 103737 0.01 0.03 ExecTargetList 5.56 10.59 1.02 103775 0.01 0.01 DataFill 3.22 11.18 0.59 103775 0.01 0.01 ComputeDataSize 2.83 11.70 0.52 ExecEvalVar 2.72 12.20 0.50 9094122 0.00 0.00 memcpy 2.51 12.66 0.46 encore 2.40 13.10 0.44 427448 0.00 0.00 nocachegetattr 2.13 13.49 0.39 103775 0.00 0.02 heap_formtuple 2.07 13.87 0.38 noshlibs 1.20 14.09 0.22 225329 0.00 0.00 _doprnt 1.20 14.31 0.22 msquadloop 1.14 14.52 0.21 chunks 0.98 14.70 0.18 871885 0.00 0.00 AllocSetAlloc 0.98 14.88 0.18 $$dyncall 0.76 15.02 0.14 594242 0.00 0.00 FunctionCall3 0.71 15.15 0.13 213312 0.00 0.00 comparetup_heap 0.65 15.27 0.12 6364 0.02 0.13 printtup 0.60 15.38 0.11 790702 0.00 0.00 pfree (_mcount is profiling overhead, ignore it.) It looks to me like just about everything in the top dozen functions is there as a result of the fact that join steps form new tuples that are the merge of their input tuples. Even our favorite villains, palloc and pfree, are down in the sub-percent range. I am guessing that the reason MySQL wins on this is that they avoid doing any data copying during a join step. I wonder whether we could accomplish the same by taking Ogawa's patch to the next level: allow a TupleTableSlot to contain either a "materialized" tuple as now, or a "virtual" tuple that is simply an array of Datums and null flags. (It's virtual in the sense that any pass-by-reference Datums would have to be pointing to data at the next level down.) This would essentially turn the formtuple and deformtuple operations into no-ops, and get rid of a lot of the associated overhead such as ComputeDataSize and DataFill. The only operations that would have to forcibly materialize a tuple would be ones that need to keep the tuple till after they fetch their next input tuple --- hashing and sorting are examples, but very many plan node types don't ever need to do that. I haven't worked out the details, but it seems likely that this could be a relatively nonintrusive patch. The main thing that would be an issue would be that direct reference to slot->val would become verboten (since you could no longer be sure there was a materialized tuple there). I think this would possibly affect some contrib stuff, which is a strong hint that it'd break some existing user-written code out there. Thoughts? regards, tom lane
Tom Lane wrote: >>So I have some results. I have tested the query on both PostgreSQL 8.0.1 >>and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL >>result is 11,667.916 ms, MySQL result is 448.4 ms. >> >> > >That's a fairly impressive discrepancy :-(, and even the slot_getattr() >patch that Atsushi Ogawa provided isn't going to close the gap. >(I got about a 4x speedup on Miroslav's example in my testing, which >leaves us still maybe 6x slower than MySQL.) > > As I wrote, the comparison is not "fair". Here are the conditions: "Both databases are running on the same machine (my laptop) and contain the same data. However there are some differences in the data table definitions: 1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I use 'enum' 2) in PostgreSQL in some cases I use connection fields that are not of the same type (smallint <-> integer (SERIAL)), in MySQL I use the same types" For those not used to MySQL, enum is an integer "mapped" to a text string (that's how I see it). That means that when you have field such as enum('yes','no','DK'), in the table there are stored numbers 1, 2 and 3 which are mapped to the text values 'yes', 'no' and 'DK'. The description is not accurate (I'm not MySQL programmer, I didn't check it recently and I didn't inspect the code - I wouldn't understand it either) but I think it's not that important. What is important is the fact that MySQL has to work with some dozen fields that are numbers and PostgreSQL has to work with the same fields as varchar(). Some of the other fields are also varchars. This might (or might not) cause the speed difference. However I think that if devs figure out how to speed this up, other cases will benefit from the improvement too. As I understood from the contributions of other, the 2) shouldn't have a great impact on the speed. >Looking at the post-patch profile for the test case, there is still >quite a lot of cycles going into tuple assembly and disassembly: > >Each sample counts as 0.01 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 24.47 4.49 4.49 _mcount > 8.01 5.96 1.47 9143692 0.00 0.00 ExecEvalVar > 6.92 7.23 1.27 6614373 0.00 0.00 slot_deformtuple > 6.54 8.43 1.20 9143692 0.00 0.00 slot_getattr > 6.21 9.57 1.14 103737 0.01 0.03 ExecTargetList > 5.56 10.59 1.02 103775 0.01 0.01 DataFill > 3.22 11.18 0.59 103775 0.01 0.01 ComputeDataSize > 2.83 11.70 0.52 ExecEvalVar > 2.72 12.20 0.50 9094122 0.00 0.00 memcpy > 2.51 12.66 0.46 encore > 2.40 13.10 0.44 427448 0.00 0.00 nocachegetattr > 2.13 13.49 0.39 103775 0.00 0.02 heap_formtuple > 2.07 13.87 0.38 noshlibs > 1.20 14.09 0.22 225329 0.00 0.00 _doprnt > 1.20 14.31 0.22 msquadloop > 1.14 14.52 0.21 chunks > 0.98 14.70 0.18 871885 0.00 0.00 AllocSetAlloc > 0.98 14.88 0.18 $$dyncall > 0.76 15.02 0.14 594242 0.00 0.00 FunctionCall3 > 0.71 15.15 0.13 213312 0.00 0.00 comparetup_heap > 0.65 15.27 0.12 6364 0.02 0.13 printtup > 0.60 15.38 0.11 790702 0.00 0.00 pfree > >(_mcount is profiling overhead, ignore it.) It looks to me like just >about everything in the top dozen functions is there as a result of the >fact that join steps form new tuples that are the merge of their input >tuples. Even our favorite villains, palloc and pfree, are down in the >sub-percent range. > >I am guessing that the reason MySQL wins on this is that they avoid >doing any data copying during a join step. 
I wonder whether we could >accomplish the same by taking Ogawa's patch to the next level: allow >a TupleTableSlot to contain either a "materialized" tuple as now, >or a "virtual" tuple that is simply an array of Datums and null flags. >(It's virtual in the sense that any pass-by-reference Datums would have >to be pointing to data at the next level down.) This would essentially >turn the formtuple and deformtuple operations into no-ops, and get rid >of a lot of the associated overhead such as ComputeDataSize and >DataFill. The only operations that would have to forcibly materialize >a tuple would be ones that need to keep the tuple till after they fetch >their next input tuple --- hashing and sorting are examples, but very >many plan node types don't ever need to do that. > >I haven't worked out the details, but it seems likely that this could be >a relatively nonintrusive patch. The main thing that would be an issue >would be that direct reference to slot->val would become verboten (since >you could no longer be sure there was a materialized tuple there). >I think this would possibly affect some contrib stuff, which is a strong >hint that it'd break some existing user-written code out there. > >Thoughts? > > My thought is that I don't know what you are talking about :-) Now seriously, I am far below this level of knowledge. But I can contribute a test that (maybe) can help. I have rewritten the query so it JOINs the varchar() fields (in fact all fields except the IDPK) at the last INNER JOIN. Though there is one more JOIN, the query is more than 5 times faster (1975.312 ms) :-) So my silly opinion is that if the planner could decide that there are too many time-expensive fields that are not needed while performing the JOINs and these could be JOINed at the last step, it would do it this way :-) Below is the adjusted query and the EXPLAIN ANALYZE output. (Tom, you can run it on the data I have sent you and it should run without changes.)
SELECT AdDevicesSites.IDPK, AdDevicesSites2.AdDevicesSiteSizeIDFK, AdDevicesSites2.AdDevicesSiteRegionIDFK, AdDevicesSites2.AdDevicesSiteCountyIDFK, AdDevicesSites2.AdDevicesSiteCityIDFK, AdDevicesSites2.AdDevicesSiteDistrictIDFK, AdDevicesSites2.AdDevicesSiteStreetIDFK, AdDevicesSites2.AdDevicesSiteStreetDescriptionIDFK, AdDevicesSites2.AdDevicesSitePositionIDFK, AdDevicesSites2.AdDevicesSiteVisibilityIDFK, AdDevicesSites2.AdDevicesSiteStatusTypeIDFK, AdDevicesSites2.AdDevicesSitePartnerIdentificationOperatorIDFK, AdDevicesSites2.AdDevicesSitePartnerElectricitySupplierIDFK, AdDevicesSites2.AdDevicesSitePartnerMaintainerIDFK, AdDevicesSites2.AdDevicesSitePartnerStickerIDFK, AdDevicesSites2.CadastralUnitIDFK, AdDevicesSites2.MediaType, AdDevicesSites2.Mark, AdDevicesSites2.Amount, AdDevicesSites2.Distance, AdDevicesSites2.OwnLightening, AdDevicesSites2.LocationDownTown, AdDevicesSites2.LocationSuburb, AdDevicesSites2.LocationBusinessDistrict, AdDevicesSites2.LocationResidentialDistrict, AdDevicesSites2.LocationIndustrialDistrict, AdDevicesSites2.LocationNoBuildings, AdDevicesSites2.ParkWayHighWay, AdDevicesSites2.ParkWayFirstClassRoad, AdDevicesSites2.ParkWayOtherRoad, AdDevicesSites2.ParkWayStreet, AdDevicesSites2.ParkWayAccess, AdDevicesSites2.ParkWayExit, AdDevicesSites2.ParkWayParkingPlace, AdDevicesSites2.ParkWayPassangersOnly, AdDevicesSites2.ParkWayCrossRoad, AdDevicesSites2.PositionStandAlone, AdDevicesSites2.NeighbourhoodPublicTransportation, AdDevicesSites2.NeighbourhoodInterCityTransportation, AdDevicesSites2.NeighbourhoodPostOffice, AdDevicesSites2.NeighbourhoodNewsStand, AdDevicesSites2.NeighbourhoodAmenities, AdDevicesSites2.NeighbourhoodSportsSpot, AdDevicesSites2.NeighbourhoodHealthServiceSpot, AdDevicesSites2.NeighbourhoodShops, AdDevicesSites2.NeighbourhoodShoppingCenter, AdDevicesSites2.NeighbourhoodSuperMarket, AdDevicesSites2.NeighbourhoodPetrolStation, AdDevicesSites2.NeighbourhoodSchool, AdDevicesSites2.NeighbourhoodBank, AdDevicesSites2.NeighbourhoodRestaurant, AdDevicesSites2.NeighbourhoodHotel, AdDevicesSites2.RestrictionCigarettes, AdDevicesSites2.RestrictionPolitics, AdDevicesSites2.RestrictionSpirits, AdDevicesSites2.RestrictionSex, AdDevicesSites2.RestrictionOther, AdDevicesSites2.RestrictionNote, AdDevicesSites2.SpotMapFile, AdDevicesSites2.SpotPhotoFile, AdDevicesSites2.SourcePhotoTimeStamp, AdDevicesSites2.SourceMapTimeStamp, AdDevicesSites2.Price, AdDevicesSites2.WebPrice, AdDevicesSites2.CadastralUnitCode, AdDevicesSites2.BuildingNumber, AdDevicesSites2.ParcelNumber, AdDevicesSites2.GPSLatitude, AdDevicesSites2.GPSLongitude, AdDevicesSites2.GPSHeight, AdDevicesSites2.MechanicalOpticalCoordinates, AdDevicesSites2.Deleted, AdDevicesSites2.Protected, AdDevicesSites2.DateCreated, AdDevicesSites2.DateLastModified, AdDevicesSites2.DateDeleted, AdDevicesSites2.CreatedByUserIDFK, AdDevicesSites2.LastModifiedByUserIDFK, AdDevicesSites2.DeletedByUserIDFK, AdDevicesSites2.PhotoLastModificationDate, AdDevicesSites2.MapLastModificationDate, AdDevicesSites2.DateLastImported, AdDevicesSiteRegions.Name AS AdDevicesSiteRegionName, AdDevicesSiteCounties.Name AS AdDevicesSiteCountyName, AdDevicesSiteCities.Name AS AdDevicesSiteCityName, AdDevicesSiteStreets.Name AS AdDevicesSiteStreetName, AdDevicesSiteDistricts.Name AS AdDevicesSiteDistrictName, AdDevicesSiteStreetDescriptions.Name_cs AS AdDevicesSiteStreetDescriptionName_cs, AdDevicesSiteStreetDescriptions.Name_en AS AdDevicesSiteStreetDescriptionName_en, AdDevicesSiteSizes.Name AS AdDevicesSiteSizeName, 
SUBSTRING(AdDevicesSiteVisibilities.Name_cs, 3) AS AdDevicesSiteVisibilityName_cs, SUBSTRING(AdDevicesSiteVisibilities.Name_en, 3) AS AdDevicesSiteVisibilityName_en, AdDevicesSitePositions.Name_cs AS AdDevicesSitePositionName_cs, AdDevicesSitePositions.Name_en AS AdDevicesSitePositionName_en, AdDevicesSiteStatusTypes.Name_cs AS AdDevicesSiteStatusTypeName_cs, AdDevicesSiteStatusTypes.Name_en AS AdDevicesSiteStatusTypeName_en, PartnerIdentificationsOperator.Name AS PartnerIdentificationOperatorName, PartnersElectricitySupplier.Name AS PartnerElectricitySupplierName, PartnersMaintainer.Name AS PartnerMaintainerName, PartnersSticker.Name AS PartnerStickerName, CadastralUnits.Code AS CadastralUnitCodeNative, CadastralUnits.Name AS CadastralUnitName FROM AdDevicesSites LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK = AdDevicesSiteRegions.IDPK LEFT JOIN AdDevicesSiteCounties ON AdDevicesSites.AdDevicesSiteCountyIDFK = AdDevicesSiteCounties.IDPK LEFT JOIN AdDevicesSiteCities ON AdDevicesSites.AdDevicesSiteCityIDFK = AdDevicesSiteCities.IDPK LEFT JOIN AdDevicesSiteStreets ON AdDevicesSites.AdDevicesSiteStreetIDFK = AdDevicesSiteStreets.IDPK LEFT JOIN AdDevicesSiteStreetDescriptions ON AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK = AdDevicesSiteStreetDescriptions.IDPK LEFT JOIN AdDevicesSiteDistricts ON AdDevicesSites.AdDevicesSiteDistrictIDFK = AdDevicesSiteDistricts.IDPK LEFT JOIN AdDevicesSiteSizes ON AdDevicesSites.AdDevicesSiteSizeIDFK = AdDevicesSiteSizes.IDPK LEFT JOIN AdDevicesSiteVisibilities ON AdDevicesSites.AdDevicesSiteVisibilityIDFK = AdDevicesSiteVisibilities.IDPK LEFT JOIN AdDevicesSitePositions ON AdDevicesSites.AdDevicesSitePositionIDFK = AdDevicesSitePositions.IDPK LEFT JOIN AdDevicesSiteStatusTypes ON AdDevicesSites.AdDevicesSiteStatusTypeIDFK = AdDevicesSiteStatusTypes.IDPK LEFT JOIN PartnerIdentifications AS PartnerIdentificationsOperator ON AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK = PartnerIdentificationsOperator.IDPK LEFT JOIN Partners AS PartnersElectricitySupplier ON AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK = PartnersElectricitySupplier.IDPK LEFT JOIN Partners AS PartnersMaintainer ON AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK = PartnersMaintainer.IDPK LEFT JOIN Partners AS PartnersSticker ON AdDevicesSites.AdDevicesSitePartnerStickerIDFK = PartnersSticker.IDPK LEFT JOIN CadastralUnits ON AdDevicesSites.CadastralUnitIDFK = CadastralUnits.IDPK INNER JOIN AdDevicesSites AS AdDevicesSites2 ON AdDevicesSites.IDPK = AdDevicesSites2.IDPK QUERY PLAN Hash Join (cost=193867.68..235417.92 rows=142556 width=815) (actual time=1577.080..2200.677 rows=6364 loops=1) Hash Cond: ("outer".idpk = "inner".idpk) -> Seq Scan on addevicessites addevicessites2 (cost=0.00..13118.56 rows=142556 width=473) (actual time=40.650..49.195rows=6364 loops=1) -> Hash (cost=186898.29..186898.29 rows=142556 width=350) (actual time=1534.080..1534.080 rows=0 loops=1) -> Merge Right Join (cost=184758.38..186898.29 rows=142556 width=350) (actual time=1187.653..1244.955 rows=6364loops=1) Merge Cond: ("outer".idpk = "inner".cadastralunitidfk) -> Index Scan using cadastralunits_pkey on cadastralunits (cost=0.00..314.49 rows=13027 width=31) (actualtime=0.034..0.128 rows=63 loops=1) -> Sort (cost=184758.38..185114.77 rows=142556 width=325) (actual time=1187.582..1190.111 rows=6364 loops=1) Sort Key: addevicessites.cadastralunitidfk -> Hash Left Join (cost=93887.04..143074.76 rows=142556 width=325) (actual time=502.584..1129.145 
rows=6364loops=1) Hash Cond: ("outer".addevicessitepartnerstickeridfk = "inner".idpk) -> Hash Left Join (cost=93884.28..140933.65 rows=142556 width=307) (actual time=502.388..1075.572rows=6364 loops=1) Hash Cond: ("outer".addevicessitepartnermaintaineridfk = "inner".idpk) -> Hash Left Join (cost=93881.51..138792.55 rows=142556 width=289) (actual time=502.208..1023.802rows=6364 loops=1) Hash Cond: ("outer".addevicessitepartnerelectricitysupplieridfk = "inner".idpk) -> Hash Left Join (cost=93878.75..136651.45 rows=142556 width=271) (actual time=502.041..969.959rows=6364 loops=1) Hash Cond: ("outer".addevicessitepartneridentificationoperatoridfk = "inner".idpk) -> Nested Loop Left Join (cost=93875.99..134510.35 rows=142556 width=253) (actualtime=501.849..915.256 rows=6364 loops=1) Join Filter: ("outer".addevicessitestatustypeidfk = "inner".idpk) -> Nested Loop Left Join (cost=93874.93..118471.74 rows=142556 width=228)(actual time=501.826..818.436 rows=6364 loops=1) Join Filter: ("outer".addevicessitepositionidfk = "inner".idpk) -> Nested Loop Left Join (cost=93873.90..108848.18 rows=142556width=207) (actual time=501.802..737.137 rows=6364 loops=1) Join Filter: ("outer".addevicessitevisibilityidfk = "inner".idpk) -> Merge Right Join (cost=93872.86..96017.09 rows=142556width=177) (actual time=501.741..554.834 rows=6364 loops=1) Merge Cond: ("outer".idpk = "inner".addevicessitesizeidfk) -> Index Scan using addevicessitesizes_pkey on addevicessitesizes (cost=0.00..5.62 rows=110 width=14) (actual time=0.009..0.264 rows=110 loops=1) -> Sort (cost=93872.86..94229.25 rows=142556 width=169)(actual time=501.697..504.267 rows=6364 loops=1) Sort Key: addevicessites.addevicessitesizeidfk -> Hash Left Join (cost=57752.91..65764.23 rows=142556width=169) (actual time=234.321..466.130 rows=6364 loops=1) Hash Cond: ("outer".addevicessitedistrictidfk= "inner".idpk) -> Hash Left Join (cost=57743.17..63616.15rows=142556 width=164) (actual time=233.456..421.267 rows=6364 loops=1) Hash Cond: ("outer".addevicessitestreetdescriptionidfk= "inner".idpk) -> Hash Left Join (cost=57634.86..62707.43rows=142556 width=137) (actual time=223.396..362.945 rows=6364 loops=1) Hash Cond: ("outer".addevicessitestreetidfk= "inner".idpk) -> Hash Left Join (cost=57566.95..61844.20rows=142556 width=119) (actual time=217.101..312.605 rows=6364 loops=1) Hash Cond: ("outer".addevicessitecityidfk= "inner".idpk) -> Merge Right Join (cost=57561.85..59700.76rows=142556 width=110) (actual time=216.635..266.672 rows=6364 loops=1) Merge Cond: ("outer".idpk= "inner".addevicessitecountyidfk) -> Sort (cost=6.19..6.48rows=117 width=17) (actual time=0.350..0.389 rows=116 loops=1) Sort Key: addevicessitecounties.idpk -> Seq Scanon addevicessitecounties (cost=0.00..2.17 rows=117 width=17) (actual time=0.008..0.146 rows=117 loops=1) -> Sort (cost=57555.66..57912.05rows=142556 width=99) (actual time=216.250..218.611 rows=6364 loops=1) Sort Key: addevicessites.addevicessitecountyidfk -> Merge RightJoin (cost=33573.63..35712.03 rows=142556 width=99) (actual time=173.646..199.036 rows=6364 loops=1) MergeCond: ("outer".idpk = "inner".addevicessiteregionidfk) -> Sort (cost=1.44..1.48 rows=15 width=23) (actual time=0.055..0.059 rows=13 loops=1) Sort Key: addevicessiteregions.idpk -> Seq Scan on addevicessiteregions (cost=0.00..1.15 rows=15 width=23) (actual time=0.016..0.032 rows=15 loops=1) -> Sort (cost=33572.19..33928.58 rows=142556 width=82) (actual time=173.559..176.398 rows=6364 loops=1) Sort Key: addevicessites.addevicessiteregionidfk -> Seq Scan on 
addevicessites (cost=0.00..13118.56 rows=142556 width=82) (actual time=62.345..164.783 rows=6364 loops=1) -> Hash (cost=4.48..4.48rows=248 width=17) (actual time=0.283..0.283 rows=0 loops=1) -> Seq Scan on addevicessitecities (cost=0.00..4.48 rows=248 width=17) (actual time=0.011..0.162 rows=138 loops=1) -> Hash (cost=60.53..60.53rows=2953 width=34) (actual time=6.229..6.229 rows=0 loops=1) -> Seq Scan on addevicessitestreets (cost=0.00..60.53 rows=2953 width=34) (actual time=0.010..3.816 rows=2984 loops=1) -> Hash (cost=96.85..96.85 rows=4585width=43) (actual time=10.017..10.017 rows=0 loops=1) -> Seq Scan on addevicessitestreetdescriptions (cost=0.00..96.85 rows=4585 width=43) (actual time=0.008..6.371 rows=4585 loops=1) -> Hash (cost=8.59..8.59 rows=459 width=21)(actual time=0.815..0.815 rows=0 loops=1) -> Seq Scan on addevicessitedistricts (cost=0.00..8.59 rows=459 width=21) (actual time=0.007..0.541 rows=382 loops=1) -> Materialize (cost=1.04..1.08 rows=4 width=36) (actualtime=0.000..0.002 rows=4 loops=6364) -> Seq Scan on addevicessitevisibilities (cost=0.00..1.04rows=4 width=36) (actual time=0.026..0.035 rows=4 loops=1) -> Materialize (cost=1.03..1.06 rows=3 width=27) (actual time=0.000..0.001rows=3 loops=6364) -> Seq Scan on addevicessitepositions (cost=0.00..1.03 rows=3width=27) (actual time=0.007..0.010 rows=3 loops=1) -> Materialize (cost=1.05..1.10 rows=5 width=31) (actual time=0.000..0.002rows=5 loops=6364) -> Seq Scan on addevicessitestatustypes (cost=0.00..1.05 rows=5width=31) (actual time=0.006..0.016 rows=5 loops=1) -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.144..0.144 rows=0loops=1) -> Seq Scan on partneridentifications partneridentificationsoperator (cost=0.00..2.61 rows=61 width=34) (actual time=0.007..0.103 rows=61 loops=1) -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.122..0.122 rows=0 loops=1) -> Seq Scan on partners partnerselectricitysupplier (cost=0.00..2.61 rows=61width=34) (actual time=0.004..0.072 rows=61 loops=1) -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.136..0.136 rows=0 loops=1) -> Seq Scan on partners partnersmaintainer (cost=0.00..2.61 rows=61 width=34) (actualtime=0.005..0.086 rows=61 loops=1) -> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.143..0.143 rows=0 loops=1) -> Seq Scan on partners partnerssticker (cost=0.00..2.61 rows=61 width=34) (actual time=0.009..0.098rows=61 loops=1) Total runtime: 2210.937 ms > regards, tom lane > > Miroslav
Miroslav Šulc <miroslav.sulc@startnet.cz> writes: > seriously, I am far below this level of knowledge. But I can contribute > a test that (maybe) can help. I have rewritten the query so it JOINs the > varchar() fields (in fact all fields except the IDPK) at the last INNER > JOIN. Though there is one more JOIN, the query is more than 5 times > faster (1975.312 ms) :-) That confirms my thought that passing the data up through multiple levels of join is what's killing us. I'll work on a solution. This will of course be even less back-patchable to 8.0.* than Ogawa's work, but hopefully it will fix the issue for 8.1. regards, tom lane
Tom Lane wrote: >=?windows-1250?Q?Miroslav_=8Aulc?= <miroslav.sulc@startnet.cz> writes: > > >>seriously, I am far below this level of knowledge. But I can contribute >>a test that (maybe) can help. I have rewritten the query so it JOINs the >>varchar() fields (in fact all fields except the IDPK) at the last INNER >>JOIN. Though there is one more JOIN, the query is more than 5 times >>faster (1975.312 ms) :-) >> >> > >That confirms my thought that passing the data up through multiple >levels of join is what's killing us. I'll work on a solution. This >will of course be even less back-patchable to 8.0.* than Ogawa's work, >but hopefully it will fix the issue for 8.1. > > regards, tom lane > > Tom, thank you and the others for help. I'm glad that my problem can help PostgreSQL improve and that there are people like you that understand each little bit of the software :-) Miroslav
I have asked him for the data and played with his queries, and obtained massive speedups with the following queries : http://boutiquenumerique.com/pf/miroslav/query.sql http://boutiquenumerique.com/pf/miroslav/query2.sql http://boutiquenumerique.com/pf/miroslav/materialize.sql Note that my optimized version of the Big Joins is not much faster than the materialized view without index (hash joins are damn fast in postgres) but of course using an index...
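[Editor's note: for readers who don't fetch the linked files, the "materialized view" approach PFC mentions would, in 8.0 terms, be an ordinary table built once from the big join and then indexed on the paging key. The sketch below reuses the toy main_table/child_table names from the earlier sketch and is not the contents of materialize.sql.]

-- Build the flattened result once (PostgreSQL 8.0 has no
-- CREATE MATERIALIZED VIEW, so a plain table stands in for it).
CREATE TABLE main_flat AS
SELECT m.id, m.sort_key, m.payload, c.name AS child_name
FROM main_table m
LEFT JOIN child_table c ON c.id = m.child_id;

CREATE INDEX main_flat_sort_key_idx ON main_flat (sort_key);

-- Paging can now use the index on sort_key instead of re-joining and
-- sorting the whole result set each time.
SELECT * FROM main_flat ORDER BY sort_key LIMIT 30 OFFSET 6000;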
On my machine (Laptop with Pentium-M 1.6 GHz and 512MB DDR333) I get the following timings : Big Joins Query will all the fields and no order by (I just put a SELECT * in the first table) yielding about 6k rows : => 12136.338 ms Replacing the SELECT * from the table with many fields by just a SELECT of the foreign key columns : => 1874.612 ms I felt like playing a bit so I implemented a hash join in python (download the file, it works on Miroslav's data) : All timings do not include time to fetch the data from the database. Fetching all the tables takes about 1.1 secs. * With something that looks like the current implementation (copying tuples around) and fetching all the fields from the big table : => Fetching all the tables : 1.1 secs. => Joining : 4.3 secs * Fetching only the integer fields => Fetching all the tables : 0.4 secs. => Joining : 1.7 secs * A smarter join which copies nothing and updates the rows as they are processed, adding fields : => Fetching all the tables : 1.1 secs. => Joining : 0.4 secs With the just-in-time compiler activated, it goes down to about 0.25 seconds. First thing, this confirms what Tom said. It also means that doing this query in the application can be a lot faster than doing it in postgres including fetching all of the tables. There's a problem somewhere ! It should be the other way around ! The python mappings (dictionaries : { key : value } ) are optimized like crazy but they store column names for each row. And it's a dynamic script language ! Argh. Note : run the program like this : python test.py |less -S So that the time spent scrolling your terminal does not spoil the measurements. Download test program : http://boutiquenumerique.com/pf/miroslav/test.py