Re: Avoiding tuple construction/deconstruction during joining

From: Miroslav Šulc
Subject: Re: Avoiding tuple construction/deconstruction during joining
Date: ,
(view: Whole thread, Raw)
In response to: Avoiding tuple construction/deconstruction during joining  (Tom Lane)
Responses: Re: Avoiding tuple construction/deconstruction during joining  (Tom Lane)
List: pgsql-performance

Tree view

How to read query plan  (Miroslav Šulc, )
 Re: How to read query plan  (John Arbash Meinel, )
  Re: How to read query plan  (Miroslav Šulc, )
   Re: How to read query plan  (John Arbash Meinel, )
    Re: How to read query plan  (Miroslav Šulc, )
    Re: How to read query plan  (Tom Lane, )
     Re: How to read query plan  (Miroslav Šulc, )
      Re: How to read query plan  (Tom Lane, )
       Re: How to read query plan  (Miroslav Šulc, )
        Re: How to read query plan  (Tom Lane, )
   Re: How to read query plan  (John Arbash Meinel, )
    Re: How to read query plan  (Miroslav Šulc, )
     Re: How to read query plan  (John Arbash Meinel, )
      Re: How to read query plan  (Miroslav Šulc, )
      Re: How to read query plan  (Miroslav Šulc, )
       Re: How to read query plan  (Christopher Kings-Lynne, )
        Re: How to read query plan  (Miroslav Šulc, )
         Re: How to read query plan  (Greg Stark, )
       Re: How to read query plan  (PFC, )
        Re: How to read query plan  (Miroslav Šulc, )
         Re: How to read query plan  (Tom Lane, )
          Re: How to read query plan  (Miroslav Šulc, )
           Re: How to read query plan  (Kaloyan Iliev Iliev, )
            Re: How to read query plan  (Miroslav Šulc, )
         Re: How to read query plan  (John Arbash Meinel, )
 Re: How to read query plan  (Ragnar Hafstað, )
  Re: How to read query plan  (Miroslav Šulc, )
 Re: How to read query plan  (Tom Lane, )
  Re: [HACKERS] How to read query plan  (Tom Lane, )
   Re: [HACKERS] How to read query plan  (Miroslav Šulc, )
    Re: [HACKERS] How to read query plan  (Tom Lane, )
  Re: How to read query plan  (Miroslav Šulc, )
   Re: How to read query plan  (Tom Lane, )
    Re: How to read query plan  (Miroslav Šulc, )
   Re: How to read query plan  (John Arbash Meinel, )
 Re: How to read query plan  (Harald Fuchs, )
  Re: How to read query plan  (Miroslav Šulc, )
 Avoiding tuple construction/deconstruction during joining  (Tom Lane, )
  Re: Avoiding tuple construction/deconstruction during joining  (Miroslav Šulc, )
   Re: Avoiding tuple construction/deconstruction during joining  (Tom Lane, )
    Re: Avoiding tuple construction/deconstruction during joining  (Miroslav Šulc, )
  Re: Avoiding tuple construction/deconstruction during joining  (PFC, )
  Re: Avoiding tuple construction/deconstruction during joining  (PFC, )

Tom Lane wrote:

>>So I have some results. I have tested the query on both PostgreSQL 8.0.1
>>and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL
>>result is 11,667.916 ms, MySQL result is 448.4 ms.
>That's a fairly impressive discrepancy :-(, and even the slot_getattr()
>patch that Atsushi Ogawa provided isn't going to close the gap.
>(I got about a 4x speedup on Miroslav's example in my testing, which
>leaves us still maybe 6x slower than MySQL.)
As I wrote, the comparison is not "fair". Here are the conditions:

"Both databases are running on the same machine (my laptop) and contain
the same data. However there are some differences in the data table
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I
use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not of
the same type (smallint <-> integer (SERIAL)), in MySQL I use the same

For those not used to MySQL, enum is an integer "mapped" to a text
string (that's how I see it). That means that when you have field such
as enum('yes','no','DK'), in the table there are stored numbers 1, 2 and
3 which are mapped to the text values 'yes', 'no' and 'DK'. The
description is not accurate (I'm not MySQL programmer, I didn't check it
recently and I didn't inspect the code - I wouldn't understand it
either) but I think it's not that important. What is important is the
fact that MySQL has to work with some dozen fields that are numbers and
PostgreSQL has to work with the same fields as varchar(). Some of the
other fields are also varchars. This might (or might not) cause the
speed difference. However I think that if devs figure out how to speed
this up, other cases will benefit from the improvement too.

As I understood from the contributions of other, the 2) shouldn't have a
great impact on the speed.

>Looking at the post-patch profile for the test case, there is still
>quite a lot of cycles going into tuple assembly and disassembly:
>Each sample counts as 0.01 seconds.
>  %   cumulative   self              self     total
> time   seconds   seconds    calls  ms/call  ms/call  name
> 24.47      4.49     4.49                             _mcount
>  8.01      5.96     1.47  9143692     0.00     0.00  ExecEvalVar
>  6.92      7.23     1.27  6614373     0.00     0.00  slot_deformtuple
>  6.54      8.43     1.20  9143692     0.00     0.00  slot_getattr
>  6.21      9.57     1.14   103737     0.01     0.03  ExecTargetList
>  5.56     10.59     1.02   103775     0.01     0.01  DataFill
>  3.22     11.18     0.59   103775     0.01     0.01  ComputeDataSize
>  2.83     11.70     0.52                             ExecEvalVar
>  2.72     12.20     0.50  9094122     0.00     0.00  memcpy
>  2.51     12.66     0.46                             encore
>  2.40     13.10     0.44   427448     0.00     0.00  nocachegetattr
>  2.13     13.49     0.39   103775     0.00     0.02  heap_formtuple
>  2.07     13.87     0.38                             noshlibs
>  1.20     14.09     0.22   225329     0.00     0.00  _doprnt
>  1.20     14.31     0.22                             msquadloop
>  1.14     14.52     0.21                             chunks
>  0.98     14.70     0.18   871885     0.00     0.00  AllocSetAlloc
>  0.98     14.88     0.18                             $$dyncall
>  0.76     15.02     0.14   594242     0.00     0.00  FunctionCall3
>  0.71     15.15     0.13   213312     0.00     0.00  comparetup_heap
>  0.65     15.27     0.12     6364     0.02     0.13  printtup
>  0.60     15.38     0.11   790702     0.00     0.00  pfree
>(_mcount is profiling overhead, ignore it.)  It looks to me like just
>about everything in the top dozen functions is there as a result of the
>fact that join steps form new tuples that are the merge of their input
>tuples.  Even our favorite villains, palloc and pfree, are down in the
>sub-percent range.
>I am guessing that the reason MySQL wins on this is that they avoid
>doing any data copying during a join step.  I wonder whether we could
>accomplish the same by taking Ogawa's patch to the next level: allow
>a TupleTableSlot to contain either a "materialized" tuple as now,
>or a "virtual" tuple that is simply an array of Datums and null flags.
>(It's virtual in the sense that any pass-by-reference Datums would have
>to be pointing to data at the next level down.)  This would essentially
>turn the formtuple and deformtuple operations into no-ops, and get rid
>of a lot of the associated overhead such as ComputeDataSize and
>DataFill.  The only operations that would have to forcibly materialize
>a tuple would be ones that need to keep the tuple till after they fetch
>their next input tuple --- hashing and sorting are examples, but very
>many plan node types don't ever need to do that.
>I haven't worked out the details, but it seems likely that this could be
>a relatively nonintrusive patch.  The main thing that would be an issue
>would be that direct reference to slot->val would become verboten (since
>you could no longer be sure there was a materialized tuple there).
>I think this would possibly affect some contrib stuff, which is a strong
>hint that it'd break some existing user-written code out there.
Mine thought is that I don't know what you are talking about :-) Now
seriously, I am far below this level of knowledge. But I can contribute
a test that (maybe) can help. I have rewritten the query so it JOINs the
varchar() fields (in fact all fields except the IDPK) at the last INNER
JOIN. Though there is one more JOIN, the query is more than 5 times
faster (1975.312 ms) :-)

So my silly opinion is that if the planner could decide that there are
too much time expensive fields that are not needed during performing
JOINs and these could be JOINed at the last step, it would do it this
way :-)

Below is the adjusted query and the EXPLAIN ANALYZE output. (Tom, you
can run it on the data I have sent you and it should run without changes.)

SELECT AdDevicesSites.IDPK, AdDevicesSites2.AdDevicesSiteSizeIDFK,
AdDevicesSites2.CadastralUnitIDFK, AdDevicesSites2.MediaType,
AdDevicesSites2.Mark, AdDevicesSites2.Amount, AdDevicesSites2.Distance,
AdDevicesSites2.OwnLightening, AdDevicesSites2.LocationDownTown,
AdDevicesSites2.LocationNoBuildings, AdDevicesSites2.ParkWayHighWay,
AdDevicesSites2.ParkWayFirstClassRoad, AdDevicesSites2.ParkWayOtherRoad,
AdDevicesSites2.ParkWayStreet, AdDevicesSites2.ParkWayAccess,
AdDevicesSites2.ParkWayExit, AdDevicesSites2.ParkWayParkingPlace,
AdDevicesSites2.ParkWayPassangersOnly, AdDevicesSites2.ParkWayCrossRoad,
AdDevicesSites2.NeighbourhoodSchool, AdDevicesSites2.NeighbourhoodBank,
AdDevicesSites2.RestrictionPolitics, AdDevicesSites2.RestrictionSpirits,
AdDevicesSites2.RestrictionSex, AdDevicesSites2.RestrictionOther,
AdDevicesSites2.RestrictionNote, AdDevicesSites2.SpotMapFile,
AdDevicesSites2.SpotPhotoFile, AdDevicesSites2.SourcePhotoTimeStamp,
AdDevicesSites2.SourceMapTimeStamp, AdDevicesSites2.Price,
AdDevicesSites2.WebPrice, AdDevicesSites2.CadastralUnitCode,
AdDevicesSites2.BuildingNumber, AdDevicesSites2.ParcelNumber,
AdDevicesSites2.GPSLatitude, AdDevicesSites2.GPSLongitude,
AdDevicesSites2.GPSHeight, AdDevicesSites2.MechanicalOpticalCoordinates,
AdDevicesSites2.Deleted, AdDevicesSites2.Protected,
AdDevicesSites2.DateCreated, AdDevicesSites2.DateLastModified,
AdDevicesSites2.DateDeleted, AdDevicesSites2.CreatedByUserIDFK,
AdDevicesSites2.DateLastImported, AdDevicesSiteRegions.Name AS
AdDevicesSiteRegionName, AdDevicesSiteCounties.Name AS
AdDevicesSiteCountyName, AdDevicesSiteCities.Name AS
AdDevicesSiteCityName, AdDevicesSiteStreets.Name AS
AdDevicesSiteStreetName, AdDevicesSiteDistricts.Name AS
AdDevicesSiteDistrictName, AdDevicesSiteStreetDescriptions.Name_cs AS
AdDevicesSiteStreetDescriptions.Name_en AS
AdDevicesSiteStreetDescriptionName_en, AdDevicesSiteSizes.Name AS
AdDevicesSiteSizeName, SUBSTRING(AdDevicesSiteVisibilities.Name_cs, 3)
AS AdDevicesSiteVisibilityName_cs,
SUBSTRING(AdDevicesSiteVisibilities.Name_en, 3) AS
AdDevicesSiteVisibilityName_en, AdDevicesSitePositions.Name_cs AS
AdDevicesSitePositionName_cs, AdDevicesSitePositions.Name_en AS
AdDevicesSitePositionName_en, AdDevicesSiteStatusTypes.Name_cs AS
AdDevicesSiteStatusTypeName_cs, AdDevicesSiteStatusTypes.Name_en AS
AdDevicesSiteStatusTypeName_en, PartnerIdentificationsOperator.Name AS
PartnerIdentificationOperatorName, PartnersElectricitySupplier.Name AS
PartnerElectricitySupplierName, PartnersMaintainer.Name AS
PartnerMaintainerName, PartnersSticker.Name AS PartnerStickerName,
CadastralUnits.Code AS CadastralUnitCodeNative, CadastralUnits.Name AS
FROM AdDevicesSites
LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK
= AdDevicesSiteRegions.IDPK
LEFT JOIN AdDevicesSiteCounties ON
AdDevicesSites.AdDevicesSiteCountyIDFK = AdDevicesSiteCounties.IDPK
LEFT JOIN AdDevicesSiteCities ON AdDevicesSites.AdDevicesSiteCityIDFK =
LEFT JOIN AdDevicesSiteStreets ON AdDevicesSites.AdDevicesSiteStreetIDFK
= AdDevicesSiteStreets.IDPK
LEFT JOIN AdDevicesSiteStreetDescriptions ON
AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK =
LEFT JOIN AdDevicesSiteDistricts ON
AdDevicesSites.AdDevicesSiteDistrictIDFK = AdDevicesSiteDistricts.IDPK
LEFT JOIN AdDevicesSiteSizes ON AdDevicesSites.AdDevicesSiteSizeIDFK =
LEFT JOIN AdDevicesSiteVisibilities ON
AdDevicesSites.AdDevicesSiteVisibilityIDFK = AdDevicesSiteVisibilities.IDPK
LEFT JOIN AdDevicesSitePositions ON
AdDevicesSites.AdDevicesSitePositionIDFK = AdDevicesSitePositions.IDPK
LEFT JOIN AdDevicesSiteStatusTypes ON
AdDevicesSites.AdDevicesSiteStatusTypeIDFK = AdDevicesSiteStatusTypes.IDPK
LEFT JOIN PartnerIdentifications AS PartnerIdentificationsOperator ON
AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK =
LEFT JOIN Partners AS PartnersElectricitySupplier ON
AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK =
LEFT JOIN Partners AS PartnersMaintainer ON
AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK = PartnersMaintainer.IDPK
LEFT JOIN Partners AS PartnersSticker ON
AdDevicesSites.AdDevicesSitePartnerStickerIDFK = PartnersSticker.IDPK
LEFT JOIN CadastralUnits ON AdDevicesSites.CadastralUnitIDFK =
INNER JOIN AdDevicesSites AS AdDevicesSites2 ON AdDevicesSites.IDPK =


Hash Join  (cost=193867.68..235417.92 rows=142556 width=815) (actual time=1577.080..2200.677 rows=6364 loops=1)

  Hash Cond: ("outer".idpk = "inner".idpk)

  ->  Seq Scan on addevicessites addevicessites2  (cost=0.00..13118.56 rows=142556 width=473) (actual
time=40.650..49.195rows=6364 loops=1) 

  ->  Hash  (cost=186898.29..186898.29 rows=142556 width=350) (actual time=1534.080..1534.080 rows=0 loops=1)

        ->  Merge Right Join  (cost=184758.38..186898.29 rows=142556 width=350) (actual time=1187.653..1244.955

              Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)

              ->  Index Scan using cadastralunits_pkey on cadastralunits  (cost=0.00..314.49 rows=13027 width=31)
(actualtime=0.034..0.128 rows=63 loops=1) 

              ->  Sort  (cost=184758.38..185114.77 rows=142556 width=325) (actual time=1187.582..1190.111 rows=6364

                    Sort Key: addevicessites.cadastralunitidfk

                    ->  Hash Left Join  (cost=93887.04..143074.76 rows=142556 width=325) (actual time=502.584..1129.145

                          Hash Cond: ("outer".addevicessitepartnerstickeridfk = "inner".idpk)

                          ->  Hash Left Join  (cost=93884.28..140933.65 rows=142556 width=307) (actual
time=502.388..1075.572rows=6364 loops=1) 

                                Hash Cond: ("outer".addevicessitepartnermaintaineridfk = "inner".idpk)

                                ->  Hash Left Join  (cost=93881.51..138792.55 rows=142556 width=289) (actual
time=502.208..1023.802rows=6364 loops=1) 

                                      Hash Cond: ("outer".addevicessitepartnerelectricitysupplieridfk = "inner".idpk)

                                      ->  Hash Left Join  (cost=93878.75..136651.45 rows=142556 width=271) (actual
time=502.041..969.959rows=6364 loops=1) 

                                            Hash Cond: ("outer".addevicessitepartneridentificationoperatoridfk =

                                            ->  Nested Loop Left Join  (cost=93875.99..134510.35 rows=142556 width=253)
(actualtime=501.849..915.256 rows=6364 loops=1) 

                                                  Join Filter: ("outer".addevicessitestatustypeidfk = "inner".idpk)

                                                  ->  Nested Loop Left Join  (cost=93874.93..118471.74 rows=142556
width=228)(actual time=501.826..818.436 rows=6364 loops=1) 

                                                        Join Filter: ("outer".addevicessitepositionidfk = "inner".idpk)

                                                        ->  Nested Loop Left Join  (cost=93873.90..108848.18
rows=142556width=207) (actual time=501.802..737.137 rows=6364 loops=1) 

                                                              Join Filter: ("outer".addevicessitevisibilityidfk =

                                                              ->  Merge Right Join  (cost=93872.86..96017.09
rows=142556width=177) (actual time=501.741..554.834 rows=6364 loops=1) 

                                                                    Merge Cond: ("outer".idpk =

                                                                    ->  Index Scan using addevicessitesizes_pkey on
addevicessitesizes (cost=0.00..5.62 rows=110 width=14) (actual time=0.009..0.264 rows=110 loops=1) 

                                                                    ->  Sort  (cost=93872.86..94229.25 rows=142556
width=169)(actual time=501.697..504.267 rows=6364 loops=1) 

                                                                          Sort Key:

                                                                          ->  Hash Left Join  (cost=57752.91..65764.23
rows=142556width=169) (actual time=234.321..466.130 rows=6364 loops=1) 

                                                                                Hash Cond:
("outer".addevicessitedistrictidfk= "inner".idpk) 

                                                                                ->  Hash Left Join
(cost=57743.17..63616.15rows=142556 width=164) (actual time=233.456..421.267 rows=6364 loops=1) 

                                                                                      Hash Cond:
("outer".addevicessitestreetdescriptionidfk= "inner".idpk) 

                                                                                      ->  Hash Left Join
(cost=57634.86..62707.43rows=142556 width=137) (actual time=223.396..362.945 rows=6364 loops=1) 

                                                                                            Hash Cond:
("outer".addevicessitestreetidfk= "inner".idpk) 

                                                                                            ->  Hash Left Join
(cost=57566.95..61844.20rows=142556 width=119) (actual time=217.101..312.605 rows=6364 loops=1) 

                                                                                                  Hash Cond:
("outer".addevicessitecityidfk= "inner".idpk) 

                                                                                                  ->  Merge Right Join
(cost=57561.85..59700.76rows=142556 width=110) (actual time=216.635..266.672 rows=6364 loops=1) 

                                                                                                        Merge Cond:
("outer".idpk= "inner".addevicessitecountyidfk) 

                                                                                                        ->  Sort
(cost=6.19..6.48rows=117 width=17) (actual time=0.350..0.389 rows=116 loops=1) 

                                                                                                              Sort Key:

                                                                                                              ->  Seq
Scanon addevicessitecounties  (cost=0.00..2.17 rows=117 width=17) (actual time=0.008..0.146 rows=117 loops=1) 

                                                                                                        ->  Sort
(cost=57555.66..57912.05rows=142556 width=99) (actual time=216.250..218.611 rows=6364 loops=1) 

                                                                                                              Sort Key:

                                                                                                              ->  Merge
RightJoin  (cost=33573.63..35712.03 rows=142556 width=99) (actual time=173.646..199.036 rows=6364 loops=1) 

MergeCond: ("outer".idpk = "inner".addevicessiteregionidfk) 

Sort (cost=1.44..1.48 rows=15 width=23) (actual time=0.055..0.059 rows=13 loops=1) 

 Sort Key: addevicessiteregions.idpk 

 ->  Seq Scan on addevicessiteregions  (cost=0.00..1.15 rows=15 width=23) (actual time=0.016..0.032 rows=15 loops=1) 

Sort (cost=33572.19..33928.58 rows=142556 width=82) (actual time=173.559..176.398 rows=6364 loops=1) 

 Sort Key: addevicessites.addevicessiteregionidfk 

 ->  Seq Scan on addevicessites  (cost=0.00..13118.56 rows=142556 width=82) (actual time=62.345..164.783 rows=6364

                                                                                                  ->  Hash
(cost=4.48..4.48rows=248 width=17) (actual time=0.283..0.283 rows=0 loops=1) 

                                                                                                        ->  Seq Scan on
addevicessitecities (cost=0.00..4.48 rows=248 width=17) (actual time=0.011..0.162 rows=138 loops=1) 

                                                                                            ->  Hash
(cost=60.53..60.53rows=2953 width=34) (actual time=6.229..6.229 rows=0 loops=1) 

                                                                                                  ->  Seq Scan on
addevicessitestreets (cost=0.00..60.53 rows=2953 width=34) (actual time=0.010..3.816 rows=2984 loops=1) 

                                                                                      ->  Hash  (cost=96.85..96.85
rows=4585width=43) (actual time=10.017..10.017 rows=0 loops=1) 

                                                                                            ->  Seq Scan on
addevicessitestreetdescriptions (cost=0.00..96.85 rows=4585 width=43) (actual time=0.008..6.371 rows=4585 loops=1) 

                                                                                ->  Hash  (cost=8.59..8.59 rows=459
width=21)(actual time=0.815..0.815 rows=0 loops=1) 

                                                                                      ->  Seq Scan on
addevicessitedistricts (cost=0.00..8.59 rows=459 width=21) (actual time=0.007..0.541 rows=382 loops=1) 

                                                              ->  Materialize  (cost=1.04..1.08 rows=4 width=36)
(actualtime=0.000..0.002 rows=4 loops=6364) 

                                                                    ->  Seq Scan on addevicessitevisibilities
(cost=0.00..1.04rows=4 width=36) (actual time=0.026..0.035 rows=4 loops=1) 

                                                        ->  Materialize  (cost=1.03..1.06 rows=3 width=27) (actual
time=0.000..0.001rows=3 loops=6364) 

                                                              ->  Seq Scan on addevicessitepositions  (cost=0.00..1.03
rows=3width=27) (actual time=0.007..0.010 rows=3 loops=1) 

                                                  ->  Materialize  (cost=1.05..1.10 rows=5 width=31) (actual
time=0.000..0.002rows=5 loops=6364) 

                                                        ->  Seq Scan on addevicessitestatustypes  (cost=0.00..1.05
rows=5width=31) (actual time=0.006..0.016 rows=5 loops=1) 

                                            ->  Hash  (cost=2.61..2.61 rows=61 width=34) (actual time=0.144..0.144

                                                  ->  Seq Scan on partneridentifications partneridentificationsoperator
(cost=0.00..2.61 rows=61 width=34) (actual time=0.007..0.103 rows=61 loops=1) 

                                      ->  Hash  (cost=2.61..2.61 rows=61 width=34) (actual time=0.122..0.122 rows=0

                                            ->  Seq Scan on partners partnerselectricitysupplier  (cost=0.00..2.61
rows=61width=34) (actual time=0.004..0.072 rows=61 loops=1) 

                                ->  Hash  (cost=2.61..2.61 rows=61 width=34) (actual time=0.136..0.136 rows=0 loops=1)

                                      ->  Seq Scan on partners partnersmaintainer  (cost=0.00..2.61 rows=61 width=34)
(actualtime=0.005..0.086 rows=61 loops=1) 

                          ->  Hash  (cost=2.61..2.61 rows=61 width=34) (actual time=0.143..0.143 rows=0 loops=1)

                                ->  Seq Scan on partners partnerssticker  (cost=0.00..2.61 rows=61 width=34) (actual
time=0.009..0.098rows=61 loops=1) 

Total runtime: 2210.937 ms

>            regards, tom lane


pgsql-performance by date:

From: Jan Wieck
Subject: Re: column name is "LIMIT"
From: "Greg Sabino Mullane"
Subject: Re: cpu_tuple_cost