Thread: Missed compiler optimization issue in function select_rtable_names_for_explain
Hi everyone,
I'm a compiler developer working on detecting missed optimization in real-world applications. Recently, we found that LLVM missed a dead store elimination optimization in the PostgreSQL code (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/ruleutils.c#L3794) in the master branch.
For the example below:
```
int dst[128];
memset(dst, 0, 128);
*unrelated = some_value;
dst[1] = 0;
dst[2] = 0;
```
LLVM cannot eliminate the useless stores after memset since the store to "unrelated" clobbers. But if we put the stores to "dst" ahead of the store to "unrelated", we could prevent confusing the compiler. See also the compiler explorer link: https://godbolt.org/z/P9jnKod3v and the issue of LLVM: https://github.com/llvm/llvm-project/issues/88632
To improve the codegen quality, I think it's also possible to modify the source code, changing the order of initialization of the member, to get better optimization. But I don't know whether this can be considered as a bug, thus post the issue here.
If anyone could confirm this problem or post a patch for it, let me know please. Thanks!
Best regards. Hongyu.
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
> On 22 May 2024, at 11:27, XChy <xxs_chy@outlook.com> wrote: > > Hi everyone, > I'm a compiler developer working on detecting missed optimization in real-world applications. Recently, we found that LLVMmissed a dead store elimination optimization in the PostgreSQL code (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/ruleutils.c#L3794)in the master branch. How is the memset in select_rtable_names_for_explain a dead-store? Even memset calls could be optimized away from the EXPLAIN codepath I have a feeling it would have to be many in a tight loop for it to be measurable even? -- Daniel Gustafsson
How is the memset in select_rtable_names_for_explain a dead-store? Even memsetcalls could be optimized away from the EXPLAIN codepath I have a feeling it would have to be many in a tight loop for it to be measurable even? -- Daniel Gustafsson
For the first question, I don't mean that the memset is the dead store. I mean that the stores with value "0" after the memset are dead:
```
dpns.ctes = NIL;
dpns.appendrels = NULL;
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
> On 22 May 2024, at 12:12, XChy <xxs_chy@outlook.com> wrote: > >> How is the memset in select_rtable_names_for_explain a dead-store? Even memset calls could be optimized away from theEXPLAIN codepath I have a feeling it >> would have to be many in a tight loop for it to be measurable even? > For the first question, I don't mean that the memset is the dead store. Gotcha > I mean that the stores with value "0" after the memset are dead: > ``` > dpns.subplans = NIL; > dpns.ctes = NIL; > dpns.appendrels = NULL; > ``` > since the memset has written zeroes to the object "dpns", and these members are known to be zero. They are known to be zero, but that's not entirely equivalent though is it? NIL is defined as ((List *) NULL) and NULL is typically defined as ((void *) 0), so sizeof(0) would be the size of an int and sizeof(NULL) would be the size of a void pointer. -- Daniel Gustafsson
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
On 2024-May-22, XChy wrote: > Hi everyone, > > I'm a compiler developer working on detecting missed optimization in > real-world applications. Recently, we found that LLVM missed a dead store > elimination optimization in the PostgreSQL code <https://github.com/postgres/postgres/blob/c37267162e889fe783786b9e28d1b65b82365a00/src/backend/utils/adt/ruleutils.c#L3794> (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/ruleutils.c#L3794) > in the master branch. I think if we want to improve how this code is seen by the compiler by modifying it, we should just remove the NULL/NIL assignments. It's a pretty arbitrary (and rather small) subset of fields being initialized, fields which very obviously have been zeroed by memset already, so I don't see any value in the explicit assignments. The criteria may be different in cases where some struct has all of its members initialized to something, because at least you can use that kind of thing to find places you need to modify when adding new struct members. Not so in select_rtable_names_for_explain. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "I can see support will not be a problem. 10 out of 10." (Simon Wittber) (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
I mean that the stores with value "0" after the memset are dead: ``` dpns.subplans = NIL; dpns.ctes = NIL; dpns.appendrels = NULL; ``` since the memset has written zeroes to the object "dpns", and these members are known to be zero.They are known to be zero, but that's not entirely equivalent though is it? NIL is defined as ((List *) NULL) and NULL is typically defined as ((void *) 0), so sizeof(0) would be the size of an int and sizeof(NULL) would be the size of a void pointer.
The type or size doesn't matter here. At IR or assembly level, they are all zeroes.
My main point is that "dpns.xxx" are filled with zeroes by the memset firstly, and overwriting them with zeroes in the following stores is redundant. LLVM cannot remove the redundant overwrites due to the initialization order. If we adjust the order of the initialization of "dpns.xxx", the compiler can remove such stores.
Does my explanation make sense to you?
Best regards, Hongyu.
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
Daniel Gustafsson <daniel@yesql.se> writes: > They are known to be zero, but that's not entirely equivalent though is it? > NIL is defined as ((List *) NULL) and NULL is typically defined as ((void *) > 0), so sizeof(0) would be the size of an int and sizeof(NULL) would be the size > of a void pointer. There are other places where we assume that a memset-to-zero will produce null pointers, so I don't think that that objection has a lot of force. My real answer is that this is our coding style and we are not going to change it: our normal convention is to initialize struct fields in declaration order, and that's what we're doing here. If some particular version of some particular compiler fails to make an entirely-negligible optimization in consequence, that is not something we are going to care about. Maybe we would care if the missed optimization were a serious performance loss in a very hot code path. But this is neither. I'd also say that this is pretty clearly a compiler bug. If it'd normally optimize away a null-pointer-store following a memset-to-zero, but does not in memset(&dpns, 0, sizeof(dpns)); dpns.rtable = rtable; dpns.subplans = NIL; dpns.ctes = NIL; dpns.appendrels = NULL; that would seem to indicate that the optimizer doesn't really understand that dpns.rtable is a distinct field from the others. How is that our problem? (I kind of wonder if what's actually blocking the optimization is the casts inside the NIL macros. Still not our problem.) regards, tom lane
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
> On 22 May 2024, at 13:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I think if we want to improve how this code is seen by the compiler by > modifying it, we should just remove the NULL/NIL assignments. *If* the optimization is measurable, IMHO. > It's a > pretty arbitrary (and rather small) subset of fields being initialized, > fields which very obviously have been zeroed by memset already, so I > don't see any value in the explicit assignments. My personal opinion is that there is value in immediately knowing the type of the members from the inits, in this case I know without looking up the definition that some of the members are list. I like that readability. That being said, I for sure won't fight if there is concensus to remove. -- Daniel Gustafsson
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
> On 22 May 2024, at 18:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: >> They are known to be zero, but that's not entirely equivalent though is it? >> NIL is defined as ((List *) NULL) and NULL is typically defined as ((void *) >> 0), so sizeof(0) would be the size of an int and sizeof(NULL) would be the size >> of a void pointer. > > There are other places where we assume that a memset-to-zero will > produce null pointers, so I don't think that that objection has > a lot of force. It wasn't really an objection, but (perhaps a badly worded) an attempt to understand the proposal. > My real answer is that this is our coding style > and we are not going to change it: our normal convention is to > initialize struct fields in declaration order, and that's what > we're doing here. If some particular version of some particular > compiler fails to make an entirely-negligible optimization in > consequence, that is not something we are going to care about. +1 -- Daniel Gustafsson
Re: Missed compiler optimization issue in function select_rtable_names_for_explain
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I think if we want to improve how this code is seen by the compiler by > modifying it, we should just remove the NULL/NIL assignments. It's a > pretty arbitrary (and rather small) subset of fields being initialized, > fields which very obviously have been zeroed by memset already, so I > don't see any value in the explicit assignments. I have to concede that it's unclear why we're filling just these fields explicitly and not any others. We're at least consistent: all the callers of select_rtable_names do it the same way. regards, tom lane
The query and execution plan are shown below. My question is related to the result set the optimizer is choosing to buildthe hash table from. My understanding is for a hash join you want to build the hash table out of the smaller resultset. If you look at the execution plan below you can see the optimizer estimates 1000 rows from the seq scan of tablecollection and 120,000 rows from the seq scan of table docloc_test but is building the hash buckets from those 120000rows rather than from the 1000 rows. In our case under certain volume that causes that to spill to temp and underhigh load it hangs up the Aurora Postgres database. But if it were to build the hash table out of those 1000 rows itwould fit in work_mem so no problems. Why is it picking the larger result set? Another funky thing here- here are the table definitions: pgcci01ap=> \d CCI.COLLECTION Column | Type | Collation | Nullable | Default ----------------------+--------------------------------+-----------+----------+---------------------------------------- collection_name | character varying(40) | | not null | l_stage | numeric(11,0) | | | p_stage | numeric(11,0) | | | t_stage | numeric(11,0) | | | last_upd_datetime | timestamp(6) without time zone | | not null | last_upd_inits | character varying(30) | | not null | owner_name | character varying(30) | | | password | character varying(30) | | | email_address | character varying(2000) | | | available_flag | character(1) | | | collection_id | numeric(11,0) | | not null | collection_type | character varying(20) | | | retrieval_password | character varying(40) | | | partner_coll_name | character varying(40) | | | relation2partner | character varying(20) | | | reload_flag | character(1) | | | 'N'::bpchar partner_id | character varying(40) | | | content_timezone | character varying(40) | | not null | 'America/Chicago'::character varying token_type | character varying(30) | | | cc_collection_dest | character varying(40) | | | auto_reclaim_enabled | character(1) | | not null | 'N'::bpchar collection_family | character varying(40) | | not null | '<COLLECTION NAME>'::character varying access_password | character(40) | | | mic_group | character varying(40) | | | mic_type | character varying(10) | | | retrieval_source | character varying(40) | | not null | 'DOC1'::character varying Indexes: "xpkcollection" PRIMARY KEY, btree (collection_name) "xak1collection" UNIQUE CONSTRAINT, btree (collection_id) "xie1collection" btree (relation2partner) "xie2collection" btree (collection_family, collection_name) "xie3collection" btree (mic_group) Referenced by: TABLE "cci.index_update_proc" CONSTRAINT "rfk12_index_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.authority_update_proc" CONSTRAINT "rfk1_authority_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.collection_event" CONSTRAINT "rfk1_collection_event" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.collection_pit" CONSTRAINT "rfk1_collection_pit" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.collection_stage" CONSTRAINT "rfk1_collection_stage" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.csloc_update_proc" CONSTRAINT "rfk1_csloc_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.docloc_update_proc" CONSTRAINT "rfk1_docloc_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.index_set_mrg" CONSTRAINT "rfk1_index_set_mrg" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.index_set_stats" CONSTRAINT "rfk1_index_set_stats" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.index_system_attr" CONSTRAINT "rfk1_index_system_attr" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.load_update_proc" CONSTRAINT "rfk1_load_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.metadoc_update_proc" CONSTRAINT "rfk1_metadoc_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.mm_update_process" CONSTRAINT "rfk1_mm_update_process" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.reclaim" CONSTRAINT "rfk1_reclaim" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.rel_grp_upd_proc" CONSTRAINT "rfk1_rel_grp_upd_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.toc_update_process" CONSTRAINT "rfk1_toc_update_process" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.view_definition" CONSTRAINT "rfk1_view_definition" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.dcsloc_collection_stats" CONSTRAINT "rfk1dcsloc_collection_stats" FOREIGN KEY (collection_name) REFERENCEScci.collection(collection_name) TABLE "cci.doc_data_domain" CONSTRAINT "rfk1doc_data_domain" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.doc_update_process" CONSTRAINT "rfk2_doc_update_process" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.dcsloc_partition_map" CONSTRAINT "rfk2dcsloc_partition_map" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) TABLE "cci.meta_update_proc" CONSTRAINT "rfk3_metadoc_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) pgcci01ap=> \d docloc_test Column | Type | Collation | Nullable | Default -------------------+--------------------------------+-----------+----------+------------- collection_name | character varying(40) | | not null | stage_id | numeric(11,0) | | not null | begin_stage_id | numeric(11,0) | | | last_upd_datetime | timestamp(6) without time zone | | | last_upd_inits | character varying(30) | | | docloc_check_type | character(1) | | not null | 'S'::bpchar replicate_done | character(1) | | not null | 'N'::bpchar docloc_id | numeric(11,0) | | | Indexes: "xpkdocloc" PRIMARY KEY, btree (collection_name, stage_id) Foreign-key constraints: "rfk1_docloc" FOREIGN KEY (collection_name, stage_id) REFERENCES cci.collection_stage(collection_name, stage_id) pgcci01ap=> \d CCI.COLLECTION_PIT Column | Type | Collation | Nullable | Default -------------------+--------------------------------+-----------+----------+--------- collection_name | character varying(40) | | not null | pit_id | numeric(11,0) | | not null | stage_code | character(1) | | not null | stage_id | numeric(11,0) | | not null | last_upd_datetime | timestamp(6) without time zone | | not null | last_upd_inits | character varying(30) | | not null | Indexes: "xpkcollection_pit" PRIMARY KEY, btree (collection_name, stage_code, pit_id, stage_id) "xak1collection_pit" UNIQUE, btree (collection_name, stage_code, pit_id, stage_id, last_upd_datetime DESC, last_upd_initsDESC) "xak2collection_pit" btree (collection_name, stage_code, stage_id) Foreign-key constraints: "rfk1_collection_pit" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name) Triggers: td_collection_pit AFTER DELETE ON cci.collection_pit FOR EACH ROW EXECUTE FUNCTION cci."td_collection_pit$collection_pit"() When running some tests I forgot to create the PK on table docloc_test. When the PK was not on the table the optimizer decidedto create the hash table off the 1000 rows from collection. But as soon as I put the PK on that table it then decidesto use docloc_test to build the hash table. I can understand how the PK missing or not could impact the executionplan (full scanning the table and hash join vs nested looping to it) but in both cases docloc_test and collectionwere hash joined and the difference was which result set was used to build the hash table. I cannot come up withany theory on why the existence of non-existence of this PK would impact which result set the hash table was built from. In both cases the row estimates from collection and docloc_test were exactly the same in both plans (1000 from collection,120000 fro mdocloc_test). explain (analyze, buffers) SELECT DOCLOC.BEGIN_STAGE_ID, DOCLOC.COLLECTION_NAME, DOCLOC.DOCLOC_CHECK_TYPE, DOCLOC.DOCLOC_ID, DOCLOC.LAST_UPD_DATETIME,DOCLOC.LAST_UPD_INITS, DOCLOC.REPLICATE_DONE, DOCLOC.STAGE_ID, COLLECTION.ACCESS_PASSWORD, COLLECTION.AUTO_RECLAIM_ENABLED, COLLECTION.AVAILABLE_FLAG,COLLECTION.CC_COLLECTION_DEST, COLLECTION.COLLECTION_FAMILY, COLLECTION.COLLECTION_ID, COLLECTION.COLLECTION_NAME, COLLECTION.COLLECTION_TYPE, COLLECTION.CONTENT_TIMEZONE,COLLECTION.EMAIL_ADDRESS, COLLECTION.LAST_UPD_DATETIME, COLLECTION.LAST_UPD_INITS, COLLECTION.L_STAGE, COLLECTION.MIC_GROUP, COLLECTION.MIC_TYPE, COLLECTION.OWNER_NAME, COLLECTION.PARTNER_COLL_NAME, COLLECTION.PARTNER_ID, COLLECTION.PASSWORD, COLLECTION.P_STAGE, COLLECTION.RELATION2PARTNER,COLLECTION.RELOAD_FLAG, COLLECTION.RETRIEVAL_PASSWORD, COLLECTION.RETRIEVAL_SOURCE, COLLECTION.TOKEN_TYPE, COLLECTION.T_STAGE FROM DOCLOC_test docloc, CCI.COLLECTION WHERE COLLECTION.COLLECTION_ID IN (2188,15418,1427,1425,4584,1424,1429,1426,1638,1639,1640,1641,1642,1684,1685,1686,1428,13727,7421,7422,2189,5145,2599,2992,9245,9246,9247,9248,9249,9250,9251,9252,5121,19614,-6322,928,1544,-14765,929,-6323,930,931,932,1542,3890,6594,8986,-6279,19631,1466,19639,1467,1468,1475,1540,3892,-6348, -6358,4753,1978,1977,1966,2687,2686,2674,-6350,3060,3600,426,3654,2718,1824,445,496,556,432,421,423,471,16152,16362,16154,16155,16156,16157,16799,507,573,574,404,524,743,15388,694,3537,1408,10097,9376,9375,10098,9377,8559,8557,1599,1963,1976,8551,8555,8848,8552,8554,8734,8735,10306,8739,8736, 8737,9253,9254,9255,9256,9257,9258,9063,9194,9065,9066,9067,9068,9069,9070,9071,9072,9177,9178,9179,9180,9181,9182,9183,9184,9093,4222,3048,3116,6971,6972,6973,6974,2675,8387,2360,2672,2673,2681,2682,2683,1528,2671,2676,7438,2678,2677,2679,2685,2684,2361,2680,2688,3533,10605,349,385,396,631, 2541,1396,1979,3022,1980,138,12,6835,10609,7164,5307,6863,6864,20444,695,1815,514,600,601,602,603,604,605,3069,2321,790,446,490,6520,2006,3157,3158,3917,419,461,437,555,748,464,427,3539,16158,8738,8733,18,3628,3626,3627,4805,4806,5054,3630,3629,3631,4807,3634,4714,240,13710,7227,8909,8910, 8911,10730,8912,8951,2019,7315,7078,7930,7928,7929,693,7261,7262,239,242,8952,7228,330,328,7079,4812,4813,6975,14410,6976,6977,6978,6979,6980,6981,6982,6983,6984,6985,6986,6987,6988,6989,6990,6991,6992,6993,6994,6995,6996,6997,6998,6999,7000,7001,7002,7003,7004,7828,7829,4479,6544,10706, 10707,6545,4010,4011,6598,6847,6850,6848,6851,2892,6484,6486,6485,4009,4012,3040,3039,10746,8233,4476,4477,7255,6889,5139,7464,7465,7466,8388,7469,16159,16160,752,3540,831,3147,3148,3149,3137,5481,9161,1825,2028,9230,460,642,1827,8950,751,2833,3162,1990,1523,755,2013,754,1829,467,753,474, 2793,703,1632,412,4814,4815,4816,540,541,1406,2020,6795,7220,9554,9555,9556,2999,17387,17388,17443,17444,10682,10681,10683,10686,10684,10687,10685,10688,20,2510,155,21,22,26,27,28,29,30,31,32,33,7005,7006,7007,7008,7009,7010,7011,7125,7013,7014,7015,7016,7017,7018,7019,7020,6970,1512,2439, 17844,2959,16211,2506,1450,3211,1598,2037,3587,8563,8564,8565,8566,7468,8393,7467,8276,8392,3575,3576,9576,10516,3557,6900,1819,1809,1808,1805,1804,1807,1806,8558,10610,16806,16807,16808,16809,16810,2523,16811,16812,2522,3907,15,17,3817,3798,3810,3794,3821,3808,837,758,15389,757,814,548, 1419,3163,2957,15498,15499,2539,10295,16381,2032,1315,716,1421,16800,762,1833,8612,2039,17684,17685,15500,1834,1566,425,468,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,19880,379,19881,19884,19885,383,19886,19888,203,50,51,52,53,54,55,8567,8568,8569,8570,8571,8572,8573,8574,8575,8576, 8577,8578,8579,8580,8581,8582,8583,8584,8585,8586,8587,8588,8589,8590,8591,8592,8593,8594,8595,8596,8953,3811,7196,3784,3785,7216,7211,3800,9175,4701,3781,7215,7210,3813,3828,3789,7217,7212,3818,3792,3832,3829,3804,3807,7218,7213,5102,3809,7214,5095,3830,10034,4988,2924,2925,4553,4554,7221, 7222,7869,8271,8275,8410,16161,16162,8411,9428,414,476,935,20533,20534,20668,20679,20680,710,1401,433,2040,1838,3164,1853,8835,20681,2844,3165,16163,16164,771,56,2499,2501,235,57,58,59,60,61,62,63,141,202,2502,190,191,192,189,140,534,535,2508,139,533,536,199,200,2500,304,305,306,307,8597, 8598,8599,8600,8601,8602,8785,3586,3585,1612,1613,1614,1616,1617,1647,1648,1649,1652,1820,16375,13728,14411,2046,2047,2186,2263,2318,2432,2448,2442,2487,2525,2527,2535,2537,2544,22013,22014,22015,22016,769,2794,2200,484,1949,7360,8896,1812,3139,7115,2660,6912,659,3071,8647,711,1962,4002,16801, 705,6911,774,16166,16167,4495,778,7205,706,672,3054,449,2958,308,309,310,311,312,313,314,315,316,317,318,319,15400,15401,15402,15403,16685,15405,15406,15407,15408,15409,15410,15411,15412,15413,15414,16210,233,2507,2691,2692,2695,2801,2805,2806,2818,2905,2845,2987,2995,3026,3049,3050,3052,3066, 3091,3095,3186,3526,3649,3888,3884,3886,4288,4378,4517,4916,4917,5003,8561,21903,21911,21927,14644,1411,707,463,15501,1950,405,671,3055,15394,1170,1168,3152,9163,16382,16217,1844,417,2444,1841,2035,781,16363,2721,684,784,10571,782,462,646,3056,443,479,2007,3167,3168,118,2509,17019,17020,17021, 17022,17023,17445,17446,14395,17447,17448,17449,1447,247,3908,2187,1646,2511,3974,3975,3982,3983,3984,3985,3986,3987,3988,3989,3990,3991,3992,21880,21881,21885,21886,21887,21889,21890,21891,21939,21942,13714,21943,21945,21946,21962,4556,16218,4557,6539,4555,7452,1410,420,454,16169,16170,16171, 16172,16173,1872,16174,785,1847,1454,16175,-703,406,525,526,788,470,742,1947,7195,793,3153,503,563,564,2834,444,3993,3994,3995,3996,3997,3998,3999,4297,4298,4299,4300,4301,4302,4303,4434,4435,4436,4437,4438,4439,4440,4441,4442,4443,4444,4445,4446,4447,4448,4449,4450,4451,4452,4453,21940,21941, 21975,21977,21978,21979,21981,18232,8054,8055,8060,8058,8059,8056,8057,18233,18234,18235,18236,18237,8042,8043,8044,8045,8046,8047,8048,8049,8050,8051,8052,21985,21986,21969,21982,21983,4481,2720,741,16176,15395,3602,409,1811,16365,599,3072,4044,799,1869,3141,403,480) AND COLLECTION.COLLECTION_NAME=DOCLOC.COLLECTION_NAME AND DOCLOC.STAGE_ID= (SELECT MAX (STAGE_ID) FROM CCI.COLLECTION_PIT WHERE COLLECTION_PIT.COLLECTION_NAME=COLLECTION.COLLECTION_NAME AND COLLECTION_PIT.PIT_ID<=2147483647 AND COLLECTION_PIT.STAGE_CODE='F'); The plan below was executed with work_mem at 12 MB so it's not spilling to temp but this is only for my session. At 4MBit will spill to temp. Hash Join (cost=4302.50..5658.10 rows=1 width=529) (actual time=67.959..81.185 rows=496 loops=1) Hash Cond: (((collection.collection_name)::text = (docloc.collection_name)::text) AND ((SubPlan 2) = docloc.stage_id)) Buffers: shared hit=7735 -> Seq Scan on collection (cost=2.50..1040.08 rows=1000 width=478) (actual time=0.058..4.327 rows=1000 loops=1) Filter: (collection_id = ANY ('{2188,15418,1427,1425,4584,1424,1429,1426,1638,1639,1640,1641,1642,1684,1685,1686,1428,13727,7421,7422,2189,5145,2599,2992,9245,9246,9247,9248,9249,9250,9251,9252,5 121,19614,-6322,928,1544,-14765,929,-6323,930,931,932,1542,3890,6594,8986,-6279,19631,1466,19639,1467,1468,1475,1540,3892,-6348,-6358,4753,1978,1977,1966,2687,2686,2674,-6350,3060,3600,426,3654,2718,1824, 445,496,556,432,421,423,471,16152,16362,16154,16155,16156,16157,16799,507,573,574,404,524,743,15388,694,3537,1408,10097,9376,9375,10098,9377,8559,8557,1599,1963,1976,8551,8555,8848,8552,8554,8734,8735,103 06,8739,8736,8737,9253,9254,9255,9256,9257,9258,9063,9194,9065,9066,9067,9068,9069,9070,9071,9072,9177,9178,9179,9180,9181,9182,9183,9184,9093,4222,3048,3116,6971,6972,6973,6974,2675,8387,2360,2672,2673,2 681,2682,2683,1528,2671,2676,7438,2678,2677,2679,2685,2684,2361,2680,2688,3533,10605,349,385,396,631,2541,1396,1979,3022,1980,138,12,6835,10609,7164,5307,6863,6864,20444,695,1815,514,600,601,602,603,604,6 05,3069,2321,790,446,490,6520,2006,3157,3158,3917,419,461,437,555,748,464,427,3539,16158,8738,8733,18,3628,3626,3627,4805,4806,5054,3630,3629,3631,4807,3634,4714,240,13710,7227,8909,8910,8911,10730,8912,8 951,2019,7315,7078,7930,7928,7929,693,7261,7262,239,242,8952,7228,330,328,7079,4812,4813,6975,14410,6976,6977,6978,6979,6980,6981,6982,6983,6984,6985,6986,6987,6988,6989,6990,6991,6992,6993,6994,6995,6996 ,6997,6998,6999,7000,7001,7002,7003,7004,7828,7829,4479,6544,10706,10707,6545,4010,4011,6598,6847,6850,6848,6851,2892,6484,6486,6485,4009,4012,3040,3039,10746,8233,4476,4477,7255,6889,5139,7464,7465,7466, 8388,7469,16159,16160,752,3540,831,3147,3148,3149,3137,5481,9161,1825,2028,9230,460,642,1827,8950,751,2833,3162,1990,1523,755,2013,754,1829,467,753,474,2793,703,1632,412,4814,4815,4816,540,541,1406,2020,6 795,7220,9554,9555,9556,2999,17387,17388,17443,17444,10682,10681,10683,10686,10684,10687,10685,10688,20,2510,155,21,22,26,27,28,29,30,31,32,33,7005,7006,7007,7008,7009,7010,7011,7125,7013,7014,7015,7016,7 017,7018,7019,7020,6970,1512,2439,17844,2959,16211,2506,1450,3211,1598,2037,3587,8563,8564,8565,8566,7468,8393,7467,8276,8392,3575,3576,9576,10516,3557,6900,1819,1809,1808,1805,1804,1807,1806,8558,10610,1 6806,16807,16808,16809,16810,2523,16811,16812,2522,3907,15,17,3817,3798,3810,3794,3821,3808,837,758,15389,757,814,548,1419,3163,2957,15498,15499,2539,10295,16381,2032,1315,716,1421,16800,762,1833,8612,203 9,17684,17685,15500,1834,1566,425,468,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,19880,379,19881,19884,19885,383,19886,19888,203,50,51,52,53,54,55,8567,8568,8569,8570,8571,8572,8573,8574,8575,8576,85 77,8578,8579,8580,8581,8582,8583,8584,8585,8586,8587,8588,8589,8590,8591,8592,8593,8594,8595,8596,8953,3811,7196,3784,3785,7216,7211,3800,9175,4701,3781,7215,7210,3813,3828,3789,7217,7212,3818,3792,3832,3 829,3804,3807,7218,7213,5102,3809,7214,5095,3830,10034,4988,2924,2925,4553,4554,7221,7222,7869,8271,8275,8410,16161,16162,8411,9428,414,476,935,20533,20534,20668,20679,20680,710,1401,433,2040,1838,3164,18 53,8835,20681,2844,3165,16163,16164,771,56,2499,2501,235,57,58,59,60,61,62,63,141,202,2502,190,191,192,189,140,534,535,2508,139,533,536,199,200,2500,304,305,306,307,8597,8598,8599,8600,8601,8602,8785,3586 ,3585,1612,1613,1614,1616,1617,1647,1648,1649,1652,1820,16375,13728,14411,2046,2047,2186,2263,2318,2432,2448,2442,2487,2525,2527,2535,2537,2544,22013,22014,22015,22016,769,2794,2200,484,1949,7360,8896,181 2,3139,7115,2660,6912,659,3071,8647,711,1962,4002,16801,705,6911,774,16166,16167,4495,778,7205,706,672,3054,449,2958,308,309,310,311,312,313,314,315,316,317,318,319,15400,15401,15402,15403,16685,15405,154 06,15407,15408,15409,15410,15411,15412,15413,15414,16210,233,2507,2691,2692,2695,2801,2805,2806,2818,2905,2845,2987,2995,3026,3049,3050,3052,3066,3091,3095,3186,3526,3649,3888,3884,3886,4288,4378,4517,491 6,4917,5003,8561,21903,21911,21927,14644,1411,707,463,15501,1950,405,671,3055,15394,1170,1168,3152,9163,16382,16217,1844,417,2444,1841,2035,781,16363,2721,684,784,10571,782,462,646,3056,443,479,2007,3167, 3168,118,2509,17019,17020,17021,17022,17023,17445,17446,14395,17447,17448,17449,1447,247,3908,2187,1646,2511,3974,3975,3982,3983,3984,3985,3986,3987,3988,3989,3990,3991,3992,21880,21881,21885,21886,21887, 21889,21890,21891,21939,21942,13714,21943,21945,21946,21962,4556,16218,4557,6539,4555,7452,1410,420,454,16169,16170,16171,16172,16173,1872,16174,785,1847,1454,16175,-703,406,525,526,788,470,742,1947,7195, 793,3153,503,563,564,2834,444,3993,3994,3995,3996,3997,3998,3999,4297,4298,4299,4300,4301,4302,4303,4434,4435,4436,4437,4438,4439,4440,4441,4442,4443,4444,4445,4446,4447,4448,4449,4450,4451,4452,4453,2194 0,21941,21975,21977,21978,21979,21981,18232,8054,8055,8060,8058,8059,8056,8057,18233,18234,18235,18236,18237,8042,8043,8044,8045,8046,8047,8048,8049,8050,8051,8052,21985,21986,21969,21982,21983,4481,2720, 741,16176,15395,3602,409,1811,16365,599,3072,4044,799,1869,3141,403,480}'::numeric[])) Rows Removed by Filter: 15110 Buffers: shared hit=598 -> Hash (cost=2500.00..2500.00 rows=120000 width=51) (actual time=67.466..67.467 rows=120000 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 11485kB Buffers: shared hit=1300 -> Seq Scan on docloc_test docloc (cost=0.00..2500.00 rows=120000 width=51) (actual time=0.004..22.313 rows=120000loops=1) Buffers: shared hit=1300 SubPlan 2 -> Result (cost=6.25..6.26 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1496) Buffers: shared hit=5837 InitPlan 1 (returns $1) -> Limit (cost=0.42..6.25 rows=1 width=6) (actual time=0.005..0.005 rows=1 loops=1496) Buffers: shared hit=5837 -> Index Scan Backward using xak2collection_pit on collection_pit (cost=0.42..64.50 rows=11 width=6)(actual time=0.004..0.004 rows=1 loops=1496) Index Cond: (((collection_name)::text = (collection.collection_name)::text) AND (stage_code = 'F'::bpchar)AND (stage_id IS NOT NULL)) Filter: (pit_id <= '2147483647'::numeric) Buffers: shared hit=5837 Planning: Buffers: shared hit=78 Planning Time: 2.287 ms Execution Time: 83.361 ms Thanks in advance. This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential.If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and anyattachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html
"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes: > The query and execution plan are shown below. My question is > related to the result set the optimizer is choosing to build the > hash table from. My understanding is for a hash join you want to > build the hash table out of the smaller result set. That's *a* consideration, but not the only one. We also consider whether the hash key has a flat distribution; if it is too skewed, we might find specific hash chains getting too long. > When running some tests I forgot to create the PK on table > docloc_test. When the PK was not on the table the optimizer decided > to create the hash table off the 1000 rows from collection. But as > soon as I put the PK on that table it then decides to use > docloc_test to build the hash table. I think that the presence of a unique index overrides the statistics (or the lack of any) so that the planner knows that the column is unique and thus safe to use as a hash key. Now, it should have known that anyway, unless maybe this is a freshly-built table that auto-analyze hasn't gotten to yet? regards, tom lane