Thread: Please help! Query jumps from 1s -> 4m

Please help! Query jumps from 1s -> 4m

From

James Thompson

Date:

03 May 2020, 08:58:27

Hi,

Hoping someone can help with this performance issue that's been driving a few of us crazy :-) Any guidance greatly appreciated.

A description of what you are trying to achieve and what results you expect.:

- I'd like to get an understanding of why the following query (presented in full, but there are specific parts that are confusing me) starts off taking ~second in duration but 'switches' to taking over 4 minutes.
- we initially saw this behaviour for the exact same sql with a different index that resulted in an index scan. To try and fix the issue we've created an additional index with additional included fields so we now have Index Only Scans, but are still seeing the same problem.
- execution plan is from auto_explain output when it took just over 4 minutes to execute. The time is shared ~equally between these two index-only scans.
- There are no checkpoints occurring concurrently with this (based on "checkpoint starting" and "checkpoint complete" in logs)
- bloat on the index is about 30%

Segments of interest:
1. -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1 (cost=0.56..17.25 rows=1 width=60) (actual time=110.539..123828.134 rows=67000 loops=1)
Index Cond: (col20 = $2005)
Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY ((ARRAY[$1004, ..., $2003])::text[])))
Rows Removed by Filter: 2662652
Heap Fetches: 6940
Buffers: shared hit=46619 read=42784 written=52

2. -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2 (cost=0.56..17.23 rows=1 width=36) (actual time=142.855..122034.039 rows=67000 loops=1)
Index Cond: (col20 = $1001)
Filter: ((col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
Rows Removed by Filter: 2662652
Heap Fetches: 6891
Buffers: shared hit=47062 read=42331 written=37

If I run the same queries now:
Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1 (cost=0.56..2549.69 rows=69 width=36)
(actual time=1.017..1221.375 rows=67000 loops=1)
Heap Fetches: 24
Buffers: shared hit=2849 read=2483

buffers do look different - but still, reading 42k doesn't seem like it would cause a delay of 4m?

Actually, here's another example of segment 2 from logs.

Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2 (cost=0.56..17.23 rows=1 width=36) (actual time=36.559..120649.742 rows=65000 loops=1)
Index Cond: (col20 = $1001)
Filter: ((col8)::text = ANY ((ARRAY[$1, $1000]::text[]))
Rows Removed by Filter: 2664256
Heap Fetches: 6306
Buffers: shared hit=87712 read=1507

One note: I've replaced table/column names (sorry, a requirement).

Full subquery execution plan (i've stripped out the view materialization from row 14 onwards but left the header in):
https://explain.depesz.com/s/vsdH

Full Sql:
SELECT
subquery.id
FROM (
SELECT
table1alias1.id,
table1alias1.uniqueid,
table1alias1.col16 AS order_by
FROM
table1 AS table1alias1
LEFT OUTER JOIN (
SELECT
inlinealias1.id,
inlinealias1.uniqueid,
inlinealias1.col4,
inlinealias1.col5,
inlinealias1.col6,
inlinealias1.col7
FROM (
SELECT
table2alias.id,
table2alias.uniqueid,
table2alias.col3,
table2alias.col4,
table2alias.col5,
table2alias.col6,
row_number() OVER (PARTITION BY table2alias.uniqueid ORDER BY table2alias.col13 DESC, table2alias.col3 DESC, table2alias.id DESC) AS rn
FROM
table2 AS table2alias
JOIN ( SELECT DISTINCT
table1alias2.uniqueid
FROM
table1 AS table1alias2
WHERE (table1alias2.col8 IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68, $69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80, $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92, $93, $94, $95, $96, $97, $98, $99, $100, $101, $102, $103, $104, $105, $106, $107, $108, $109, $110, $111, $112, $113, $114, $115, $116, $117, $118, $119, $120, $121, $122, $123, $124, $125, $126, $127, $128, $129, $130, $131, $132, $133, $134, $135, $136, $137, $138, $139, $140, $141, $142, $143, $144, $145, $146, $147, $148, $149, $150, $151, $152, $153, $154, $155, $156, $157, $158, $159, $160, $161, $162, $163, $164, $165, $166, $167, $168, $169, $170, $171, $172, $173, $174, $175, $176, $177, $178, $179, $180, $181, $182, $183, $184, $185, $186, $187, $188, $189, $190, $191, $192, $193, $194, $195, $196, $197, $198, $199, $200, $201, $202, $203, $204, $205, $206, $207, $208, $209, $210, $211, $212, $213, $214, $215, $216, $217, $218, $219, $220, $221, $222, $223, $224, $225, $226, $227, $228, $229, $230, $231, $232, $233, $234, $235, $236, $237, $238, $239, $240, $241, $242, $243, $244, $245, $246, $247, $248, $249, $250, $251, $252, $253, $254, $255, $256, $257, $258, $259, $260, $261, $262, $263, $264, $265, $266, $267, $268, $269, $270, $271, $272, $273, $274, $275, $276, $277, $278, $279, $280, $281, $282, $283, $284, $285, $286, $287, $288, $289, $290, $291, $292, $293, $294, $295, $296, $297, $298, $299, $300, $301, $302, $303, $304, $305, $306, $307, $308, $309, $310, $311, $312, $313, $314, $315, $316, $317, $318, $319, $320, $321, $322, $323, $324, $325, $326, $327, $328, $329, $330, $331, $332, $333, $334, $335, $336, $337, $338, $339, $340, $341, $342, $343, $344, $345, $346, $347, $348, $349, $350, $351, $352, $353, $354, $355, $356, $357, $358, $359, $360, $361, $362, $363, $364, $365, $366, $367, $368, $369, $370, $371, $372, $373, $374, $375, $376, $377, $378, $379, $380, $381, $382, $383, $384, $385, $386, $387, $388, $389, $390, $391, $392, $393, $394, $395, $396, $397, $398, $399, $400, $401, $402, $403, $404, $405, $406, $407, $408, $409, $410, $411, $412, $413, $414, $415, $416, $417, $418, $419, $420, $421, $422, $423, $424, $425, $426, $427, $428, $429, $430, $431, $432, $433, $434, $435, $436, $437, $438, $439, $440, $441, $442, $443, $444, $445, $446, $447, $448, $449, $450, $451, $452, $453, $454, $455, $456, $457, $458, $459, $460, $461, $462, $463, $464, $465, $466, $467, $468, $469, $470, $471, $472, $473, $474, $475, $476, $477, $478, $479, $480, $481, $482, $483, $484, $485, $486, $487, $488, $489, $490, $491, $492, $493, $494, $495, $496, $497, $498, $499, $500, $501, $502, $503, $504, $505, $506, $507, $508, $509, $510, $511, $512, $513, $514, $515, $516, $517, $518, $519, $520, $521, $522, $523, $524, $525, $526, $527, $528, $529, $530, $531, $532, $533, $534, $535, $536, $537, $538, $539, $540, $541, $542, $543, $544, $545, $546, $547, $548, $549, $550, $551, $552, $553, $554, $555, $556, $557, $558, $559, $560, $561, $562, $563, $564, $565, $566, $567, $568, $569, $570, $571, $572, $573, $574, $575, $576, $577, $578, $579, $580, $581, $582, $583, $584, $585, $586, $587, $588, $589, $590, $591, $592, $593, $594, $595, $596, $597, $598, $599, $600, $601, $602, $603, $604, $605, $606, $607, $608, $609, $610, $611, $612, $613, $614, $615, $616, $617, $618, $619, $620, $621, $622, $623, $624, $625, $626, $627, $628, $629, $630, $631, $632, $633, $634, $635, $636, $637, $638, $639, $640, $641, $642, $643, $644, $645, $646, $647, $648, $649, $650, $651, $652, $653, $654, $655, $656, $657, $658, $659, $660, $661, $662, $663, $664, $665, $666, $667, $668, $669, $670, $671, $672, $673, $674, $675, $676, $677, $678, $679, $680, $681, $682, $683, $684, $685, $686, $687, $688, $689, $690, $691, $692, $693, $694, $695, $696, $697, $698, $699, $700, $701, $702, $703, $704, $705, $706, $707, $708, $709, $710, $711, $712, $713, $714, $715, $716, $717, $718, $719, $720, $721, $722, $723, $724, $725, $726, $727, $728, $729, $730, $731, $732, $733, $734, $735, $736, $737, $738, $739, $740, $741, $742, $743, $744, $745, $746, $747, $748, $749, $750, $751, $752, $753, $754, $755, $756, $757, $758, $759, $760, $761, $762, $763, $764, $765, $766, $767, $768, $769, $770, $771, $772, $773, $774, $775, $776, $777, $778, $779, $780, $781, $782, $783, $784, $785, $786, $787, $788, $789, $790, $791, $792, $793, $794, $795, $796, $797, $798, $799, $800, $801, $802, $803, $804, $805, $806, $807, $808, $809, $810, $811, $812, $813, $814, $815, $816, $817, $818, $819, $820, $821, $822, $823, $824, $825, $826, $827, $828, $829, $830, $831, $832, $833, $834, $835, $836, $837, $838, $839, $840, $841, $842, $843, $844, $845, $846, $847, $848, $849, $850, $851, $852, $853, $854, $855, $856, $857, $858, $859, $860, $861, $862, $863, $864, $865, $866, $867, $868, $869, $870, $871, $872, $873, $874, $875, $876, $877, $878, $879, $880, $881, $882, $883, $884, $885, $886, $887, $888, $889, $890, $891, $892, $893, $894, $895, $896, $897, $898, $899, $900, $901, $902, $903, $904, $905, $906, $907, $908, $909, $910, $911, $912, $913, $914, $915, $916, $917, $918, $919, $920, $921, $922, $923, $924, $925, $926, $927, $928, $929, $930, $931, $932, $933, $934, $935, $936, $937, $938, $939, $940, $941, $942, $943, $944, $945, $946, $947, $948, $949, $950, $951, $952, $953, $954, $955, $956, $957, $958, $959, $960, $961, $962, $963, $964, $965, $966, $967, $968, $969, $970, $971, $972, $973, $974, $975, $976, $977, $978, $979, $980, $981, $982, $983, $984, $985, $986, $987, $988, $989, $990, $991, $992, $993, $994, $995, $996, $997, $998, $999, $1000)
AND (table1alias2.datatype IN (CAST('TypeA' AS datatype_enum)))
AND table1alias2.col20 IN ($1001))) AS candidateUniqueId ON table2alias.uniqueid = candidateUniqueId.uniqueid) AS inlinealias1
WHERE
inlinealias1.rn = $1002) AS inlinealias2 ON table1alias1.uniqueid = inlinealias2.uniqueid
WHERE (EXISTS (
SELECT
1 AS one
FROM
view1
WHERE (col8 = $1003
AND table1alias1.col20 = col2))
AND table1alias1.col8 IN ($1004, $1005, $1006, $1007, $1008, $1009, $1010, $1011, $1012, $1013, $1014, $1015, $1016, $1017, $1018, $1019, $1020, $1021, $1022, $1023, $1024, $1025, $1026, $1027, $1028, $1029, $1030, $1031, $1032, $1033, $1034, $1035, $1036, $1037, $1038, $1039, $1040, $1041, $1042, $1043, $1044, $1045, $1046, $1047, $1048, $1049, $1050, $1051, $1052, $1053, $1054, $1055, $1056, $1057, $1058, $1059, $1060, $1061, $1062, $1063, $1064, $1065, $1066, $1067, $1068, $1069, $1070, $1071, $1072, $1073, $1074, $1075, $1076, $1077, $1078, $1079, $1080, $1081, $1082, $1083, $1084, $1085, $1086, $1087, $1088, $1089, $1090, $1091, $1092, $1093, $1094, $1095, $1096, $1097, $1098, $1099, $1100, $1101, $1102, $1103, $1104, $1105, $1106, $1107, $1108, $1109, $1110, $1111, $1112, $1113, $1114, $1115, $1116, $1117, $1118, $1119, $1120, $1121, $1122, $1123, $1124, $1125, $1126, $1127, $1128, $1129, $1130, $1131, $1132, $1133, $1134, $1135, $1136, $1137, $1138, $1139, $1140, $1141, $1142, $1143, $1144, $1145, $1146, $1147, $1148, $1149, $1150, $1151, $1152, $1153, $1154, $1155, $1156, $1157, $1158, $1159, $1160, $1161, $1162, $1163, $1164, $1165, $1166, $1167, $1168, $1169, $1170, $1171, $1172, $1173, $1174, $1175, $1176, $1177, $1178, $1179, $1180, $1181, $1182, $1183, $1184, $1185, $1186, $1187, $1188, $1189, $1190, $1191, $1192, $1193, $1194, $1195, $1196, $1197, $1198, $1199, $1200, $1201, $1202, $1203, $1204, $1205, $1206, $1207, $1208, $1209, $1210, $1211, $1212, $1213, $1214, $1215, $1216, $1217, $1218, $1219, $1220, $1221, $1222, $1223, $1224, $1225, $1226, $1227, $1228, $1229, $1230, $1231, $1232, $1233, $1234, $1235, $1236, $1237, $1238, $1239, $1240, $1241, $1242, $1243, $1244, $1245, $1246, $1247, $1248, $1249, $1250, $1251, $1252, $1253, $1254, $1255, $1256, $1257, $1258, $1259, $1260, $1261, $1262, $1263, $1264, $1265, $1266, $1267, $1268, $1269, $1270, $1271, $1272, $1273, $1274, $1275, $1276, $1277, $1278, $1279, $1280, $1281, $1282, $1283, $1284, $1285, $1286, $1287, $1288, $1289, $1290, $1291, $1292, $1293, $1294, $1295, $1296, $1297, $1298, $1299, $1300, $1301, $1302, $1303, $1304, $1305, $1306, $1307, $1308, $1309, $1310, $1311, $1312, $1313, $1314, $1315, $1316, $1317, $1318, $1319, $1320, $1321, $1322, $1323, $1324, $1325, $1326, $1327, $1328, $1329, $1330, $1331, $1332, $1333, $1334, $1335, $1336, $1337, $1338, $1339, $1340, $1341, $1342, $1343, $1344, $1345, $1346, $1347, $1348, $1349, $1350, $1351, $1352, $1353, $1354, $1355, $1356, $1357, $1358, $1359, $1360, $1361, $1362, $1363, $1364, $1365, $1366, $1367, $1368, $1369, $1370, $1371, $1372, $1373, $1374, $1375, $1376, $1377, $1378, $1379, $1380, $1381, $1382, $1383, $1384, $1385, $1386, $1387, $1388, $1389, $1390, $1391, $1392, $1393, $1394, $1395, $1396, $1397, $1398, $1399, $1400, $1401, $1402, $1403, $1404, $1405, $1406, $1407, $1408, $1409, $1410, $1411, $1412, $1413, $1414, $1415, $1416, $1417, $1418, $1419, $1420, $1421, $1422, $1423, $1424, $1425, $1426, $1427, $1428, $1429, $1430, $1431, $1432, $1433, $1434, $1435, $1436, $1437, $1438, $1439, $1440, $1441, $1442, $1443, $1444, $1445, $1446, $1447, $1448, $1449, $1450, $1451, $1452, $1453, $1454, $1455, $1456, $1457, $1458, $1459, $1460, $1461, $1462, $1463, $1464, $1465, $1466, $1467, $1468, $1469, $1470, $1471, $1472, $1473, $1474, $1475, $1476, $1477, $1478, $1479, $1480, $1481, $1482, $1483, $1484, $1485, $1486, $1487, $1488, $1489, $1490, $1491, $1492, $1493, $1494, $1495, $1496, $1497, $1498, $1499, $1500, $1501, $1502, $1503, $1504, $1505, $1506, $1507, $1508, $1509, $1510, $1511, $1512, $1513, $1514, $1515, $1516, $1517, $1518, $1519, $1520, $1521, $1522, $1523, $1524, $1525, $1526, $1527, $1528, $1529, $1530, $1531, $1532, $1533, $1534, $1535, $1536, $1537, $1538, $1539, $1540, $1541, $1542, $1543, $1544, $1545, $1546, $1547, $1548, $1549, $1550, $1551, $1552, $1553, $1554, $1555, $1556, $1557, $1558, $1559, $1560, $1561, $1562, $1563, $1564, $1565, $1566, $1567, $1568, $1569, $1570, $1571, $1572, $1573, $1574, $1575, $1576, $1577, $1578, $1579, $1580, $1581, $1582, $1583, $1584, $1585, $1586, $1587, $1588, $1589, $1590, $1591, $1592, $1593, $1594, $1595, $1596, $1597, $1598, $1599, $1600, $1601, $1602, $1603, $1604, $1605, $1606, $1607, $1608, $1609, $1610, $1611, $1612, $1613, $1614, $1615, $1616, $1617, $1618, $1619, $1620, $1621, $1622, $1623, $1624, $1625, $1626, $1627, $1628, $1629, $1630, $1631, $1632, $1633, $1634, $1635, $1636, $1637, $1638, $1639, $1640, $1641, $1642, $1643, $1644, $1645, $1646, $1647, $1648, $1649, $1650, $1651, $1652, $1653, $1654, $1655, $1656, $1657, $1658, $1659, $1660, $1661, $1662, $1663, $1664, $1665, $1666, $1667, $1668, $1669, $1670, $1671, $1672, $1673, $1674, $1675, $1676, $1677, $1678, $1679, $1680, $1681, $1682, $1683, $1684, $1685, $1686, $1687, $1688, $1689, $1690, $1691, $1692, $1693, $1694, $1695, $1696, $1697, $1698, $1699, $1700, $1701, $1702, $1703, $1704, $1705, $1706, $1707, $1708, $1709, $1710, $1711, $1712, $1713, $1714, $1715, $1716, $1717, $1718, $1719, $1720, $1721, $1722, $1723, $1724, $1725, $1726, $1727, $1728, $1729, $1730, $1731, $1732, $1733, $1734, $1735, $1736, $1737, $1738, $1739, $1740, $1741, $1742, $1743, $1744, $1745, $1746, $1747, $1748, $1749, $1750, $1751, $1752, $1753, $1754, $1755, $1756, $1757, $1758, $1759, $1760, $1761, $1762, $1763, $1764, $1765, $1766, $1767, $1768, $1769, $1770, $1771, $1772, $1773, $1774, $1775, $1776, $1777, $1778, $1779, $1780, $1781, $1782, $1783, $1784, $1785, $1786, $1787, $1788, $1789, $1790, $1791, $1792, $1793, $1794, $1795, $1796, $1797, $1798, $1799, $1800, $1801, $1802, $1803, $1804, $1805, $1806, $1807, $1808, $1809, $1810, $1811, $1812, $1813, $1814, $1815, $1816, $1817, $1818, $1819, $1820, $1821, $1822, $1823, $1824, $1825, $1826, $1827, $1828, $1829, $1830, $1831, $1832, $1833, $1834, $1835, $1836, $1837, $1838, $1839, $1840, $1841, $1842, $1843, $1844, $1845, $1846, $1847, $1848, $1849, $1850, $1851, $1852, $1853, $1854, $1855, $1856, $1857, $1858, $1859, $1860, $1861, $1862, $1863, $1864, $1865, $1866, $1867, $1868, $1869, $1870, $1871, $1872, $1873, $1874, $1875, $1876, $1877, $1878, $1879, $1880, $1881, $1882, $1883, $1884, $1885, $1886, $1887, $1888, $1889, $1890, $1891, $1892, $1893, $1894, $1895, $1896, $1897, $1898, $1899, $1900, $1901, $1902, $1903, $1904, $1905, $1906, $1907, $1908, $1909, $1910, $1911, $1912, $1913, $1914, $1915, $1916, $1917, $1918, $1919, $1920, $1921, $1922, $1923, $1924, $1925, $1926, $1927, $1928, $1929, $1930, $1931, $1932, $1933, $1934, $1935, $1936, $1937, $1938, $1939, $1940, $1941, $1942, $1943, $1944, $1945, $1946, $1947, $1948, $1949, $1950, $1951, $1952, $1953, $1954, $1955, $1956, $1957, $1958, $1959, $1960, $1961, $1962, $1963, $1964, $1965, $1966, $1967, $1968, $1969, $1970, $1971, $1972, $1973, $1974, $1975, $1976, $1977, $1978, $1979, $1980, $1981, $1982, $1983, $1984, $1985, $1986, $1987, $1988, $1989, $1990, $1991, $1992, $1993, $1994, $1995, $1996, $1997, $1998, $1999, $2000, $2001, $2002, $2003)
AND (table1alias1.col3 = $2004
OR table1alias1.col3 IS NULL)
AND (table1alias1.datatype IN (CAST('TypeA' AS datatype_enum)))
AND table1alias1.col20 IN ($2005))
ORDER BY
table1alias1.col16 ASC) AS subquery
ORDER BY
subquery.order_by ASC

Tables:
\d table1
Table "table1"
Column | Type | Collation | Nullable | Default
-----------------------------------+-----------------------------+-----------+----------+---------------------------------
id | bigint | | not null |
col2 | bigint | | |
col3 | boolean | | |
col4 | timestamp without time zone | | |
col5 | timestamp without time zone | | |
col6 | timestamp without time zone | | |
col7 | timestamp without time zone | | |
col8 | character varying(1000) | | |
col9 | character varying(1000) | | |
col10 | character varying(1000) | | |
col11 | character varying(1000) | | |
col12 | character varying(1000) | | |
col13 | character varying(1000) | | |
col14 | character varying(1000) | | |
col15 | character varying(1000) | | |
col16 | bigint | | |
col17 | bigint | | |
col18 | bigint | | |
col19 | bigint | | |
col20 | bigint | | |
col21 | character varying(255) | | |
uniqueid | character varying(255) | | |
col23 | timestamp without time zone | | |
col24 | boolean | | |
col25 | boolean | | |
col26 | character varying(255) | | |
col27 | timestamp without time zone | | |
col28 | timestamp without time zone | | |
col29 | bigint | | |
col30 | integer | | not null | 0
col31 | character varying(255) | | |
col32 | boolean | | |
col33 | boolean | | |
col34 | boolean | | |
col35 | boolean | | |
col36 | character varying(1000) | | |
col37 | character varying(1000) | | |
col38 | boolean | | |
col39 | character varying(1000) | | |
col40 | character varying(1000) | | |
col41 | character varying(1000) | | |
col42 | character varying(255) | | |
col43 | character varying(1000) | | |
col44 | other_enum | | not null |
datatype | datatype_enum | | not null |
col46 | bigint | | |
col47 | text | | |
col48 | bytea | | not null |
col49 | bytea | | |
col50 | bigint | | |
col51 | bigint | | |
col52 | bigint | | |
col53 | bigint | | |
col54 | bigint | | |
Indexes:
"table1_pkey" PRIMARY KEY, btree (id)
"table1_unique_col47_for_col46" UNIQUE CONSTRAINT, btree (col47, col46)
"table1_col20_datatype_idx" btree (col20, datatype)
"table1_col2_datatype_col20_idx" btree (col2, datatype, col20)
"table1_col42_idx" btree (col42)
"table1_col28_idx" btree (col28)
"table1_col52_idx" btree (col52)
"table1_col50_idx" btree (col50)
"table1_col12_idx" btree (col12)
"table1_col37_idx" btree (col37) WHERE col37 IS NOT NULL
"table1_col40_notnull_idx" btree (col40) WHERE col40 IS NOT NULL
"table1_typea_idx" btree (col20, (col8::text)) WHERE datatype = 'TypeA'::table1_type
"table1_typea_include_uniqueid_col16_idx" btree (col20, col8, deleted) INCLUDE (uniqueid, col16, id) WHERE datatype = 'TypeA'::table1_type
"table1_typea_idx" btree (col20, (col8::text)) WHERE datatype = 'TypeA'::table1_type INVALID
"table1_uniqueid_idx" btree (uniqueid)
Check constraints:
"table1_col32_not_null" CHECK (col32 IS NOT NULL)
"table1_col34_not_null" CHECK (col34 IS NOT NULL)
"table1_col33_not_null" CHECK (col33 IS NOT NULL)
"table1_col35_not_null" CHECK (col35 IS NOT NULL) NOT VALID
"col49_or_col46" CHECK (col49 IS NOT NULL OR col46 IS NOT NULL) NOT VALID
Foreign-key constraints:
"fk_table1_col20" FOREIGN KEY (col20) REFERENCES constainttable4(id)
"fk_table1_col29" FOREIGN KEY (col29) REFERENCES constainttable5(id)
"table1_col46_fkey" FOREIGN KEY (col46) REFERENCES constainttable6(id)
Referenced by:
TABLE "referencetable2" CONSTRAINT "fk_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(id)
TABLE "referencetable3" CONSTRAINT "referencetable3_table1_id_fkey" FOREIGN KEY (table1_id) REFERENCES table1(id)
TABLE "referencetable4" CONSTRAINT "referencetable4_table1_fkey" FOREIGN KEY (table1_id) REFERENCES table1(id)
Publications:
"puba"

\d view1
View "view1"
Column | Type | Collation | Nullable | Default
-----------------------------------------+------------------------+-----------+----------+---------
col1 | text | | |
col2 | bigint | | |
col3 | bytea | | |
col4 | integer | | |
col5 | character varying(255) | | |
col6 | bytea | | |
col7 | character varying(255) | | |
col8 | bigint | | |
col9 | bigint | | |
col10 | bytea | | |
col11 | integer | | |
col12 | character varying | | |
col13 | bytea | | |
col14 | character varying | | |

\d table2
Table "table2"
Column | Type | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+-------------------------------------------
id | bigint | | not null | nextval('table2_id_seq'::regclass)
uniqueid | character varying(255) | | not null |
col3 | timestamp without time zone | | not null |
col4 | boolean | | not null |
col5 | boolean | | not null |
col6 | boolean | | not null |
col7 | boolean | | not null |
col8 | bigint | | |
col9 | bigint | | |
col10 | bigint | | |
col11 | character varying(255) | | |
col12 | character varying(255) | | not null |
col13 | boolean | | not null | true
Indexes:
"id" PRIMARY KEY, btree (id)
"idx_table2_uniqueid" btree (uniqueid)
Foreign-key constraints:
"fk_table2_source_constainttable1_id" FOREIGN KEY (source_constainttable1_id) REFERENCES constainttable1(id)
"fk_table2_source_constainttable2_id" FOREIGN KEY (source_constainttable2_id) REFERENCES constainttable2(id)
"fk_table2_source_constainttable3_id" FOREIGN KEY (source_constainttable3_id) REFERENCES constainttable3(id)
Referenced by:
TABLE "referencetable1" CONSTRAINT "fk_referencetable1_table2_id" FOREIGN KEY (table2_id) REFERENCES table2(id)
TABLE "referencetable1" CONSTRAINT "fk_referencetable1_table2_id_old" FOREIGN KEY (table2_id_old) REFERENCES table2(id)

\d app.table1_typea_include_uniqueid_col16_idx
Index "table1_typea_include_uniqueid_col16_idx"
Column | Type | Key? | Definition
------------+-------------------------+------+------------
col20 | bigint | yes | account_id
col8 | character varying(1000) | yes | string01
col3 | boolean | yes | deleted
col26 | character varying(255) | no | uniqueid
col16 | bigint | no | long01
id | bigint | no | id
btree, for table "table1", predicate (datatype = 'TypeA'::table1_type)

PostgreSQL version number you are running:
SELECT version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 (Ubuntu 11.6-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
(1 row)

How you installed PostgreSQL:

Official apt repo

Changes made to the settings in the postgresql.conf file: see Server Configuration for a quick way to list them all.
SELECT name, current_setting(name), SOURCE
FROM pg_settings
WHERE SOURCE NOT IN ('default', 'override');
name | current_setting | source
---------------------------------+---------------------------------------+----------------------
application_name | psql | client
archive_command | true | configuration file
archive_mode | on | configuration file
auto_explain.log_analyze | on | configuration file
auto_explain.log_buffers | on | configuration file
auto_explain.log_min_duration | 10s | configuration file
auto_explain.log_timing | on | configuration file
autovacuum | on | configuration file
checkpoint_timeout | 30min | configuration file
client_encoding | UTF8 | client
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
dynamic_shared_memory_type | posix | configuration file
effective_cache_size | 8GB | configuration file
effective_io_concurrency | 100 | configuration file
external_pid_file | /var/run/postgresql/11-main.pid | command line
hot_standby | on | configuration file
lc_messages | en_US.utf8 | configuration file
lc_monetary | en_US.utf8 | configuration file
lc_numeric | en_US.utf8 | configuration file
lc_time | en_US.utf8 | configuration file
listen_addresses | * | configuration file
log_autovacuum_min_duration | 0 | configuration file
log_checkpoints | on | configuration file
log_connections | on | configuration file
log_destination | syslog | configuration file
log_directory | pg_log | configuration file
log_disconnections | on | configuration file
log_filename | postgresql-%a.log | configuration file

Operating system and version:
5.0.0-1029-gcp #30~18.04.1-Ubuntu SMP Mon Jan 13 05:40:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

What program you're using to connect to PostgreSQL:
Java JDBC 4.2 (JRE 8+) driver for PostgreSQL database

Is there anything relevant or unusual in the PostgreSQL server logs?:
nothing in the logs, but I've done a bunch of pg_locks/stat_activity dumps whilst the query is running. pg_locks/activity show:
- wait_event_type/wait_event NULL each time for this query
- granted = true for all modes for this query
- an autovacuum was running, but for an unrelated table.

Extra perf details:
CPU: 16 vCPUs on GCP (Intel Xeon E5 v4)
RAM: 60GB
Disk: 3TB persistent SSD on GCP

sudo time dd if=/dev/sdc of=/dev/null bs=1M count=1k skip=$((128*RANDOM/32)):
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.855067 s, 1.3 GB/s
0.00user 0.45system 0:00.85elapsed 53%CPU (0avgtext+0avgdata 3236maxresident)k
2097816inputs+0outputs (1major+348minor)pagefaults 0swaps

Estimated performance (GCP quoted)
Operation type Read Write
Sustained random IOPS limit 25,000.00 25,000.00
Sustained throughput limit (MB/s) 1,200.00 800.00

Re: Please help! Query jumps from 1s -> 4m

From

Justin Pryzby

Date:

03 May 2020, 15:38:26

On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:
> Hi,
> 
> Hoping someone can help with this performance issue that's been driving a
> few of us crazy :-) Any guidance greatly appreciated.
> 
> A description of what you are trying to achieve and what results you
> expect.:
>  - I'd like to get an understanding of why the following query (presented
> in full, but there are specific parts that are confusing me) starts off
> taking ~second in duration but 'switches' to taking over 4 minutes.

Does it "switch" abruptly or do you get progressively slower queries ?
If it's abrupt following the 5th execution, I guess you're hitting this:

https://www.postgresql.org/message-id/Pine.BSO.4.64.0802131404090.6785@leary.csoft.net
https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B50FB8D5E@ntex2010i.host.magwien.gv.at

>  - we initially saw this behaviour for the exact same sql with a different
> index that resulted in an index scan. To try and fix the issue we've
> created an additional index with additional included fields so we now have
> Index Only Scans, but are still seeing the same problem.

>  Segments of interest:
>  1. ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual
> time=110.539..123828.134 rows=67000 loops=1)
>         Index Cond: (col20 = $2005)
>         Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY
> ((ARRAY[$1004, ..., $2003])::text[])))
>         Rows Removed by Filter: 2662652
>         Heap Fetches: 6940
>         Buffers: shared hit=46619 read=42784 written=52

> If I run the same queries now:
> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias1  (cost=0.56..2549.69 rows=69 width=36)
> (actual time=1.017..1221.375 rows=67000 loops=1)
> Heap Fetches: 24
> Buffers: shared hit=2849 read=2483

It looks to me like you're getting good performance following a vacuum, when
Heap Fetches is low.  So you'd want to run vacuum more often, like:
| ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005).

But maybe I've missed something - you showed the bad query plan, but not the
good one, and I wonder if they may be subtly different, and that's maybe masked
by the replaced identifiers.

-- 
Justin

Re: Please help! Query jumps from 1s -> 4m

From

Justin Pryzby

Date:

04 May 2020, 19:12:01

On Mon, May 04, 2020 at 08:07:07PM +0100, Jamie Thompson wrote:
> Additionally, the execution plans for the 10th + following queries look
> fine, they have the same structure as if I run the query manually. It's not
> that the query plan switches, it seems as though the same query plan is
> just > 200X slower than usual.

Are you able to reproduce the problem manually ?

With/without PREPARE ?
https://www.postgresql.org/docs/current/sql-prepare.html

-- 
Justin

Re: Please help! Query jumps from 1s -> 4m

From

James Thompson

Date:

04 May 2020, 19:12:06

The change is abrupt, on the 10th execution (but I hadn't spotted it was always after the same number of executions until your suggestion - thanks for pointing me in that direction).

I don't see any custom configuration on our end that changes the threshold for this from 5->10. Debugging the query call I also see that PgConnection has the prepareThreshold set to 5.

Additionally, the execution plans for the 10th + following queries look fine, they have the same structure as if I run the query manually. It's not that the query plan switches, it seems as though the same query plan is just > 200X slower than usual.

As for the heap fetches -> as far as I can tell, on both occasions the fetches are relatively low and shouldn't account for minutes of execution (even if one is lower than the other). Looking through one days logs I do find cases with lower heap fetches too, for example as below which has 1977 fetches instead of the previous 6940 but took approx the same time:
-> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1 (cost=0.56..17.25 rows=1 width=60) (actual time=56.858..120893.874 rows=67000 loops=1)
Index Cond: (col20 = $2005)
Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY ((ARRAY[$1004, ..., $2003])::text[])))
Rows Removed by Filter: 2662793
Heap Fetches: 1977
Buffers: shared hit=84574 read=3522

Would you agree the statement threshold / heap fetches seems unlikely to be causing this? Any other thoughts?

Thanks!

On Sun, 3 May 2020 at 16:38, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:
> Hi,
>
> Hoping someone can help with this performance issue that's been driving a
> few of us crazy :-) Any guidance greatly appreciated.
>
> A description of what you are trying to achieve and what results you
> expect.:
> - I'd like to get an understanding of why the following query (presented
> in full, but there are specific parts that are confusing me) starts off
> taking ~second in duration but 'switches' to taking over 4 minutes.

Does it "switch" abruptly or do you get progressively slower queries ?
If it's abrupt following the 5th execution, I guess you're hitting this:

https://www.postgresql.org/message-id/Pine.BSO.4.64.0802131404090.6785@leary.csoft.net
https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B50FB8D5E@ntex2010i.host.magwien.gv.at

> - we initially saw this behaviour for the exact same sql with a different
> index that resulted in an index scan. To try and fix the issue we've
> created an additional index with additional included fields so we now have
> Index Only Scans, but are still seeing the same problem.

> Segments of interest:
> 1. -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> table1 table1alias1 (cost=0.56..17.25 rows=1 width=60) (actual
> time=110.539..123828.134 rows=67000 loops=1)
> Index Cond: (col20 = $2005)
> Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY
> ((ARRAY[$1004, ..., $2003])::text[])))
> Rows Removed by Filter: 2662652
> Heap Fetches: 6940
> Buffers: shared hit=46619 read=42784 written=52

> If I run the same queries now:
> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias1 (cost=0.56..2549.69 rows=69 width=36)
> (actual time=1.017..1221.375 rows=67000 loops=1)
> Heap Fetches: 24
> Buffers: shared hit=2849 read=2483

It looks to me like you're getting good performance following a vacuum, when
Heap Fetches is low. So you'd want to run vacuum more often, like:
| ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005).

But maybe I've missed something - you showed the bad query plan, but not the
good one, and I wonder if they may be subtly different, and that's maybe masked
by the replaced identifiers.

--
Justin

Re: Please help! Query jumps from 1s -> 4m

From

David Rowley

Date:

04 May 2020, 20:12:51

On Mon, 4 May 2020 at 02:35, James Thompson <james@jthompson.dev> wrote:
> buffers do look different - but still, reading 42k doesn't seem like it would cause a delay of 4m?

You could do: SET track_io_timing TO on;

then: EXPLAIN (ANALYZE, BUFFERS) your query and see if the time is
spent doing IO.

David

Re: Please help! Query jumps from 1s -> 4m

From

Laurenz Albe

Date:

05 May 2020, 08:08:20

On Mon, 2020-05-04 at 20:12 +0100, James Thompson wrote:
> The change is abrupt, on the 10th execution (but I hadn't spotted it was always after the
> same number of executions until your suggestion - thanks for pointing me in that direction).
> 
> I don't see any custom configuration on our end that changes the threshold for this from 5->10.
> Debugging the query call I also see that PgConnection has the prepareThreshold set to 5.
> 
> Additionally, the execution plans for the 10th + following queries look fine, they have the
> same structure as if I run the query manually. It's not that the query plan switches,
> it seems as though the same query plan is just > 200X slower than usual.
> 
> As for the heap fetches -> as far as I can tell, on both occasions the fetches are relatively
> low and shouldn't account for minutes of execution (even if one is lower than the other).
> Looking through one days logs I do find cases with lower heap fetches too, for example as
> below which has 1977 fetches instead of the previous 6940 but took approx the same time:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1  (cost=0.56..17.25 rows=1
width=60)(actual time=56.858..120893.874 rows=67000 loops=1)
 
>         Index Cond: (col20 = $2005)
>         Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY ((ARRAY[$1004, ..., $2003])::text[])))
>         Rows Removed by Filter: 2662793
>         Heap Fetches: 1977
>         Buffers: shared hit=84574 read=3522
> 
> Would you agree the statement threshold / heap fetches seems unlikely to be causing this? Any other thoughts?

It does sound suspiciously like custom plans vs. generic plan.

If you are using JDBC, then the cut-off of 10 would make sense:
the JDBC driver uses (server) prepared statements only after the
fifth execution, and the prepared statement will use a generic plan
only after the fifth execution.

It would be good to see the execution plan from the third, seventh
and thirteenth execution.  You could use "auto_explain" for that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Please help! Query jumps from 1s -> 4m

From

Justin Pryzby

Date:

05 May 2020, 19:28:36

On Mon, May 04, 2020 at 02:12:01PM -0500, Justin Pryzby wrote:
> On Mon, May 04, 2020 at 08:07:07PM +0100, Jamie Thompson wrote:
> > Additionally, the execution plans for the 10th + following queries look
> > fine, they have the same structure as if I run the query manually. It's not
> > that the query plan switches, it seems as though the same query plan is
> > just > 200X slower than usual.
> 
> Are you able to reproduce the problem manually ?
> 
> With/without PREPARE ?
> https://www.postgresql.org/docs/current/sql-prepare.html

Also, you should be able to check if that's the problem by doing either:
plan_cache_mode = force_generic_plan;
Or (I would think) DISCARD PLANS;

https://www.postgresql.org/docs/12/runtime-config-query.html

-- 
Justin

Re: Please help! Query jumps from 1s -> 4m

From

James Thompson

Date:

05 May 2020, 21:10:18

I've managed to replicate this now with prepared statements. Thanks for all the guidance so far.

The slowness occurs when the prepared statement changes to a generic plan.

Initial plan:

-> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2 (cost=0.56..2549.70 rows=70 width=36) (actual time=1.901..45.256 rows=65000 loops=1)
Output: table1alias2.uniqueid
Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,... 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
Heap Fetches: 10
Buffers: shared hit=5048

after 5 executions of the statement:

-> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2 (cost=0.56..17.23 rows=1 width=36) (actual time=125.344..126877.822 rows=65000 loops=1)
Output: table1alias2.uniqueid
Index Cond: (table1alias2.col20 = $1001)
Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
Rows Removed by Filter: 2670023
Heap Fetches: 428
Buffers: shared hit=45933 read=42060 dirtied=4

The second plan looks worse to me as it's applying a filter rather than using an index condition? I don't understand why it's not part of the condition and also why this is so much slower though.

If I force a retrieval of all index rows for col20 = '12345' using an ad-hoc query (below, which in my mind is what the 'bad' plan is doing), that only takes 2s (2.7 mil rows). Where's the difference?

EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT COUNT(DISTINCT id) FROM table1 WHERE datatype='TypeA' AND col20 = 12345;

-> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 (cost=0.56..2762.95 rows=55337 width=8) (actual time=3.976..1655.645 rows=2735023 loops=1)
Index Cond: (col20 = 12345)
Heap Fetches: 417
Buffers: shared hit=43843 read=44147 dirtied=8

>You could do: SET track_io_timing TO on;

I've not tried this yet, and haven't used it before - sounds like there's some risks associated with running it on a production server / clocks going backwards?

>Also, you should be able to check if that's the problem by doing either:
>plan_cache_mode = force_generic_plan;
>Or (I would think) DISCARD PLANS;

I think plan_cache_mode is just for pg12+ unfortunately? We're on 11 currently.

Just tested DISCARD PLANS locally, it didn't switch back from the generic plan. Was that your expectation?

Re: Please help! Query jumps from 1s -> 4m

From

Justin Pryzby

Date:

05 May 2020, 21:35:54

On Tue, May 05, 2020 at 10:10:18PM +0100, James Thompson wrote:
> I've managed to replicate this now with prepared statements. Thanks for all
> the guidance so far.
> 
> The slowness occurs when the prepared statement changes to a generic plan.
> 
> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..2549.70 rows=70
width=36)(actual time=1.901..45.256 rows=65000 loops=1)
 
>     Output: table1alias2.uniqueid
>     Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY
('{c5986b02-3a02-4639-8147-f286972413ba,...98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))

The rowcount is off by a factor of 1000x.

> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1
width=36)(actual time=125.344..126877.822 rows=65000 loops=1)
 
>     Output: table1alias2.uniqueid
>     Index Cond: (table1alias2.col20 = $1001)
>     Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,$1000])::text[]))
>     Rows Removed by Filter: 2670023

And the generic plan is cheaper than the previous, custom plan; but slower,
undoubtedly due to rowcount mis-estimate.

> The second plan looks worse to me as it's applying a filter rather than
> using an index condition? I don't understand why it's not part of the
> condition and also why this is so much slower though.
> If I force a retrieval of all index rows for col20 = '12345' using an
> ad-hoc query (below, which in my mind is what the 'bad' plan is doing),
> that only takes 2s (2.7 mil rows). Where's the difference?
> 
> EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT COUNT(DISTINCT id) FROM table1
> WHERE datatype='TypeA' AND col20 = 12345;
> -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> (cost=0.56..2762.95 rows=55337 width=8) (actual time=3.976..1655.645 rows=2735023 loops=1)

I see you're querying on datetype, which I think isn't in the original query
(but I'm not sure if it's just renamed?).

Underestimate usually means that the conditions are redundant or correlated.
You could mitigate that either by creating an index on both columns (or add
datatype to the existing index), or CREATE STATISTICS ndistinct on those
columns.  Or maybe you just need to increase the stats target for col20 (?).
Then re-analyze.

-- 
Justin

Re: Please help! Query jumps from 1s -> 4m

From

Tom Lane

Date:

05 May 2020, 21:42:48

James Thompson <james@jthompson.dev> writes:
> The slowness occurs when the prepared statement changes to a generic plan.

> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual
> time=1.901..45.256 rows=65000 loops=1)
>     Output: table1alias2.uniqueid
>     Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.
> col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,...
> 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
>     Heap Fetches: 10
>     Buffers: shared hit=5048

> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual
> time=125.344..126877.822 rows=65000 loops=1)
>     Output: table1alias2.uniqueid
>     Index Cond: (table1alias2.col20 = $1001)
>     Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,
> $1000])::text[]))
>     Rows Removed by Filter: 2670023
>     Heap Fetches: 428
>     Buffers: shared hit=45933 read=42060 dirtied=4

Yeah, this is a dynamic we've seen before.  The rowcount estimate, and
hence the cost estimate, for the plan with explicit parameter values is
way off; but the estimate for the generic plan is even more way off,
causing the system to falsely decide that the latter is cheaper.

I've speculated about refusing to believe generic cost estimates if they are
more than epsilon less than the concrete cost estimate, but it's not quite
clear how that should work or whether it'd have its own failure modes.

The one thing that is totally clear is that these rowcount estimates are
crappy.  Can you improve them by increasing the stats target for that
table?  Maybe with less-garbage-y inputs, the system would make the right
plan choice here.

            regards, tom lane

Re: Please help! Query jumps from 1s -> 4m

From

James Thompson

Date:

13 May 2020, 14:17:03

Just to follow up on this...

Tried increasing stats targets last week + re-analyzing but the query was just as bad.

Ended up increasing the prepareThreshold to prevent server-side prepares for now (and thus later generic statements). This 'fixed' the issue and had no noticeable negative effect for our workloads.

I still don't understand why the plan being off makes the query so much slower in this case (the plans I shared in the last email don't look too different, I don't understand how the filter can add on 2mins of execution time to an index-only scan). If anyone does have thoughts on what could be happening I would be very interested to hear, but the main performance problem is effectively solved.

Thanks all for the valuable help getting to the bottom of what was happening.

On Tue, 5 May 2020 at 22:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Thompson <james@jthompson.dev> writes:
> The slowness occurs when the prepared statement changes to a generic plan.

> Initial plan:
> -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2 (cost=0.56..2549.70 rows=70 width=36) (actual
> time=1.901..45.256 rows=65000 loops=1)
> Output: table1alias2.uniqueid
> Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.
> col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,...
> 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
> Heap Fetches: 10
> Buffers: shared hit=5048

> after 5 executions of the statement:
> -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2 (cost=0.56..17.23 rows=1 width=36) (actual
> time=125.344..126877.822 rows=65000 loops=1)
> Output: table1alias2.uniqueid
> Index Cond: (table1alias2.col20 = $1001)
> Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,
> $1000])::text[]))
> Rows Removed by Filter: 2670023
> Heap Fetches: 428
> Buffers: shared hit=45933 read=42060 dirtied=4

Yeah, this is a dynamic we've seen before. The rowcount estimate, and
hence the cost estimate, for the plan with explicit parameter values is
way off; but the estimate for the generic plan is even more way off,
causing the system to falsely decide that the latter is cheaper.

I've speculated about refusing to believe generic cost estimates if they are
more than epsilon less than the concrete cost estimate, but it's not quite
clear how that should work or whether it'd have its own failure modes.

The one thing that is totally clear is that these rowcount estimates are
crappy. Can you improve them by increasing the stats target for that
table? Maybe with less-garbage-y inputs, the system would make the right
plan choice here.

regards, tom lane