Thread: CTE with JOIN of two tables is much faster than a regular query
Running PostgreSQL 9.5 on Windows.
The CTE version below completes the query in 4.5 seconds while the regular query takes 66 seconds. I read from EXPLAIN ANALYSE that the regular query starts with a full table scan over “Doc”, while the CTE joins the two tables first and applies the filter condition in a second step.
I believe that some rows in “Doc” which are not referenced by “F” contain a large amount of data in the field “szText”, and this slows down the ILIKE operator.
What can I do to improve the performance of the regular query without using a CTE?
This is a much simplified extract from a larger application:
CREATE TABLE Doc (
    oID     UUID NOT NULL PRIMARY KEY,
    uDocID  UUID NOT NULL UNIQUE,
    szText  TEXT
);

CREATE TABLE F (
    oID     UUID NOT NULL PRIMARY KEY,
    uDocRef UUID,
    CONSTRAINT F_fkey1 FOREIGN KEY (uDocRef) REFERENCES Doc (uDocID)
);
-- just in case …
ALTER TABLE Doc ALTER uDocID SET STATISTICS 10000;
ALTER TABLE Doc ALTER szText SET STATISTICS 10000;
VACUUM ANALYSE Doc;
SELECT COUNT(*) FROM Doc;
=> 125946 records
ALTER TABLE F ALTER uDocRef SET STATISTICS 10000;
VACUUM ANALYSE F;
SELECT COUNT(*) FROM F;
=> 32605 records
Result with CTE:
EXPLAIN ANALYSE
WITH a AS (
    SELECT F.oID, Doc.szText
    FROM F
    JOIN Doc ON F.uDocRef = Doc.udocid
)
SELECT *
FROM a
WHERE szText ILIKE '%480GB%';

"CTE Scan on a  (cost=9463.42..10197.03 rows=52 width=48) (actual time=478.770..4551.613 rows=10 loops=1)"
"  Filter: (sztext ~~* '%480GB%'::text)"
"  Rows Removed by Filter: 32595"
"  CTE a"
"    ->  Hash Join  (cost=973.61..9463.42 rows=32605 width=359) (actual time=36.998..100.337 rows=32605 loops=1)"
"          Hash Cond: (doc.udocid = f.udocref)"
"          ->  Seq Scan on doc  (cost=0.00..7691.46 rows=125946 width=359) (actual time=0.008..18.269 rows=125946 loops=1)"
"          ->  Hash  (cost=566.05..566.05 rows=32605 width=32) (actual time=35.825..35.825 rows=32605 loops=1)"
"                Buckets: 32768  Batches: 1  Memory Usage: 2294kB"
"                ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.005..14.677 rows=32605 loops=1)"
"Planning time: 4.689 ms"
"Execution time: 4554.893 ms"
Result with the regular query:
EXPLAIN ANALYSE
SELECT F.oID, Doc.szText
FROM F
JOIN Doc ON F.uDocRef = Doc.udocid
WHERE szText ILIKE '%480GB%';

"Hash Join  (cost=8006.56..8694.93 rows=5 width=359) (actual time=66500.415..66506.978 rows=10 loops=1)"
"  Hash Cond: (f.udocref = doc.udocid)"
"  ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.002..3.143 rows=32605 loops=1)"
"  ->  Hash  (cost=8006.32..8006.32 rows=19 width=359) (actual time=66500.023..66500.023 rows=16 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 19kB"
"        ->  Seq Scan on doc  (cost=0.00..8006.32 rows=19 width=359) (actual time=8864.720..66499.991 rows=16 loops=1)"
"              Filter: (sztext ~~* '%480GB%'::text)"
"              Rows Removed by Filter: 125930"
"Planning time: 263.542 ms"
"Execution time: 66507.003 ms"
On 18.08.2018 at 11:36, kpi6288@gmail.com wrote:
> What can I do to improve the performance of the regular query without
> using a CTE?

try to rewrite it to a subselect:

select ... from ... join (select ... from ... where ...) x on ...

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
> -----Original Message-----
> From: Andreas Kretschmer <andreas@a-kretschmer.de>
> Sent: Saturday, 18 August 2018 12:27
>
> On 18.08.2018 at 11:36, kpi6288@gmail.com wrote:
> > What can I do to improve the performance of the regular query without
> > using a CTE?
>
> try to rewrite it to a subselect:
>
> select ... from ... join (select ... from ... where ...) x on ...

Do you mean like this?

EXPLAIN ANALYSE
SELECT F.oID, D.szText
FROM F
JOIN (SELECT Doc.uDocID, Doc.szText FROM Doc WHERE szText ILIKE '%480GB%') AS D
  ON D.uDocID = F.uDocRef;

Just as bad as my regular query:

"Hash Join  (cost=8006.56..8694.93 rows=5 width=359) (actual time=66777.898..66784.630 rows=10 loops=1)"
"  Hash Cond: (f.udocref = doc.udocid)"
"  ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.002..3.563 rows=32605 loops=1)"
"  ->  Hash  (cost=8006.32..8006.32 rows=19 width=359) (actual time=66777.471..66777.471 rows=16 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 19kB"
"        ->  Seq Scan on doc  (cost=0.00..8006.32 rows=19 width=359) (actual time=9013.317..66777.438 rows=16 loops=1)"
"              Filter: (sztext ~~* '%480GB%'::text)"
"              Rows Removed by Filter: 125930"
"Planning time: 236.354 ms"
"Execution time: 66784.651 ms"
On 08/18/2018 04:08 AM, kpi6288@gmail.com wrote:
>> -----Original Message-----
>> From: Andreas Kretschmer <andreas@a-kretschmer.de>
>> Sent: Saturday, 18 August 2018 12:27
>
>> On 18.08.2018 at 11:36, kpi6288@gmail.com wrote:
>>> What can I do to improve the performance of the regular query without
>>> using a CTE?
>>
>> try to rewrite it to a subselect:
>>
>> select ... from ... join (select ... from ... where ...) x on ...
>
> Do you mean like this?
>
> EXPLAIN ANALYSE
> SELECT F.oID, D.szText
> FROM F
> JOIN (SELECT Doc.uDocID, Doc.szText FROM Doc WHERE szText ILIKE '%480GB%')
> AS D ON D.uDocID = F.uDocRef;

To try to replicate what the CTE is doing I would try:

SELECT *
FROM Doc
JOIN (SELECT uDocRef, F.oID, Doc.szText
      FROM F
      JOIN Doc ON F.uDocRef = Doc.udocid) AS D
  ON D.uDocRef = Doc.udocid
WHERE D.szText ILIKE '%480GB%'

> Just as bad as my regular query:
>
> "Hash Join  (cost=8006.56..8694.93 rows=5 width=359) (actual time=66777.898..66784.630 rows=10 loops=1)"
> "  Hash Cond: (f.udocref = doc.udocid)"
> "  ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.002..3.563 rows=32605 loops=1)"
> "  ->  Hash  (cost=8006.32..8006.32 rows=19 width=359) (actual time=66777.471..66777.471 rows=16 loops=1)"
> "        Buckets: 1024  Batches: 1  Memory Usage: 19kB"
> "        ->  Seq Scan on doc  (cost=0.00..8006.32 rows=19 width=359) (actual time=9013.317..66777.438 rows=16 loops=1)"
> "              Filter: (sztext ~~* '%480GB%'::text)"
> "              Rows Removed by Filter: 125930"
> "Planning time: 236.354 ms"
> "Execution time: 66784.651 ms"

-- 
Adrian Klaver
adrian.klaver@aklaver.com
> -----Original Message-----
> From: Adrian Klaver <adrian.klaver@aklaver.com>
> Sent: Saturday, 18 August 2018 16:24
>
> To try to replicate what the CTE is doing I would try:
>
> SELECT *
> FROM Doc
> JOIN (SELECT uDocRef, F.oID, Doc.szText
>       FROM F JOIN Doc ON F.uDocRef = Doc.udocid) AS D
>   ON D.uDocRef = Doc.udocid
> WHERE D.szText ILIKE '%480GB%'

No difference - still starting with the full scan on Doc and lasting 67 seconds:

"Nested Loop  (cost=8006.98..8700.40 rows=5 width=750) (actual time=66845.857..66852.705 rows=10 loops=1)"
"  ->  Hash Join  (cost=8006.56..8694.93 rows=5 width=391) (actual time=66845.838..66852.613 rows=10 loops=1)"
"        Hash Cond: (f.udocref = doc_1.udocid)"
"        ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.002..3.428 rows=32605 loops=1)"
"        ->  Hash  (cost=8006.32..8006.32 rows=19 width=359) (actual time=66845.431..66845.431 rows=16 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 19kB"
"              ->  Seq Scan on doc doc_1  (cost=0.00..8006.32 rows=19 width=359) (actual time=9042.984..66845.398 rows=16 loops=1)"
"                    Filter: (sztext ~~* '%480GB%'::text)"
"                    Rows Removed by Filter: 125930"
"  ->  Index Scan using doc_udocid_key on doc  (cost=0.42..1.08 rows=1 width=375) (actual time=0.008..0.008 rows=1 loops=10)"
"        Index Cond: (udocid = f.udocref)"
"Planning time: 252.162 ms"
"Execution time: 66852.737 ms"
Greetings,

* kpi6288@gmail.com (kpi6288@gmail.com) wrote:
> The CTE mentioned below completes the query in 4.5 seconds while the regular
> query takes 66 seconds. I read from EXPLAIN ANALYSE that the regular query
> starts with a full table scan over "Doc" while the CTE joins the two tables
> first and applies the filter condition in the 2nd step.
>
> I believe that some rows in "Doc" which are not referenced by "F" contain a
> large amount of data in the field "szText" and this will slow down the ILIKE
> operator.

Yup, that appears to be what's happening.

> What can I do to improve the performance of the regular query without using
> a CTE?

You could possibly build a trigram index on the field you're searching,
which could avoid the full table scan. Of course, that index could be
quite large, so there's downsides to that. If these are words you're
looking for then you could use PG's full text indexing to build indexes
on the words and then use that instead. If you are fine working with
words but are concerned about misspellings then you can extract out the
distinct words, build a trigram index on those, find the most similar
words based on the input and then search for those words using the FTI.

Unfortunately, we don't currently pay attention to things like average
string length when considering the cost of performing an 'ilike', so we
figure that doing the filtering first and then the join will be faster,
but that obviously falls over in some cases, like this one. Using the
CTE forces PG to (today, at least) do the join first, but that isn't
really good to rely on.

Thanks!

Stephen
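For reference, a minimal sketch of the trigram-index approach Stephen describes, against the Doc table from the original post (the index name is illustrative, and the pg_trgm extension must be available):

-- pg_trgm provides the gin_trgm_ops operator class used below
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- a GIN trigram index lets the planner answer "szText ILIKE '%480GB%'"
-- with an index search instead of scanning every row of Doc
CREATE INDEX doc_sztext_trgm_idx ON Doc USING gin (szText gin_trgm_ops);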
> -----Original Message-----
> From: Stephen Frost <sfrost@snowman.net>
> Sent: Saturday, 18 August 2018 16:39

Hello,

> > What can I do to improve the performance of the regular query without
> > using a CTE?
>
> You could possibly build a trigram index on the field you're searching,
> which could avoid the full table scan. Of course, that index could be quite
> large, so there's downsides to that. If these are words you're looking for
> then you could use PG's full text indexing to build indexes on the words
> and then use that instead. If you are fine working with words but are
> concerned about misspellings then you can extract out the distinct words,
> build a trigram index on those, find the most similar words based on the
> input and then search for those words using the FTI.
>
> Unfortunately, we don't currently pay attention to things like average
> string length when considering the cost of performing an 'ilike', so we
> figure that doing the filtering first and then the join will be faster, but
> that obviously falls over in some cases, like this one. Using the CTE
> forces PG to (today, at least) do the join first, but that isn't really
> good to rely on.

A trigram index would possibly help in this particular scenario, but its size
and the cost of updating it in other parts of the application would probably
create other issues. I may try it, though.

But thanks for confirming my assumption. I just thought that it should be
obvious to the optimizer to do the join first and filter on this result. But
I read from your post that there is nothing I can do to modify the behavior
of the optimizer. Or is there a way to specify the cost for an operator
(ILIKE in this case) on a specific column?

Thanks
Klaus
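There is no per-column cost setting; the closest knob - a hedged sketch, not something confirmed in this thread - is raising the planner's cost estimate for texticlike(), the function behind ILIKE on text columns. As Tom Lane explains below, filter placement is not cost-driven, so this alone would not move the ILIKE above the join:

-- texticlike(text, text) implements the ~~* (ILIKE) operator for text;
-- COST is a planner estimate in units of cpu_operator_cost (default 1)
ALTER FUNCTION texticlike(text, text) COST 200;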
Stephen Frost <sfrost@snowman.net> writes:
> * kpi6288@gmail.com (kpi6288@gmail.com) wrote:
>> The CTE mentioned below completes the query in 4.5 seconds while the regular
>> query takes 66 seconds.

> Unfortunately, we don't currently pay attention to things like average
> string length when considering the cost of performing an 'ilike', so we
> figure that doing the filtering first and then the join will be faster,
> but that obviously falls over in some cases, like this one. Using the
> CTE forces PG to (today, at least) do the join first, but that isn't
> really good to rely on.

Well, it's simpler than that: filter quals are always evaluated at the
lowest possible plan level. One of the Berkeley PhD theses that we ripped
out ages ago tried to be smarter about that, but the
cost/benefit/complexity ratio just wasn't very good, mainly because it's
so darn hard to estimate the selectivity of quals on subsets of relations.

It's not very apparent why the results are so bad in this case, either.
One of the plans has the ILIKE being applied to circa 32600 rows, and the
other one runs it on circa 126000 rows. That should produce less than a
4x penalty, not 14x. Do the rows removed by the join have
significantly-longer-on-average sztext fields? (If so, the odds that the
planner would ever recognize such a correlation seem pretty small.)

In any case, given that the ILIKE selects so few rows (and the planner
knows it!), finding a way to index that is clearly the right answer.

            regards, tom lane
> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Saturday, 18 August 2018 17:29
>
> Well, it's simpler than that: filter quals are always evaluated at the
> lowest possible plan level.

Thank you. This "always" was not clear to me, but it explains a few similar
cases (with not-so-extreme differences) that I could not understand.

Regards
Klaus
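Since filter quals are pushed to the lowest possible plan level but never into a subquery containing LIMIT or OFFSET, a commonly used (if unsupported) alternative to the CTE fence is an OFFSET 0 subquery; a sketch against the tables from this thread:

SELECT a.oID, a.szText
FROM (SELECT F.oID, Doc.szText
      FROM F
      JOIN Doc ON F.uDocRef = Doc.uDocID
      OFFSET 0) AS a              -- OFFSET 0 blocks both subquery pull-up
WHERE a.szText ILIKE '%480GB%';   -- and pushing the ILIKE below the join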
> What can I do to improve the performance of the regular query without
> using a CTE?

Why do you care? When I find that I can write a piece of SQL 3 different
ways, I will go for the most efficient one. So why not accept the CTE
version of this SQL? Just curious.
> -----Original Message-----
> From: Ravi Krishna <sravikrishna@aol.com>
> Sent: Saturday, 18 August 2018 18:25
>
> > What can I do to improve the performance of the regular query without
> > using a CTE?
>
> Why do you care? When I find that I can write a piece of SQL 3 different
> ways, I will go for the most efficient one. So why not accept the CTE
> version of this SQL? Just curious.

We're using object mapping / entity frameworks (e.g. XPO, Entity Framework
Core). These frameworks support regular queries out of the box; CTEs require
additional effort and are more difficult to maintain.

Regards
Klaus
kpi6288@gmail.com writes:
>> -----Original Message-----
>> From: Ravi Krishna <sravikrishna@aol.com>
>> Sent: Saturday, 18 August 2018 18:25
>>
>> Why do you care? When I find that I can write a piece of SQL 3 different
>> ways, I will go for the most efficient one. So why not accept the CTE
>> version of this SQL? Just curious.
>
> We're using object mapping / entity frameworks (e.g. XPO, Entity Framework
> Core). These frameworks support regular queries out of the box; CTEs
> require additional effort and are more difficult to maintain.

Ah, another reason to avoid object mapping/entity frameworks! I guess it is
really the same reason - loss of flexibility and expressive power.

Sorry, I'm having a similar battle with some developers who are insisting on
using a particular framework because it makes maintenance easier as it
'automates' creation of controllers (MVC). However, they are frustrated by
performance and I'm frustrated as the framework also fails to pass additional
information, such as PGAPPNAME, which would make some analysis easier. Part
of the reason for the performance issues is that the developers are doing
things with result sets within the client that would be far more efficient
performed within the database.

One way I have resolved this in the past is to create database procedures
which present a 'mapped' view back to the framework layer, hiding the SQL
from the framework. This works well, with the only main downside being that
you now have SQL in a different (another) place, which can make some people
uncomfortable and can be a maintenance issue if all your developers are just
front-end devs who treat the database as just a key/value repository.

Tim

-- 
Tim Cross
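A sketch of the kind of wrapper Tim describes, assuming the framework can map a set-returning function like a table (the function name and signature are illustrative, reusing the tables from this thread):

CREATE FUNCTION find_docs(pattern text)
RETURNS TABLE (oID uuid, szText text)
LANGUAGE sql STABLE
AS $$
    -- the CTE fence lives inside the function, invisible to the ORM
    WITH a AS (
        SELECT F.oID, Doc.szText
        FROM F
        JOIN Doc ON F.uDocRef = Doc.uDocID
    )
    SELECT a.oID, a.szText
    FROM a
    WHERE a.szText ILIKE '%' || pattern || '%';
$$;

-- callable from the framework side like a table:
-- SELECT * FROM find_docs('480GB');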
> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Saturday, 18 August 2018 17:29
>
> In any case, given that the ILIKE selects so few rows (and the planner
> knows it!), finding a way to index that is clearly the right answer.

A trigram index took 9 minutes to build but improved the regular query from
67 seconds down to 500 ms.

Although this is an impressive improvement, I'm afraid that the index might
create delays in other parts of the application (INSERT / UPDATE). We will
probably rework the design of this particular table.

Thanks to everyone who helped me in this matter.

Regards
Klaus
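If INSERT/UPDATE overhead is the main worry, GIN indexes offer the fastupdate storage parameter, which queues new entries in a pending list instead of updating the main index structure on every write; a sketch, reusing the illustrative index name from earlier:

-- fastupdate (on by default) makes writes cheaper; the pending list is
-- merged into the index by VACUUM or when it exceeds gin_pending_list_limit
CREATE INDEX doc_sztext_trgm_idx ON Doc
    USING gin (szText gin_trgm_ops)
    WITH (fastupdate = on);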
> -----Original Message-----
> From: Tim Cross <theophilusx@gmail.com>
> Sent: Sunday, 19 August 2018 04:57
>
> > We're using object mapping / entity frameworks (e.g. XPO, Entity
> > Framework Core). These frameworks support regular queries out of the
> > box; CTEs require additional effort and are more difficult to maintain.
>
> Ah, another reason to avoid object mapping/entity frameworks! I guess it
> is really the same reason - loss of flexibility and expressive power.

While I agree that you lose control over certain details, we are overall
quite happy using the frameworks. The frameworks nowadays provide the
ability to call procedures if required - but using the objects directly is
more convenient for the developers. SQL procedures add (just like a CTE) an
additional layer to the application design which needs maintenance. That's
fine if it really helps overall, but we try to avoid it if it isn't needed.

Regards
Klaus
On 18.08.18 11:36, kpi6288@gmail.com wrote:
[snip]
> What can I do to improve the performance of the regular query without using a CTE?

Sorry for jumping into this discussion late – I'm facing similar problems
with Postgres choosing strange and inefficient query plans for no (for me)
apparent reason. I use the DEB packages postgresql-10, version
10.5-1.pgdg90+1, on a Debian stretch box. The relevant part of the database
structure is:

--8<-----------------------------------------------------------------------------------------------
mydb=> \d strings
                               Table "public.strings"
 Column |  Type  | Collation | Nullable |               Default
--------+--------+-----------+----------+--------------------------------------
 iid    | bigint |           | not null |
 sid    | bigint |           | not null | nextval('strings_sid_seq'::regclass)
 stype  | text   |           |          |
 string | text   |           |          |
Indexes:
    "strings_pkey" PRIMARY KEY, btree (iid, sid)
    "idx_strings_string_gin" gin (string gin_trgm_ops)
    "idx_stype" btree (stype)
Foreign-key constraints:
    "strings_iid_fkey" FOREIGN KEY (iid) REFERENCES items(iid) ON DELETE CASCADE

mydb=> \d items
                                 Table "public.items"
    Column     |     Type      | Collation | Nullable |              Default
---------------+---------------+-----------+----------+------------------------------------
 dbid          | bigint        |           | not null |
 iid           | bigint        |           | not null | nextval('items_iid_seq'::regclass)
 riid          | integer       |           |          |
[…more columns…]
Indexes:
    "items_pkey" PRIMARY KEY, btree (iid)
    "idx_items_riid" btree (riid)
    "items_dbid" btree (dbid)
[…more indexes…]
Referenced by:
    TABLE "strings" CONSTRAINT "strings_iid_fkey" FOREIGN KEY (iid) REFERENCES items(iid) ON DELETE CASCADE
[…more references…]
--8<-----------------------------------------------------------------------------------------------

The table “strings” contains about 2 * 10e7 active rows, “items” about 10e8.
The “instability” occurs with the following somewhat trivial query.
In the correct (IMO) case, the indexes are used:

--8<-----------------------------------------------------------------------------------------------
mydb=> EXPLAIN ANALYZE SELECT items.iid, stype, string, riid FROM items LEFT JOIN strings USING(iid) WHERE stype ~ E'^tag\\..*(?<\!\\.\\d+)$' AND dbid = 7416000;
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.13..522716.95 rows=8 width=133) (actual time=0.078..0.715 rows=16 loops=1)
   ->  Index Scan using items_dbid on items  (cost=0.57..1377.96 rows=773 width=12) (actual time=0.021..0.038 rows=19 loops=1)
         Index Cond: (dbid = 7416000)
   ->  Index Scan using strings_pkey on strings  (cost=0.56..674.18 rows=26 width=129) (actual time=0.030..0.035 rows=1 loops=19)
         Index Cond: (iid = items.iid)
         Filter: (stype ~ '^tag\..*(?<!\.\d+)$'::text)
         Rows Removed by Filter: 3
 Planning time: 1.685 ms
 Execution time: 0.762 ms
(9 rows)
--8<-----------------------------------------------------------------------------------------------

However, seemingly at random, Postgres chooses the following plan which is
(planning plus execution) ~1500 times slower:

--8<-----------------------------------------------------------------------------------------------
mydb=> EXPLAIN ANALYZE SELECT items.iid, stype, string, riid FROM items LEFT JOIN strings USING(iid) WHERE stype ~ E'^tag\\..*(?<\!\\.\\d+)$' AND dbid = 7416000;
                                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=84945.47..522033.97 rows=9 width=133) (actual time=1401.570..3868.239 rows=16 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=83945.47..521033.07 rows=4 width=133) (actual time=2206.088..3823.982 rows=5 loops=3)
         Hash Cond: (strings.iid = items.iid)
         ->  Parallel Bitmap Heap Scan on strings  (cost=82539.52..518233.10 rows=531057 width=129) (actual time=390.479..3795.902 rows=401149 loops=3)
               Filter: (stype ~ '^tag\..*(?<!\.\d+)$'::text)
               Rows Removed by Filter: 384802
               Heap Blocks: exact=76067
               ->  Bitmap Index Scan on idx_stype  (cost=0.00..82220.88 rows=2334832 width=0) (actual time=340.725..340.725 rows=2357863 loops=1)
                     Index Cond: ((stype >= 'tag.'::text) AND (stype < 'tag/'::text))
         ->  Hash  (cost=1395.77..1395.77 rows=814 width=12) (actual time=0.137..0.137 rows=19 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Index Scan using items_dbid on items  (cost=0.57..1395.77 rows=814 width=12) (actual time=0.072..0.126 rows=19 loops=3)
                     Index Cond: (dbid = 7416000)
 Planning time: 2.617 ms
 Execution time: 3868.303 ms
(17 rows)
--8<-----------------------------------------------------------------------------------------------

It looks as if the selection of the plan is more or less random and does
/not/ depend on the statistics state. I.e. running “vacuum analyze strings;
vacuum analyze items;” immediately before the query does /not/ result in
reproducible behaviour (a /very/ small number of entries may have been added
or deleted between the calls in both tables, though).

My solution for a stable (but slower than the query utilising the indexes)
response time is also using a CTE. However, it would be helpful to fix (or
at least understand) the behaviour.

Best,
Albrecht.
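Albrecht's actual CTE workaround is not shown in the thread; a hypothetical reconstruction, assuming the intent is to pin the cheap items lookup by dbid ahead of the join (in PostgreSQL 10 a WITH query is always an optimization fence):

WITH i AS (
    -- materialised first: only the handful of rows for this dbid
    SELECT iid, riid
    FROM items
    WHERE dbid = 7416000
)
SELECT i.iid, stype, string, i.riid
FROM i
LEFT JOIN strings USING (iid)
WHERE stype ~ E'^tag\\..*(?<!\\.\\d+)$';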