Thread: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Jesper Krogh
Date:
Hi. I'm currently trying to figure out why tsearch performance seems to vary a lot between different query plans. I have created a sample dataset that roughly resembles the data I have to work on.

The script that builds the dataset is at: http://krogh.cc/~jesper/build-test.pl and http://krogh.cc/~jesper/words.txt is needed for it to run.

Test system: average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.

The dataset consists of randomized words, but all records contain "commonterm", around 80% contain "commonterm80", and so on:

my $rand = rand();
push @doc,"commonterm" if $commonpos == $j;
push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

The queries are run multiple times after each other, so the results should be reproducible:

ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80');
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=40188 width=4) (actual time=0.102..1792.215 rows=40082 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1809.437 ms
(3 rows)

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80');
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=115389.14..125991.96 rows=40188 width=4) (actual time=17.445..197.356 rows=40082 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..115379.09 rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 204.201 ms
(5 rows)

Given that the seq scan has to visit 50K rows to create the result and the bitmap heap scan only has to visit 40K (but also search the index), we would expect the seq scan to be at most 25% more expensive than the bitmap heap scan, i.e. less than 300ms.

Jesper
--
Jesper
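The term-assignment logic quoted above can be sketched as follows (in Python rather than the script's Perl; the `commonterm` frequencies come from the mail, everything else — document length, filler words — is an assumption for illustration):

```python
import random

def make_doc(words, common_pos, doc_len=100):
    """Build one document: random filler words, plus 'commonterm' variants
    at a fixed position, mirroring the quoted Perl logic."""
    doc = []
    r = random.random()  # one stored random draw per document, as in the script
    for j in range(doc_len):
        doc.append(random.choice(words))
        if j == common_pos:
            doc.append("commonterm")            # present in 100% of docs
            if r < 0.8:
                doc.append("commonterm80")      # ~80% of docs
            if r < 0.6:
                doc.append("commonterm60")      # ~60% of docs
    return " ".join(doc)
```

Because a single draw `r` gates every threshold, the commonterm60 documents are by construction a subset of the commonterm80 documents — the property the thread discusses below.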
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Craig Ringer
Date:
On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote:
> Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
>
> The dataset consists of words randomized, but .. all records contains
> "commonterm", around 80% contains commonterm80 and so on..
>
> my $rand = rand();
> push @doc,"commonterm" if $commonpos == $j;
> push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

You should probably re-generate your random value for each call rather than store it. Currently, every document with commonterm20 is guaranteed to also have commonterm40, commonterm60, etc., which probably isn't very realistic, and also makes doc size correlate with word rarity.

> Given that the seq-scan have to visit 50K row to create the result and
> the bitmap heap scan only have to visit 40K (but search the index) we
> would expect the seq-scan to be at most 25% more expensive than the
> bitmap-heap scan.. e.g. less than 300ms.

I suspect table bloat. Try VACUUMing your table and trying again.

In this sort of test it's often a good idea to TRUNCATE the table before populating it with a newly generated data set. That helps avoid any residual effects from table bloat etc. from lingering between test runs.

--
Craig Ringer
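Craig's suggestion — a fresh random draw per term instead of one stored draw per document — can be illustrated with a small sketch (hypothetical Python, not the thread's Perl script; the term names and probabilities are the ones from the mail):

```python
import random

def pick_terms_stored(r):
    """Original scheme: one stored draw gates every term, so the term
    sets nest (a commonterm60 doc always also has commonterm80)."""
    return [t for t, p in [("commonterm80", 0.8), ("commonterm60", 0.6)] if r < p]

def pick_terms_fresh():
    """Suggested scheme: a fresh draw per term, so membership in each
    term's document set is independent."""
    return [t for t, p in [("commonterm80", 0.8), ("commonterm60", 0.6)]
            if random.random() < p]
```

With the stored draw the nesting guarantee always holds; with fresh draws roughly 12% of documents (0.6 x 0.2) carry commonterm60 but not commonterm80.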
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Jesper Krogh
Date:
Craig Ringer wrote:
> On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote:
>> Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
>>
>> The dataset consists of words randomized, but .. all records contains
>> "commonterm", around 80% contains commonterm80 and so on..
>>
>> my $rand = rand();
>> push @doc,"commonterm" if $commonpos == $j;
>> push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;
>
> You should probably re-generate your random value for each call rather
> than store it. Currently, every document with commonterm20 is guaranteed
> to also have commonterm40, commonterm60, etc, which probably isn't very
> realistic, and also makes doc size correlate with word rarity.

I had that in the first version, but I wanted the guarantee that the commonterm60 documents were indeed a subset of the commonterm80 documents; that's why it's structured like that. I know it's not realistic, but it gives measurable results, since I know my queries will hit the same tuples. I fail to see how this should have any direct effect on query time, though?

>> Given that the seq-scan have to visit 50K row to create the result and
>> the bitmap heap scan only have to visit 40K (but search the index) we
>> would expect the seq-scan to be at most 25% more expensive than the
>> bitmap-heap scan.. e.g. less than 300ms.
>
> I suspect table bloat. Try VACUUMing your table and trying again.

No bloat here:

ftstest=# VACUUM FULL VERBOSE ftstest;
INFO:  vacuuming "public.ftstest"
INFO:  "ftstest": found 0 removable, 50000 nonremovable row versions in 10000 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 1352 to 1652 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 6859832 bytes.
0 pages are or will become empty, including 0 at the end of the table.
536 pages containing 456072 free bytes are potential move destinations.
CPU 0.03s/0.03u sec elapsed 0.06 sec.
INFO:  index "ftstest_id_key" now contains 50000 row versions in 139 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.13 sec.
INFO:  index "ftstest_gin_idx" now contains 50000 row versions in 35792 pages
DETAIL:  0 index pages have been deleted, 25022 are currently reusable.
CPU 0.46s/0.11u sec elapsed 11.16 sec.
INFO:  "ftstest": moved 0 row versions, truncated 10000 to 10000 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_908525"
INFO:  "pg_toast_908525": found 0 removable, 100000 nonremovable row versions in 16710 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 270 to 2032 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 3695712 bytes.
0 pages are or will become empty, including 0 at the end of the table.
5063 pages containing 1918692 free bytes are potential move destinations.
CPU 0.38s/0.17u sec elapsed 2.64 sec.
INFO:  index "pg_toast_908525_index" now contains 100000 row versions in 276 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.28 sec.
INFO:  "pg_toast_908525": moved 0 row versions, truncated 16710 to 16710 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
ftstest=#

> In this sort of test it's often a good idea to TRUNCATE the table before
> populating it with a newly generated data set. That helps avoid any
> residual effects from table bloat etc from lingering between test runs.

As you can see in the scripts, the table is dropped just before it's recreated and filled with data.

Did you try to re-run the test?

Jesper
--
Jesper
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Craig Ringer
Date:
On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:
>> You should probably re-generate your random value for each call rather
>> than store it. Currently, every document with commonterm20 is guaranteed
>> to also have commonterm40, commonterm60, etc, which probably isn't very
>> realistic, and also makes doc size correlate with word rarity.
>
> I had that in the first version, but I wanted to have the gaurantee that
> a commonterm60 was indeed a subset of commonterm80, so that why its
> sturctured like that. I know its not realistic, but it gives measureable
> results since I know my queries will hit the same tuples.
>
> I fail to see how this should have any direct effect on query time?

Probably not, in truth, but with the statistics-based planner I'm occasionally surprised by what can happen.

>> In this sort of test it's often a good idea to TRUNCATE the table before
>> populating it with a newly generated data set. That helps avoid any
>> residual effects from table bloat etc from lingering between test runs.
>
> As you could see in the scripts, the table is dropped just before its
> recreated and filled with data.
>
> Did you try to re-run the test?

No, I didn't. I thought it worth checking whether bloat might be the cause first, though I should've read the scripts to confirm you weren't already handling that possibility.

Anyway, I've done a run to generate your data set and run a test. After executing the test statement twice (once with and once without enable_seqscan) to make sure all data is in cache and not being read from disk, here are my results:

test=> set enable_seqscan=on;
SET
test=> explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80');
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=36.96..227.10 rows=50 width=4) (actual time=15.830..134.194 rows=40061 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..36.95 rows=50 width=0) (actual time=11.905..11.905 rows=40061 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 148.477 ms
(5 rows)

test=> set enable_seqscan=off;
SET
test=> explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80');
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=36.96..227.10 rows=50 width=4) (actual time=15.427..134.156 rows=40061 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..36.95 rows=50 width=0) (actual time=11.739..11.739 rows=40061 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 148.583 ms
(5 rows)

Any chance your disk cache was cold on the first test run, so Pg was having to read the table from disk during the seqscan, and could just use shared_buffers when you repeated the test for the index scan?

--
Craig Ringer
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Jesper Krogh
Date:
Craig Ringer wrote:
> On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:
> [...]
> Anyway, I've done a run to generate your data set and run a test. After
> executing the test statement twice (once with and once without
> enable_seqscan) to make sure all data is in cache and not being read
> from disk, when I run the tests here are my results:
>
> test=> set enable_seqscan=on;
> SET
> test=> explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');

Here you should search for "commonterm", not "commonterm80"; commonterm will go into a seq scan. You're not testing the same thing as I did.

> Any chance your disk cache was cold on the first test run, so Pg was
> having to read the table from disk during the seqscan, and could just
> use shared_buffers when you repeated the test for the index scan?

They were run repeatedly.
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Craig Ringer
Date:
On Tue, 2009-10-27 at 06:44 +0100, Jesper Krogh wrote:
> Here you should search for "commonterm" not "commonterm80", commonterm
> will go into a seq-scan. You're not testing the same thing as I did.

Point taken. I ran the same commands as you, but as the planner picked different plans it wasn't much use. The fact that I didn't notice that is a bit worrying, as it suggests an even worse-than-normal degree of brain-fade. Sorry for the waste of time.

Anyway, testing more usefully:

On 8.4 on a different system Pg uses the seq scan by preference, with a runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when searching for `commonterm' even when enable_seqscan is set to `off'. A search for `commonterm80' also uses a seq scan (1067ms), but if enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ... odd. If I force it not to use a bitmap heap scan it'll use an index scan. Preventing that too results in a seq scan with a runtime of 1500ms vs the 161ms of the bitmap heap scan. I agree that it seems like a pretty strange result on face value.

So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT longer than the bitmap heap scan, though similar numbers of tuples are being read by both.

I see the same results when actually reading the results rather than just doing an `explain analyze'. With psql set to send output to /dev/null and with \timing enabled:

test=> \o /dev/null
test=> set enable_seqscan = on;
Time: 0.282 ms
test=> select id from ftstest where body_fts @@ to_tsquery('commonterm80');
Time: 988.880 ms
test=> set enable_seqscan = off;
Time: 0.286 ms
test=> select id from ftstest where body_fts @@ to_tsquery('commonterm80');
Time: 159.167 ms

so - nearly 1s vs 0.15s is a big difference between what I previously confirmed to be seq scan and bitmap heap scan respectively for the same query. The same number of records are being returned in both cases.
If I "select *" rather than just reading the `id' field, the runtimes are much more similar - 4130ms seq scan, and 3285ms bitmap heap scan (both reading data to /dev/null), a difference of ~800ms. `EXPLAIN ANALYZE' results are still quite different, though, at 1020ms seq scan vs 233ms bitmap heap, suggesting that the similarity is created only by the time taken to actually transfer the data to the client. The time difference between the two is much the same.

So - for some reason the seq scan takes 800ms or so longer than the bitmap heap scan. I can see why you're puzzled. I can reproduce it on two different machines with two different Pg versions, and using two slightly different methods for loading the data as well. So, I can confirm your test results now that I'm actually testing properly.

test=> explain analyze select * from ftstest where body_fts @@ to_tsquery('commonterm80');
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753 width=54) (actual time=27.452..175.481 rows=39852 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72 rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 233.473 ms
(5 rows)

test=> set enable_seqscan = on;
SET
test=> explain analyze select * from ftstest where body_fts @@ to_tsquery('commonterm80');
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual time=0.141..956.496 rows=39852 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1020.936 ms
(3 rows)

By the way, for the 8.4 test I modified the loader script so it wouldn't take quite so painfully long to
run a second time 'round. I turned autocommit off, wrapped all the inserts up in a single transaction, and moved the fts index creation to after all the data had been inserted. It's a *LOT* faster, and the test results match yours.

> they were run repeatedly.

Yeah, just saw that in your original mail. Sorry.

--
Craig Ringer
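The bulk-loading principle Craig describes — one transaction for all inserts, index creation deferred until after the load — can be sketched with Python's stdlib sqlite3 in place of the original Perl/Postgres loader (table and index names are illustrative, not the script's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ftstest (id INTEGER PRIMARY KEY, body TEXT)")

# Wrap every insert in a single transaction instead of committing per row.
rows = [(i, f"document {i} commonterm") for i in range(1000)]
with conn:  # one transaction around the whole bulk load
    cur.executemany("INSERT INTO ftstest (id, body) VALUES (?, ?)", rows)

# Build the index only after the bulk load, so it isn't maintained row by row.
cur.execute("CREATE INDEX ftstest_body_idx ON ftstest (body)")
```

The same shape applies to the Postgres loader: per-row commits and incremental GIN index maintenance are what make a naive load "painfully long".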
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Jesper Krogh
Date:
Craig Ringer wrote:
> On 8.4 on a different system Pg uses the seq scan by preference, with a
> runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when
> searching for `commonterm' even when enable_seqscan is set to `off'. A
> search for `commonterm80' also uses a seq scan (1067ms), but if
> enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

Ok, that's exactly like my numbers.

> On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ...
> odd. If I force it not to use a bitmap heap scan it'll use an index
> scan. Preventing that too results in a seq scan with a runtime of
> 1500ms vs the 161ms of the bitmap heap scan. I agree that it seems like
> a pretty strange result on face value.

PG 8.3 doesn't have statistics data available for gin indexes, so that may be why the query planner can do otherwise on 8.3. It also means that this is a regression, since in these cases 8.4 will perform worse than 8.3 did (although the statistics make a lot of other cases way better).

> So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT
> longer than the bitmap heap scan, though similar numbers of tuples are
> being read by both.
>
> I see the same results when actually reading the results rather than
> just doing an `explain analyze'. With psql set to send output
> to /dev/null and with \timing enabled:
>
> test=> \o /dev/null
> test=> set enable_seqscan = on;
> Time: 0.282 ms
> test=> select id from ftstest where body_fts @@ to_tsquery('commonterm80');
> Time: 988.880 ms
> test=> set enable_seqscan = off;
> Time: 0.286 ms
> test=> select id from ftstest where body_fts @@ to_tsquery('commonterm80');
> Time: 159.167 ms
>
> so - nearly 1s vs 0.15s is a big difference between what I previously
> confirmed to be bitmap heap scan and seq scan respectively for the same
> query. The same number of records are being returned in both cases.
> If I "select *" rather than just reading the `id' field, the runtimes
> are much more similar - 4130ms seq scan, and 3285 bitmap heap scan (both
> reading data to /dev/null), a difference of ~800. `EXPLAIN ANALYZE'
> results are still quite different, though, at 1020ms seq scan vs 233ms
> bitmap heap, suggesting that the similarity is created only by the time
> taken to actually transfer the data to the client. The time difference
> between the two is much the same.
>
> So - for some reason the seq scan takes 800ms or so longer than the
> bitmap heap scan. I can see why you're puzzled. I can reproduce it on
> two different machines with two different Pg versions, and using two
> slightly different methods for loading the data as well. So, I can
> confirm your test results now that I'm actually testing properly.

Thanks a lot.

> test=> explain analyze select * from ftstest where body_fts @@ to_tsquery('commonterm80');
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753 width=54) (actual time=27.452..175.481 rows=39852 loops=1)
>    Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>    ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72 rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
>          Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 233.473 ms
> (5 rows)
>
> test=> set enable_seqscan = on;
> SET
> test=> explain analyze select * from ftstest where body_fts @@ to_tsquery('commonterm80');
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual time=0.141..956.496 rows=39852 loops=1)
>    Filter: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 1020.936 ms
> (3 rows)

My system
seems to prefer bitmap scans a bit more, but given the actual numbers that seems preferable. That's about query planning, though; my main reason for posting was the actual run time.

> By the way, for the 8.4 test I modifed the loader script so it wouldn't
> take quite so painfully long to run second time 'round. I turned
> autocommit off, wrapped all the inserts up in a single transaction, and
> moved the fts index creation to after all the data has been inserted.
> It's a *LOT* faster, and the test results match yours.

I'll make that change if I have to work a bit more with it. Thanks for spending time confirming my findings (then I know it's not just me going blind staring at the problem).

Jesper
--
Jesper
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Robert Haas
Date:
On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> Hi.
>
> I'm currently trying to figure out why the tsearch performance seems to
> vary a lot between different queryplans. I have created a sample dataset
> that sort of resembles the data I have to work on.
>
> [...]
>
> Given that the seq-scan have to visit 50K row to create the result and
> the bitmap heap scan only have to visit 40K (but search the index) we
> would expect the seq-scan to be at most 25% more expensive than the
> bitmap-heap scan.. e.g. less than 300ms.

I've seen behavior similar to this in the past with a plain old B-tree index. As in your case, a bitmap index scan was significantly faster than a sequential scan even though essentially all the heap pages had to be scanned, but the planner expected the opposite to be true. The planner's expectation is that the dominant cost will be fetching the pages, and it furthermore thinks that fetching things in sequential order is much better than skipping around randomly. However, if all the pages are memory-resident - possibly even in L2 or L3 CPU cache - fetching the pages is nearly free, so the dominant cost becomes the CPU time to process the tuples.

My best guess is that in cases like this, the index cond is cheaper to evaluate than the recheck cond/filter, so the index scan wins not by reading fewer pages but by avoiding the need to examine some of the tuples on each page. I might be all wet, though.

If your whole database fits in RAM, you could try changing your seq_page_cost and random_page_cost variables from the default values of 1 and 4 to something around 0.05, or maybe even 0.01, and see whether that helps. But if it's just this query that is in cache and you have lots of other things that are going to disk, that's harder to tune. You can probably still lower the default values somewhat, but if you go crazy with it you'll start to have problems in the other direction.

...Robert
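Robert's point about page-fetch cost versus per-tuple CPU cost can be sketched numerically (a toy two-term model, not PostgreSQL's actual cost functions; the constants are illustrative assumptions only):

```python
def scan_cost(pages, tuples, page_cost, cpu_per_tuple):
    """Toy cost model: page fetches plus per-tuple CPU work."""
    return pages * page_cost + tuples * cpu_per_tuple

pages, tuples = 10000, 50000  # table sizes from the thread's example

# Assume sequential page reads cost 1.0 "unit" each (the seq_page_cost default).
cold = scan_cost(pages, tuples, page_cost=1.0, cpu_per_tuple=0.01)

# Fully cached: page fetches are nearly free, so per-tuple CPU work dominates.
cached = scan_cost(pages, tuples, page_cost=0.05, cpu_per_tuple=0.01)
```

With the cold assumption, page fetches account for over 95% of the modeled cost; fully cached, CPU work is half of it — which is why lowering seq_page_cost/random_page_cost can steer the planner when everything is memory-resident.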
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
jesper@krogh.cc
Date:
> On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh <jesper@krogh.cc> wrote:
>> Given that the seq-scan have to visit 50K row to create the result and
>> the bitmap heap scan only have to visit 40K (but search the index) we
>> would expect the seq-scan to be at most 25% more expensive than the
>> bitmap-heap scan.. e.g. less than 300ms.
>
> I've seen behavior similar to this in the past with a plain old B-tree
> index. As in your case, a bitmap index scan was significantly faster
> than a sequential scan even though essentially all the heap pages had
> to be scanned, but the planner expected the opposite to be true. The
> planner's expectation is that the dominent cost will be fetching the
> pages, and it furthermore thinks that fetching things in sequential
> order is much better than skipping around randomly. However, if all
> the pages are memory-resident - possibly even in L2 or L3 CPU cache -
> fetching the pages is nearly free, so the dominant cost becomes the
> CPU time to process the tuples.

Well, no. This topic is not at all about the query planner; it is about the actual run time of the two almost identical queries. It may be that we're seeing the results because one fits better into L2 or L3 cache, but the complete dataset is memory-resident, and the queries were run multiple times in a row to eliminate disk access.

> My best guess is that in cases like this index cond is cheaper to
> evaluate than the recheck cond/filter, so the index scan wins not by
> reading fewer pages but by avoiding the need to examine some of the
> tuples on each page. I might be all wet, though.

In my example the seq scan evaluates 50K tuples and the bitmap heap scan 40K. The question is why the "per-tuple" evaluation becomes that much more expensive (x7.5)[1] on the seq scan than on the index scan, when the complete dataset is indeed in memory.

> If your whole database fits in RAM, you could try changing your
> seq_page_cost and random_page_cost variables from the default values
> of 1 and 4 to something around 0.05, or maybe even 0.01, and see
> whether that helps.

That's about planning the query. We're talking actual runtimes here.

[1] 50K tuples in 1,800ms vs. 40K tuples in 200ms

--
Jesper
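The footnoted ratio can be checked with simple arithmetic, using the runtimes quoted in the thread:

```python
# Per-tuple cost implied by the two measured runtimes.
seq_ms, seq_tuples = 1800.0, 50_000       # seq scan: ~1,800 ms over 50K tuples
bitmap_ms, bitmap_tuples = 200.0, 40_000  # bitmap heap scan: ~200 ms over 40K tuples

seq_per_tuple = seq_ms / seq_tuples           # 0.036 ms per tuple
bitmap_per_tuple = bitmap_ms / bitmap_tuples  # 0.005 ms per tuple

ratio = seq_per_tuple / bitmap_per_tuple      # ~7.2, roughly the x7.5 cited above
```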
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Robert Haas
Date:
On Tue, Oct 27, 2009 at 11:08 AM, <jesper@krogh.cc> wrote:
> In my example the seq-scan evaulates 50K tuples and the heap-scan 40K.
> The question is why does the "per-tuple" evaluation become that much more
> expensive (x7.5)[1] on the seq-scan than on the index-scan, when the
> complete dataset indeed is in memory?

[ ... thinks a little more ... ]

The bitmap index scan returns a TID bitmap. From a quick look at nodeBitmapHeapScan.c, it appears that the recheck cond only gets evaluated for those portions of the TID bitmap that are lossy. So I'm guessing what may be happening here is that although the bitmap heap scan is returning 40K rows, it's doing very few (possibly no) qual evaluations, and mostly just checking tuple visibility.

>> If your whole database fits in RAM, you could try changing your
>> seq_page_cost and random_page_cost variables from the default values
>> of 1 and 4 to something around 0.05, or maybe even 0.01, and see
>> whether that helps.
>
> This is about planning the query. We're talking actual runtimes here.

Sorry, I assumed you were trying to get the planner to pick the faster plan. If not, never mind.

...Robert
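Robert's explanation — rechecking the qual only for lossy portions of the TID bitmap — can be sketched as a simplified model (not PostgreSQL's actual nodeBitmapHeapScan.c logic; the data layout and names are made up for illustration):

```python
def bitmap_heap_scan(pages, bitmap, qual):
    """Toy bitmap heap scan. 'bitmap' maps page number -> a set of tuple
    offsets (exact) or None (lossy: only 'something on this page matched').
    The qual is re-evaluated only for tuples on lossy pages."""
    results, rechecks = [], 0
    for pageno, offsets in bitmap.items():
        for off, tup in enumerate(pages[pageno]):
            if offsets is None:        # lossy page: must recheck every tuple
                rechecks += 1
                if qual(tup):
                    results.append(tup)
            elif off in offsets:       # exact page: trust the index, no recheck
                results.append(tup)
    return results, rechecks

pages = {0: [10, 25, 30], 1: [5, 40, 55]}
bitmap = {0: {1, 2}, 1: None}          # page 0 is exact, page 1 is lossy
matches, rechecks = bitmap_heap_scan(pages, bitmap, qual=lambda t: t >= 25)
```

In the mostly-exact case the scan returns tuples without ever running the (expensive) qual, which would explain the large per-tuple gap versus the seq scan's per-row filter.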
Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
From
Tom Lane
Date:
Jesper Krogh <jesper@krogh.cc> writes:
> I'm currently trying to figure out why the tsearch performance seems to
> vary a lot between different queryplans. I have created a sample dataset
> that sort of resembles the data I have to work on.
> The script that builds the dataset is at:
> http://krogh.cc/~jesper/build-test.pl
> and http://krogh.cc/~jesper/words.txt is needed for it to run.

I got around to looking at this example finally, and I can reproduce your results pretty closely. I think there are two things going on:

1. The cost estimates for to_tsquery and ts_match_vq don't reflect the actually-rather-high costs of those functions. Since the seqscan plan executes these functions many more times than the indexscan plan, that results in a relative cost error. There's already been some discussion of changing the default costs for the tsearch functions, but nothing's been done yet. However, that seems to be a relatively small problem compared to...

2. The planner is estimating that most of the GIN index has to be examined --- specifically, it estimates (pretty accurately) that 40188 out of 50000 table rows will match, and the default assumption is that that means 40188/50000 of the index blocks will have to be read. On my machine the index amounts to 39076 blocks, so we estimate 31407 index blocks have to be read, and that's why the cost estimate for the indexscan is so huge. The *actual* number of index blocks read for the query, according to the stats collector, is 66.

So it appears that genericcostestimate() is just completely inappropriate for GIN indexes, at least when considering common terms. I guess that's not so astonishing when you remember that GIN isn't built around per-heap-tuple entries the way the other index types are.

Oleg, Teodor, can you suggest any better cost metric to use for GIN?

			regards, tom lane
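Tom's block estimate can be reproduced directly from the numbers he quotes:

```python
# Generic planner assumption: the fraction of matching rows equals the
# fraction of index blocks that must be read (figures quoted from the mail).
matching_rows, total_rows = 40188, 50000
index_blocks = 39076

estimated_blocks = int(index_blocks * matching_rows / total_rows)  # 31407
actual_blocks = 66  # per the stats collector, as reported in the mail
```

The estimate overshoots the measured block count by a factor of several hundred, which is the mismatch that makes genericcostestimate() so far off for GIN on common terms.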