Thread: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
From
Emil Iggland
Date:
Dear mailing list We are investigating a strange performance issue with our database. Our use case is a sensor reading database where we have sensor location (called channels), parameter settings (called valueseries) and reading (called datavalues). Datavalues is partitioned per month. Like this: channel <-(FK)- valueseries <-(FK)- datavalues We have a query that returns the latest sensor reading. When we have no readings in datavalues the query is SLOW, when we have 1 or more readings in datavalues the query is FAST. (slow being ~1second, fast ~5ms) It isn't the slowness that is the main problem, but rather the odd behaviour. The query that is giving us issues is the following, channel 752433 has NO values, 752431 has values. (Channel 752433 only has valueseries 752434) select * from datavalue where dataview in ( select id from valueseries where channel = %channel_idx%) ORDER BY VALUETIMESTAMP DESC FETCH FIRST ROW only; Running explain analyze shows strange numbers, 52'000 rows are being returned but there are no rows there. For channel 752433 -> Index Scan Backward using datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual time=0.008..32.831 rows=119601 loops=1) -> Index Scan Backward using datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04 datavalue_7 (cost=0.29..4002.79 rows=52499 width=227) (actual time=0.011..15.005 rows=52499 loops=1) For channel 752431 -> Index Scan Backward using datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual time=0.008..0.008 rows=1 loops=1) -> Index Scan Backward using datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04 datavalue_7 (cost=0.29..4002.79 rows=52499 width=227) (actual time=0.011..0.011 rows=1 loops=1) Inserting even a single row changes the offending rows to the expected values: insert into maclient.datavalue (dataview, valuetimestamp, datavalue) values (752434, '2022-03-01 00:00:00', 234); -> Index Scan Backward using datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual time=0.006..0.006 rows=1 loops=1) Full explain analyze on https://paste.depesz.com/s/ZwJ with buffers and track_io_timing: https://paste.depesz.com/s/Ss Disabling indexscan (set enable_indexscan=false;) hides the problem, it does not show up with a bitmap index scan. Running autovacuum analyze doesn't seem to help, the results are the same. SELECT version(); "PostgreSQL 14.2, compiled by Visual C++ build 1914, 64-bit" Can anyone explain what is going on here. * Why is the database returning 52'000+ rows when it should be returning 0? * Is my query badly formulated? * Is there something wrong with the indexes and I need to rebuild them? We are stumped, and would greatly appreciate any input. Regards Emil
Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
From
Tom Lane
Date:
Emil Iggland <emil.iggland@metrima.com> writes: > The query that is giving us issues is the following, channel 752433 has > NO values, 752431 has values. > (Channel 752433 only has valueseries 752434) > select * from datavalue > where dataview in ( select id from valueseries where channel = > %channel_idx%) > ORDER BY VALUETIMESTAMP DESC > FETCH FIRST ROW only; > Running explain analyze shows strange numbers, 52'000 rows are being > returned but there are no rows there. > For channel 752433 > -> Index Scan Backward using > datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 > datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual > time=0.008..32.831 rows=119601 loops=1) You've got the wrong column order (for this query anyway) in that index. It'd work a lot better if dataview were the first column; or at least, it wouldn't tempt the planner to try this unstably- performing plan. It's trying to use the index ordering to satisfy the ORDER BY, which works great as long as it finds a dataview match in some reasonably recent index entry. Otherwise, it's going to crawl the whole index to discover that there's no match. regards, tom lane
Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
From
Emil Iggland
Date:
> You've got the wrong column order (for this query anyway) in that > index. It'd work a lot better if dataview were the first column; I might be misunderstanding you, but I assume that you are suggesting an index on (dataview, valuetimestamp). We have that index, it is the primary key. For some reason it isn't being selected. I can understand that it has to go through the whole index, potentially even the whole table, but I do not why it takes so long. Even a query that should take equally long (probably longer) is substantially faster: explain (analyze, buffers) select valuetimestamp from datavalue where valuetimestamp <> '1965-01-07 05:50:59'; Completes in less than 500ms using a sequential scan, ... -> Seq Scan on datavalue_2022_04 datavalue_7 (cost=0.00..1450.39 rows=56339 width=8) (actual time=0.013..5.988 rows=56109 loops=1)" Filter: (valuetimestamp <> '1965-01-07 05:50:59'::timestamp without time zone) Buffers: shared hit=742 read=4 ... Planning Time: 0.781 ms Execution Time: 394.408 ms while the original query takes over 1 second. ... -> Index Scan Backward using datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04 datavalue_7 (cost=0.29..4292.48 rows=56351 width=227) (actual time=0.166..17.340 rows=56109 loops=1) Buffers: shared hit=42013 read=278 ... Planning Time: 0.964 ms Execution Time: 1291.509 ms I do not understand how looking at every value in the index and returning none be slower than looking at every table in the table and returning none. If it takes 500ms to return every value in the table via a sequential scan, then it should take less via an index scan. In case we never solve it, and someone else runs into similiar problems, we (hopefully temporarily) worked around it by reformulating the query to use a lateral join: EXPLAIN (analyze, buffers) SELECT dv.* FROM valueseries vs LEFT JOIN LATERAL ( SELECT * FROM datavalue dv WHERE dv.dataview = vs.id ORDER BY VALUETIMESTAMP FETCH FIRST 1 ROWS ONLY ) dv ON TRUE where vs.channel = 752433 This causes it to use the correct index: -> Index Scan using datavalue_2022_01_pkey on datavalue_2022_01 dv_4 (cost=0.42..2951.17 rows=1032 width=228) (actual time=0.034..0.034 rows=0 loops=1) Index Cond: (dataview = vs.id) Buffers: shared read=3 ... Planning Time: 1.169 ms Execution Time: 0.524 ms Regards Emil On 2022-04-25 18:00, Tom Lane wrote: > Emil Iggland <emil.iggland@metrima.com> writes: >> The query that is giving us issues is the following, channel 752433 has >> NO values, 752431 has values. >> (Channel 752433 only has valueseries 752434) > >> select * from datavalue >> where dataview in ( select id from valueseries where channel = >> %channel_idx%) >> ORDER BY VALUETIMESTAMP DESC >> FETCH FIRST ROW only; > >> Running explain analyze shows strange numbers, 52'000 rows are being >> returned but there are no rows there. > >> For channel 752433 >> -> Index Scan Backward using >> datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 >> datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual >> time=0.008..32.831 rows=119601 loops=1) > > You've got the wrong column order (for this query anyway) in that > index. It'd work a lot better if dataview were the first column; > or at least, it wouldn't tempt the planner to try this unstably- > performing plan. It's trying to use the index ordering to satisfy > the ORDER BY, which works great as long as it finds a dataview > match in some reasonably recent index entry. Otherwise, it's > going to crawl the whole index to discover that there's no match. > > regards, tom lane
Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
From
David Rowley
Date:
On Wed, 27 Apr 2022 at 19:54, Emil Iggland <emil.iggland@metrima.com> wrote: > > > You've got the wrong column order (for this query anyway) in that > > index. It'd work a lot better if dataview were the first column; > I might be misunderstanding you, but I assume that you are suggesting an > index on (dataview, valuetimestamp). > We have that index, it is the primary key. For some reason it isn't > being selected. I don't think that index can be used for your original query. It could only be used if "channel" is unique in "valueseries" and you'd written the query as: select * from datavalue where dataview = (select id from valueseries where channel = 752433) ORDER BY VALUETIMESTAMP DESC FETCH FIRST ROW only; that would allow a backwards index scan using the (dataview, valuetimestamp) index. Because you're using the IN clause to possibly look for multiple "dataview" values matching the given "channel", the index range scan does not have a single point to start at. What you've done with the LATERAL query allows the index to be scanned once for each "valueseries" row with a "channel" value matching your WHERE clause. I guess "channel" must not be the primary key to "valueseries" and that's why you use an IN(). The above query would return an error if multiple rows were returned by the subquery. David
Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
From
Emil Iggland
Date:
> I don't think that index can be used for your original query. It could > only be used if "channel" is unique in "valueseries" and you'd written > the query as: Thanks! That explanation I can understand, now I know how to avoid this in future. > I guess "channel" must not be the primary key to "valueseries" and > that's why you use an IN(). Correct. We create a new valueseries in some circumstances, so multiple valueseries can point to the same channel. On 2022-04-27 10:22, David Rowley wrote: > On Wed, 27 Apr 2022 at 19:54, Emil Iggland <emil.iggland@metrima.com> wrote: >> >> > You've got the wrong column order (for this query anyway) in that >> > index. It'd work a lot better if dataview were the first column; > >> I might be misunderstanding you, but I assume that you are suggesting an >> index on (dataview, valuetimestamp). >> We have that index, it is the primary key. For some reason it isn't >> being selected. > > I don't think that index can be used for your original query. It could > only be used if "channel" is unique in "valueseries" and you'd written > the query as: > > select * from datavalue > where dataview = (select id from valueseries where channel = 752433) > ORDER BY VALUETIMESTAMP DESC > FETCH FIRST ROW only; > > that would allow a backwards index scan using the (dataview, > valuetimestamp) index. Because you're using the IN clause to possibly > look for multiple "dataview" values matching the given "channel", the > index range scan does not have a single point to start at. What > you've done with the LATERAL query allows the index to be scanned once > for each "valueseries" row with a "channel" value matching your WHERE > clause. > > I guess "channel" must not be the primary key to "valueseries" and > that's why you use an IN(). > > The above query would return an error if multiple rows were returned > by the subquery. > > David