Thread: Index no longer being used, destroying and recreating it restores use.
Greetings,
The following is using version 11.2 of PostgreSQL.
I've got a table with about 30 million rows and a particular index that up until recently was actively being used.
Then it stopped being used, and the query the index was made for now does sequential scans.
Dropping and recreating the index seems to fix the problem: the new index, identical in composition, gets used, and the query in question no longer does sequential scans.
It's the exact same query and the index is identical in composition, yet after a while the database stops using it. I'd like to find out why that is and how to prevent it.
Also, I'm assuming this is the correct list for such a question?
In the past, I had asked this somewhere else, but then no longer had time to spend on it: https://dba.stackexchange.com/questions/264237/
Some data I gathered then:
- Size of the index not being used is 101MB.
- Size of the index being used is 16MB.
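Usage and size over time can be read straight from the statistics views. A minimal sketch, assuming the index is named index_in_question as it is later in this thread:

-- Sketch: watch whether the planner still touches the index, and its size.
-- idx_scan stops increasing once the planner abandons the index.
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname = 'index_in_question';

-- Current on-disk size of the index.
SELECT pg_size_pretty(pg_relation_size('index_in_question'));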
Here is the "explain analyze" for index used: https://explain.depesz.com/s/H5X9y
Here is the "explain analyze" for index not used: https://explain.depesz.com/s/n6bP
On 6/4/20 8:37 AM, Koen De Groote wrote:
> The following is using version 11.2 of PostgreSQL.
>
> I've got a table with about 30 million rows and a particular index that
> up until recently was actively being used.
>
> Then it stopped being used, and the query the index was made for now
> does sequential scans.

Did you ANALYZE the table at that point?

> [snip]

--
Adrian Klaver
adrian.klaver@aklaver.com
Koen De Groote <kdg.dev@gmail.com> writes:
> I've got a table with about 30 million rows and a particular index that up
> until recently was actively being used.
> Then it stopped being used, and the query the index was made for now does
> sequential scans.
> Dropping and recreating the index seems to fix the problem: the new index,
> identical in composition, gets used, and the query in question no longer
> does sequential scans.

It's possible that the index had bloated to the point where the planner
thought it was cheaper to use a seqscan.  Did you make a note of the
cost estimates for the different plans?

			regards, tom lane
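One way to test the bloat theory without dropping anything is the pgstattuple extension. A sketch, assuming the extension can be installed on this server:

-- Sketch: inspect the suspect index with pgstattuple's pgstatindex().
-- A freshly built btree has avg_leaf_density around 90 (the default
-- fillfactor); a much lower value is a sign of bloat.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('index_in_question');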
On 6/4/20 9:43 AM, Tom Lane wrote:
> It's possible that the index had bloated to the point where the planner
> thought it was cheaper to use a seqscan.  Did you make a note of the
> cost estimates for the different plans?

I missed the part where the OP pointed to a SO question. In that
question were links to explain.depesz.com output. So:

With index (https://explain.depesz.com/s/H5X9y):

Limit  (cost=5964059.790..5964071.460 rows=100 width=3141) (actual time=2534.648..2547.352 rows=100 loops=1)
  Buffers: shared hit=30 read=27753
  ->  Gather Merge  (cost=5959707.820..6516383.180 rows=4771170 width=3141) (actual time=2193.611..2542.835 rows=37400 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=30 read=27753
        ->  Sort  (cost=5958707.790..5964671.760 rows=2385585 width=3141) (actual time=1833.278..1838.731 rows=12528 loops=3)
              Sort Key: mike_romeo, quebec
              Sort Method: quicksort  Memory: 24449kB
              Worker 0:  Sort Method: quicksort  Memory: 28634kB
              Worker 1:  Sort Method: quicksort  Memory: 18065kB
              Buffers: shared hit=30 read=27753
              ->  Parallel Bitmap Heap Scan on oscar mike_three  (cost=4282.840..1923544.730 rows=2385585 width=3141) (actual time=150.297..659.047 rows=23586 loops=3)
                    Recheck Cond: ((zulu <= 'echo'::timestamp without time zone) AND four AND (NOT bravo))
                    Heap Blocks: exact=9759
                    Buffers: shared read=27753
                    ->  Bitmap Index Scan on foxtrot  (cost=0.000..2851.490 rows=5725405 width=0) (actual time=245.459..245.459 rows=70759 loops=1)
                          Index Cond: (zulu <= 'echo'::timestamp without time zone)
                          Buffers: shared read=2028
Planning time: 437.187 ms
Execution time: 2549.633 ms

W/O index (https://explain.depesz.com/s/n6bP):

Limit  (cost=5975083.560..5975095.230 rows=100 width=3141) (actual time=159708.476..159718.368 rows=100 loops=1)
  Buffers: shared hit=1082 read=1798654
  ->  Gather Merge  (cost=5970731.590..6527406.950 rows=4771170 width=3141) (actual time=159309.819..159713.861 rows=37400 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1082 read=1798654
        ->  Sort  (cost=5969731.570..5975695.530 rows=2385585 width=3141) (actual time=159214.623..159219.579 rows=12531 loops=3)
              Sort Key: mike_romeo, quebec
              Sort Method: quicksort  Memory: 24459kB
              Worker 0:  Sort Method: quicksort  Memory: 22650kB
              Worker 1:  Sort Method: quicksort  Memory: 24038kB
              Buffers: shared hit=1082 read=1798654
              ->  Parallel Seq Scan on oscar mike_three  (cost=0.000..1934568.500 rows=2385585 width=3141) (actual time=159.800..158018.961 rows=23586 loops=3)
                    Filter: (four AND (NOT bravo) AND (zulu <= 'echo'::timestamp without time zone))
                    Rows Removed by Filter: 8610174
                    Buffers: shared hit=1014 read=1798652
Planning time: 0.807 ms
Execution time: 159720.208 ms

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 6/4/20 9:43 AM, Tom Lane wrote:
>> It's possible that the index had bloated to the point where the planner
>> thought it was cheaper to use a seqscan.  Did you make a note of the
>> cost estimates for the different plans?

> I missed the part where the OP pointed to a SO question. In that
> question were links to explain.depesz.com output.

Ah, I didn't bother to chase that link either.

So the cost estimates are only a fraction of a percent apart, making
it unsurprising for not-so-large changes in the index size to cause
a flip in the apparently-cheapest plan.  The real question then is
why the cost estimates aren't actually modeling the real execution
times very well; and I'd venture that that question boils down to
why is this rowcount estimate so far off:

>   ->  Parallel Seq Scan on oscar mike_three  (cost=0.000..1934568.500 rows=2385585 width=3141) (actual time=159.800..158018.961 rows=23586 loops=3)
>         Filter: (four AND (NOT bravo) AND (zulu <= 'echo'::timestamp without time zone))
>         Rows Removed by Filter: 8610174

We're not going to be able to answer that if the OP doesn't wish
to decloak his data a bit more ... but a reasonable guess is that
those filter conditions are correlated.  With late-model Postgres
you might be able to improve matters by creating extended statistics
for this table.

			regards, tom lane
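A sketch of that suggestion, using the column names that appear later in the thread (the statistics object name here is made up):

-- Sketch of the extended-statistics idea (PostgreSQL 10+): teach the
-- planner that the two flags are correlated.
CREATE STATISTICS item_flags_dep (dependencies)
    ON shouldbebackedup, backupperformed FROM item;

-- The object is only populated when the table is next analyzed.
ANALYZE item;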
Re: Index no longer being used, destroying and recreating it restores use.
The table in question is used to keep filepath data of files on a hard drive.
The query in question is used to retrieve items which should be backed up, but have not yet been.
The relevant columns of the table:
Table "public.item"
Column | Type | Collation | Nullable | Default
------------------------------------+-----------------------------+-----------+----------+----------------------------------------------
id | bigint | | not null | nextval('item_id_seq'::regclass)
shouldbebackedup | boolean | | not null | true
backupperformed | boolean | | not null | false
itemCreated | timestamp without time zone | | | now()
filepath | text | | |
The existing index, which no longer gets used:
"index_in_question" btree (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed = false
The new index, made out of the exact same columns and conditions, gets used immediately after creation:
CREATE INDEX CONCURRENTLY index_test ON item USING btree (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed = false;
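For what it's worth, the whole drop-and-recreate cycle can be done without blocking writes. A sketch of the usual swap on PostgreSQL 11 (PostgreSQL 12 folds this into a single REINDEX CONCURRENTLY):

-- Sketch: rebuild the index concurrently, then swap names.
CREATE INDEX CONCURRENTLY index_test ON item USING btree
    (shouldbebackedup, backupperformed, itemCreated, filepath)
    WHERE shouldbebackedup = true AND backupperformed = false;
DROP INDEX CONCURRENTLY index_in_question;
ALTER INDEX index_test RENAME TO index_in_question;

-- PostgreSQL 12+:
-- REINDEX INDEX CONCURRENTLY index_in_question;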
The query in question will look something like this:
select * from item where shouldbebackedup=true and itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by filepath asc, id asc limit 100 offset 10400;
Having done a count, there are around 13000 items here, without the offset and limit.
That being said, the amount is entirely dependent on what was added on a previous day.
I tried creating an extended statistic, like this, but it had no effect:
CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed FROM item;
Settings from the conf file that I think are related:
shared_buffers = 1024MB
effective_cache_size = 2048MB
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 32MB
Finally, I state again that this database gets a nightly "vacuum analyze".
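If the table churns enough that one nightly pass cannot keep up, autovacuum can also be tightened for just this table. A sketch with illustrative values, not recommendations:

-- Sketch: have autovacuum visit this table more often than the defaults.
ALTER TABLE item SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);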
My thanks for looking at this and any suggestions one might have.
Regards,
Koen
Re: Index no longer being used, destroying and recreating it restores use.
=> SELECT stxname, stxkeys, stxdependencies
-> FROM pg_statistic_ext
-> WHERE stxname = 's1';
 stxname | stxkeys | stxdependencies
---------+---------+-----------------
 s1      | 29 35   | <NULL>
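stxdependencies stays NULL until the table is analyzed after the statistics object is created, so the output above suggests the object was never populated. A sketch of the fix:

-- Extended statistics are computed by ANALYZE, not by CREATE STATISTICS.
ANALYZE item;

-- Afterwards, stxdependencies should be non-NULL for 's1'.
SELECT stxname, stxdependencies
FROM pg_statistic_ext
WHERE stxname = 's1';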
On 6/5/20 7:05 AM, Koen De Groote wrote:
> [snip]
> The query in question will look something like this:
> select * from item where shouldbebackedup=true and
> itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order
> by filepath asc, id asc limit 100 offset 10400;
Please post the result of EXPLAIN ANALYZE for the above query.
--
Adrian Klaver
adrian.klaver@aklaver.com
Re: Index no longer being used, destroying and recreating it restores use.
Those row estimates are pretty far off.
Standard indexes and partial indexes don't get custom statistics created on them, but functional indexes do. I wonder if a small function needs_backup( shouldbebackedup, backupperformed ) and an index created on that function would nicely alleviate the pain. I would expect PG12 and extended statistics of type MCV would bypass the need for that workaround, though.
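A sketch of that PG12 alternative (the MCV statistics kind exists from PostgreSQL 12 on; the object name here is made up):

-- Sketch: multivariate MCV statistics, which capture the joint
-- distribution of the two flags directly.
CREATE STATISTICS item_flags_mcv (mcv)
    ON shouldbebackedup, backupperformed FROM item;
ANALYZE item;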
Re: Index no longer being used, destroying and recreating it restores use.
So, this query:

select * from item where shouldbebackedup=true and itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by filepath asc, id asc limit 100 offset 10400;

Was made into a function:

create or replace function NeedsBackup(text, int, int default 100)
returns setof item as $$
BEGIN
return query select * from item where shouldbebackedup=true and itemCreated<=$1::timestamp without time zone and backupperformed=false order by filepath asc, id asc limit $3 offset $2;
END;
$$
language 'plpgsql';

The result of EXPLAIN ANALYZE for a call to this function:
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Function Scan on public.needsbackup  (cost=0.25..10.25 rows=1000 width=5275) (actual time=80.107..80.115 rows=100 loops=1)
   Output: id, shouldbebackedup, backupperformed, itemCreated, filepath
   Function Call: needsbackup('2020-06-08 12:00:00.016'::text, 2100, 100)
   Buffers: shared read=1572
 Planning Time: 0.199 ms
 Execution Time: 80.499 ms
(6 rows)
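One caveat with the plan above: the planner cannot see inside a plpgsql set-returning function, so rows=1000 is just the default estimate for set-returning functions. If a surrounding query ever depends on that estimate, it can be pinned. A sketch:

-- Sketch: declare roughly how many rows the function returns
-- (here the default LIMIT of 100), instead of the generic 1000.
ALTER FUNCTION NeedsBackup(text, int, int) ROWS 100;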
I'll attempt this next week.

On Fri, Jun 5, 2020, 21:11 Michael Lewis <mlewis@entrata.com> wrote:
> Those row estimates are pretty far off.
>
> Standard indexes and partial indexes don't get custom statistics created on
> them, but functional indexes do. I wonder if a small function needs_backup(
> shouldbebackedup, backupperformed ) and an index created on that function
> would nicely alleviate the pain. I would expect PG12 and extended statistics
> of type MCV would bypass the need for that workaround, though.
Re: Index no longer being used, destroying and recreating it restores use.
On Mon, Jun 8, 2020 at 2:34 PM Koen De Groote <kdg.dev@gmail.com> wrote:
> So, this query:
>
> select * from item where shouldbebackedup=true and itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by filepath asc, id asc limit 100 offset 10400;
>
> Was made into a function:
>
> create or replace function NeedsBackup(text, int, int default 100)
> returns setof item as $$
> BEGIN
> return query select * from item where shouldbebackedup=true and itemCreated<=$1::timestamp without time zone and backupperformed=false order by filepath asc, id asc limit $3 offset $2;
> END;
> $$
> language 'plpgsql';

What I had meant was a function perhaps called backup_needed_still( backupperformed bool, shouldbebackedup bool) which would return bool; this could be written in SQL only, with no need for plpgsql. By the way, the language name being in single quotes is deprecated.

Then you could create an index on the table, either on the timestamp column where that function returns true, or just creating the index directly on the boolean result of that function call if that is what is needed to get the custom stats from a functional index. Then you would include the function call in your query instead of the two individual boolean columns.
Right. In that case, the function I ended up with is this:

create or replace function still_needs_backup(bool, bool)
returns BOOLEAN as $$
BEGIN
PERFORM 1 from item where shouldbebackedup=$1 and backupperformed=$2;
IF FOUND THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
$$
language plpgsql;

And the index for that should be this, I think: "CREATE INDEX CONCURRENTLY index_test ON item USING btree (still_needs_backup, itemCreated, filepath) WHERE still_needs_backup(true, false) = true;"
However postgres throws an error here, saying "ERROR: functions in index predicate must be marked IMMUTABLE".

I tried it also without the first argument, same error.

And I don't think I can do that, because the return is not IMMUTABLE. It is at best STABLE, but certainly not IMMUTABLE.

So yeah, I'm probably not understanding the suggestion properly? Either way, I still have questions about the earlier function I created, namely how reliable that performance is. If not, the same thing will happen as with the re-created index.
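As an aside, how a function is currently marked can be checked in the catalog. A small sketch:

-- Sketch: inspect a function's volatility marking.
-- provolatile is 'i' (immutable), 's' (stable) or 'v' (volatile, the default).
SELECT proname, provolatile
FROM pg_proc
WHERE proname = 'still_needs_backup';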
On Tue, Jun 9, 2020 at 02:23:51PM +0200, Koen De Groote wrote:
> Right. In that case, the function I ended up with is this:
> [snip]
> So yeah, I'm probably not understanding the suggestion properly? Either way,
> I still have questions about the earlier function I created, namely how
> reliable that performance is.

I think you need to look at EXPLAIN ANALYZE and see how close the estimate is from the actual counts for various stages. The original query had these quite different, leading to misestimation and wrong plans. If the new EXPLAIN ANALYZE has estimates closer to actual, the problem should not reappear.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
On Tue, Jun 9, 2020 at 6:24 AM Koen De Groote <kdg.dev@gmail.com> wrote:
> Right. In that case, the function I ended up with is this:
> [snip]

I meant something like the below (not tested)-

create or replace function still_needs_backup(shouldbebackedup bool, backupperformed bool)
returns BOOLEAN as $$
BEGIN
return $1 AND NOT $2;
END;
$$
language sql;

CREATE INDEX CONCURRENTLY index_test ON item USING btree (itemCreated) WHERE still_needs_backup(shouldbebackedup, backupperformed);
ANALYZE item;
Re: Index no longer being used, destroying and recreating it restores use.
Alright, I've done that, and that seems to be a very good result: https://explain.depesz.com/s/xIph

The method I ended up using:

create or replace function still_needs_backup(shouldbebackedup bool, backupperformed bool)
returns BOOLEAN as $$
select $1 AND NOT $2;
$$
language sql immutable;

And the index is as suggested.

It seems the amount of rows we end up with has improved.

Thank you for your help. I wasn't aware functions could interact with indexes in such a manner.
On Tue, Jun 16, 2020 at 11:49:15AM +0200, Koen De Groote wrote:
> Alright, I've done that, and that seems to be a very good result:
> https://explain.depesz.com/s/xIph
> [snip]
> Thank you for your help. I wasn't aware functions could interact with
> indexes in such a manner.

This blog entry explains how statistics on expression/function indexes can help:

https://momjian.us/main/blogs/pgblog/2017.html#February_20_2017

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
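Putting the thread's resolution in one place, a consolidated sketch of the approach that worked, as described above (not independently re-tested here):

-- The function: pure SQL over its arguments only, so IMMUTABLE is honest.
create or replace function still_needs_backup(shouldbebackedup bool, backupperformed bool)
returns boolean as $$
    select $1 and not $2;
$$
language sql immutable;

-- The partial index keyed on the timestamp, with the function as predicate.
CREATE INDEX CONCURRENTLY index_test ON item USING btree (itemCreated)
    WHERE still_needs_backup(shouldbebackedup, backupperformed);
ANALYZE item;

-- Queries then filter through the same function so the index applies.
select * from item
where still_needs_backup(shouldbebackedup, backupperformed)
  and itemCreated <= '2020-06-05 00:00:00.000'
order by filepath asc, id asc
limit 100 offset 10400;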