For some kinds of queries a composite index can be dramatically faster.
While Postgres can combine indexes that means scanning both indexes and
combining the result, which may need a lot more disk I/O than scanning a
composite index. Indeed, in the cases where a composite index would be
useful but doesn't exist, PostgreSQL usually just chooses the best of
the single column indexes and ignores the rest.
That said, my rule of thumb is to create just single column indexes at
first and only create composite indexes if they are necessary.
Thank you so much. As I understand optimizer uses indexed column as "access criteria" and rest of the predicate as "filter criteria" while evaluating the query predicate. And if the majority of the rows are getting eliminated in the filtered step , that means adding that filtered criteria column to the index could give us better performance.
So I was trying to understand say in below query having TABLE1 as driving table ( if we forget about column selectivity for a moment),
Can the optimizer, only scan the TABLE1 using ACCESS criteria " TABLE1.MID in (XXXX)" or "TABLE1.CID in (XXXX)" which will be catered by two different index i.e one index on column "MID" and other on column "CID"?
OR
It can utilize other columns as access criteria those used in join conditions like MID, PR_ID, in which case a composite index on the columns(CID,PR_ID) (MID, PR_ID) will provide better selectivity and faster access?
Similarly for TABLE2 a composite index on (ACN_NBR,PR_ID,MID) or just an index on (ACN_NBR)?
select .......
from TABLE1
Left join schema1.TABLE2 on TABLE2.PR_ID = TABLE1.PR_ID and TABLE2.MID = TABLE1.MID
and TABLE2.processing_date=TABLE1.processing_date
where TABLE1.processing_date between '2023-04-20' and '2023-05-21' -- Considering processing_date here as partition key.
and TABLE2.ACN_NBR = 'XXXX'
and ( TABLE1.MID in (XXXX) OR TABLE1.CID in (XXXX))
order by TABLE1.PR_TIME DESC