Thread: BTREE index: field ordering

BTREE index: field ordering

From
Moreno Andreo
Date:
Hi,
Postgres 16.4 (planning to go on 17.4)
I'm creating some indexes based on some slow query reported by logs.
These queries involve a WHERE with more than 5 fields, that are matching by =, <>, LIKE and IN()
I read that equality fields must be first, then the others.
Is it correct?

Based on this query

SELECT COUNT(id) AS total
                FROM nx.tbl1
                WHERE
            (date_order >= '2025-03-21')
            AND (date_order <= '2025-03-29')
            AND (flag = TRUE)
            AND ((
              -- (flag = TRUE)
              -- AND (((tipo <> 'C') AND (tipo <> 'V') AND (tipo <> 'F')) OR (tipo IS NULL) OR (tipo = ''))
              (((op <> 'C') OR (op IS NULL)) OR (tipo = 'F'))
              AND (s_state IN ('ENQ','WFR','BLQ','BLR','WFA','FRW','FRO','0000','0001'))
              AND (tiporic IS NOT NULL)
              AND (tiporic NOT LIKE '%cart%')
            ) OR (
              (tiporic LIKE '%cart%') AND (S_state <> 'CON') AND (s_state <> '0002') AND ((op <> 'C') OR (op IS NULL))
            ))  AND (priv IS NULL OR priv = false OR (priv = true and idpriv = 'TEST')));

Should the following index be correct?

CREATE INDEX IF NOT EXISTS tbl1_count_idx on nx.tbl1 USING BTREE(flag, tipo, op, priv, idpriv, date_order, s_state, tiporic);

Would it be better to create a separate GIN/GIST index for the field matched with LIKE?

Thanks in advance,
Moreno

Re: BTREE index: field ordering

From
Laurenz Albe
Date:
On Fri, 2025-03-28 at 13:38 +0100, Moreno Andreo wrote:
>  Postgres 16.4 (planning to go on 17.4)
>  I'm creating some indexes based on some slow query reported by logs.
>  These queries involve a WHERE with more than 5 fields, that are matching by =, <>, LIKE and IN()
>  I read that equality fields must be first, then the others.
>  Is it correct?

Fundamentally yes, but you also have to consider how selective the conditions are.
Putting a column in the index where the condition will only filter out few rows
is not going to help; such rows should be omitted from the index.

> Based on this query
>                  SELECT COUNT(id) AS total 
>                  FROM nx.tbl1
>                  WHERE
>              (date_order >= '2025-03-21')
>              AND (date_order <= '2025-03-29')
>              AND (flag = TRUE)
>              AND ((
>                -- (flag = TRUE)
>                -- AND (((tipo <> 'C') AND (tipo <> 'V') AND (tipo <> 'F')) OR (tipo IS NULL) OR (tipo = ''))
>                (((op <> 'C') OR (op IS NULL)) OR (tipo = 'F'))
>                AND (s_state IN ('ENQ','WFR','BLQ','BLR','WFA','FRW','FRO','0000','0001'))
>                AND (tiporic IS NOT NULL)
>                AND (tiporic NOT LIKE '%cart%')
>              ) OR (
>                (tiporic LIKE '%cart%') AND (S_state <> 'CON') AND (s_state <> '0002') AND ((op <> 'C') OR (op IS
NULL)) 
>              ))  AND (priv IS NULL OR priv = false OR (priv = true and idpriv = 'TEST')));
>
>  Should the following index be correct?
>
>  CREATE INDEX IF NOT EXISTS tbl1_count_idx on nx.tbl1 USING BTREE(flag, tipo, op, priv, idpriv, date_order, s_state,
tiporic); 
>  
>  Would it be better to create a separate GIN/GIST index for the field matched with LIKE?

The ORs will be a problem. Get rid of them as much as possible by using UNION,
at least for WHERE conditions that are selective.

Yours,
Laurenz Albe

Re: BTREE index: field ordering

From
Ron Johnson
Date:
On Fri, Mar 28, 2025 at 9:35 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2025-03-28 at 13:38 +0100, Moreno Andreo wrote:
>  Postgres 16.4 (planning to go on 17.4)
>  I'm creating some indexes based on some slow query reported by logs.
>  These queries involve a WHERE with more than 5 fields, that are matching by =, <>, LIKE and IN()
>  I read that equality fields must be first, then the others.
>  Is it correct?

Fundamentally yes, but you also have to consider how selective the conditions are.
Putting a column in the index where the condition will only filter out few rows
is not going to help; such rows should be omitted from the index.

> Based on this query
>                  SELECT COUNT(id) AS total 
>                  FROM nx.tbl1
>                  WHERE
>              (date_order >= '2025-03-21')
>              AND (date_order <= '2025-03-29')
>              AND (flag = TRUE)
>              AND ((
>                -- (flag = TRUE)
>                -- AND (((tipo <> 'C') AND (tipo <> 'V') AND (tipo <> 'F')) OR (tipo IS NULL) OR (tipo = ''))
>                (((op <> 'C') OR (op IS NULL)) OR (tipo = 'F'))
>                AND (s_state IN ('ENQ','WFR','BLQ','BLR','WFA','FRW','FRO','0000','0001'))
>                AND (tiporic IS NOT NULL)
>                AND (tiporic NOT LIKE '%cart%')
>              ) OR (
>                (tiporic LIKE '%cart%') AND (S_state <> 'CON') AND (s_state <> '0002') AND ((op <> 'C') OR (op IS NULL))
>              ))  AND (priv IS NULL OR priv = false OR (priv = true and idpriv = 'TEST')));
>
>  Should the following index be correct?
>
>  CREATE INDEX IF NOT EXISTS tbl1_count_idx on nx.tbl1 USING BTREE(flag, tipo, op, priv, idpriv, date_order, s_state, tiporic);
>  
>  Would it be better to create a separate GIN/GIST index for the field matched with LIKE?

The ORs will be a problem. Get rid of them as much as possible by using UNION,
at least for WHERE conditions that are selective.
 
"at least for WHERE conditions that are selective" confuses me.  Aren't _all_ WHERE clauses selective?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: BTREE index: field ordering

From
"David G. Johnston"
Date:
On Fri, Mar 28, 2025 at 9:24 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

"at least for WHERE conditions that are selective" confuses me.  Aren't _all_ WHERE clauses selective?


From earlier in the email, selectivity is a scale, the wording here implies "has a meaningful selectivity".

"Fundamentally yes, but you also have to consider how selective the conditions are." - Laurenz

where true - selectivity 0
where not(is_deleted) - selectivity < .01 ish in many cases

Selectivity is evaluating cardinality with an eye on the frequency of the values you are actually going to be filtering on.  So low cardinality booleans can be highly selective in usage if you are looking for the rare false in a sea of trues but low selectivity if looking through those trues.

David J.

Re: BTREE index: field ordering

From
"David G. Johnston"
Date:
On Fri, Mar 28, 2025 at 10:02 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
Selectivity is evaluating cardinality with an eye on the frequency of the values you are actually going to be filtering on.  So low cardinality booleans can be highly selective in usage if you are looking for the rare false in a sea of trues but low selectivity if looking through those trues.

And to accommodate the highly-selective case a partial index is much better than indexing the low cardinality boolean column values.

David J.

Re: BTREE index: field ordering

From
Kevin Stephenson
Date:
Moreno,

You have two lines in your WHERE clause commented out, the first line is a duplicate check on flag = TRUE, and the other line involves several checks on the "tipo" column. Will the final query or set of related queries actually need to filter on the "tipo" column? You currently have "tipo" in second place in your candidate index.

Thanks,
Kevin

From: Moreno Andreo <moreno.andreo@evolu-s.it>
Sent: Friday, March 28, 2025 5:38 AM
To: PostgreSQL mailing lists <pgsql-general@postgresql.org>
Subject: BTREE index: field ordering
 
Hi,
Postgres 16.4 (planning to go on 17.4)
I'm creating some indexes based on some slow query reported by logs.
These queries involve a WHERE with more than 5 fields, that are matching by =, <>, LIKE and IN()
I read that equality fields must be first, then the others.
Is it correct?

Based on this query

SELECT COUNT(id) AS total
                FROM nx.tbl1
                WHERE
            (date_order >= '2025-03-21')
            AND (date_order <= '2025-03-29')
            AND (flag = TRUE)
            AND ((
              -- (flag = TRUE)
              -- AND (((tipo <> 'C') AND (tipo <> 'V') AND (tipo <> 'F')) OR (tipo IS NULL) OR (tipo = ''))
              (((op <> 'C') OR (op IS NULL)) OR (tipo = 'F'))
              AND (s_state IN ('ENQ','WFR','BLQ','BLR','WFA','FRW','FRO','0000','0001'))
              AND (tiporic IS NOT NULL)
              AND (tiporic NOT LIKE '%cart%')
            ) OR (
              (tiporic LIKE '%cart%') AND (S_state <> 'CON') AND (s_state <> '0002') AND ((op <> 'C') OR (op IS NULL))
            ))  AND (priv IS NULL OR priv = false OR (priv = true and idpriv = 'TEST')));

Should the following index be correct?

CREATE INDEX IF NOT EXISTS tbl1_count_idx on nx.tbl1 USING BTREE(flag, tipo, op, priv, idpriv, date_order, s_state, tiporic);

Would it be better to create a separate GIN/GIST index for the field matched with LIKE?

Thanks in advance,
Moreno