Home > mailing lists

Thread: hash as an search key and hash collision

hash as an search key and hash collision

From

Andy Fan

Date:

19 June 2020, 08:24:01

I want to maintain an internal table which the primary key is sql_text and

planstmt::text, it is efficient since it both may be very long. So a general

idea is to use sql_hash_value and plan_hash_value. Then we have to

handle the hash collision case. However I checked the codes both in sr_plans[1]

and pg_stat_statements[2], both of them didn't handle such cases, IIUC. so

how can I understand this situation?

[1] https://github.com/postgrespro/sr_plan/blob/41d96bf136ec072dac77dddf8d9765bba39190ff/sr_plan.c#L383

[2] https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/pg_stat_statements.c#L154

Best Regards

Andy Fan

Re: hash as an search key and hash collision

From

Tomas Vondra

Date:

19 June 2020, 16:34:16

On Fri, Jun 19, 2020 at 04:24:01PM +0800, Andy Fan wrote:
>I want to maintain an internal table which the primary key is sql_text and
>planstmt::text, it is efficient since it both may be very long.  So a
>general
>idea is to use sql_hash_value and plan_hash_value. Then we have to
>handle the hash collision case.  However I checked the codes both in
>sr_plans[1]
>and pg_stat_statements[2],  both of them didn't handle such cases, IIUC.  so
>how can I understand this situation?
>

IIRC pg_stat_statements simply accepts the hash collision risk. This is
what the docs say:

     In some cases, queries with visibly different texts might get merged
     into a single pg_stat_statements entry. Normally this will happen
     only for semantically equivalent queries, but there is a small
     chance of hash collisions causing unrelated queries to be merged
     into one entry. (This cannot happen for queries belonging to
     different users or databases, however.)

The consequences of a hash collision are relatively harmless, enough to
make it not worth the extra checks (e.g. because the SQL text may not be
available in memory and would need to be read from the file).

I suppose sr_plan does the same thing, but I haven't checked.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: hash as an search key and hash collision

From

Andy Fan

Date:

19 June 2020, 23:33:39

On Sat, Jun 20, 2020 at 12:34 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, Jun 19, 2020 at 04:24:01PM +0800, Andy Fan wrote:
>I want to maintain an internal table which the primary key is sql_text and
>planstmt::text, it is efficient since it both may be very long. So a
>general
>idea is to use sql_hash_value and plan_hash_value. Then we have to
>handle the hash collision case. However I checked the codes both in
>sr_plans[1]
>and pg_stat_statements[2], both of them didn't handle such cases, IIUC. so
>how can I understand this situation?
>

IIRC pg_stat_statements simply accepts the hash collision risk. This is
what the docs say:

In some cases, queries with visibly different texts might get merged
into a single pg_stat_statements entry. Normally this will happen
only for semantically equivalent queries, but there is a small
chance of hash collisions causing unrelated queries to be merged
into one entry. (This cannot happen for queries belonging to
different users or databases, however.)

The consequences of a hash collision are relatively harmless, enough to
make it not worth the extra checks (e.g. because the SQL text may not be
available in memory and would need to be read from the file).

I see. Thank you for this information, this does make sense.

I suppose sr_plan does the same thing, but I haven't checked.

sr_plans is used to map a sql hash value to a PlannedStmts, if hash collisions

happen, it may execute a query B while the user wants to execute Query A.

this should be more sensitive than pg_stat_statements which doesn't require

exact data. I added Ildus who is the author of sr_plan to the cc list

in case he wants to take a look.

Best Regards

Andy Fan