Thread: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
tcook@blackducksoftware.com
Date:
The following bug has been logged on the website:

Bug reference:      14932
Logged by:          Todd Cook
Email address:      tcook@blackducksoftware.com
PostgreSQL version: 10.1
Operating system:   CentOS Linux release 7.4.1708 (Core)
Description:

It hangs on a table with 167834 rows, though it works fine with only 167833
rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
be killed to stop it.

Some sample stack traces:

#0  0x00007f66f1ee9860 in __memset_sse2 () from /lib64/libc.so.6
#1  0x000000000083e085 in memset (__len=51539607552, __ch=0, __dest=0x7f4cdf35c048) at /usr/include/bits/string3.h:84
#2  MemoryContextAllocExtended (context=<optimized out>, size=51539607552, flags=flags@entry=5) at mcxt.c:843
#3  0x00000000005ec804 in tuplehash_allocate (type=0x163ecc8, size=<optimized out>) at ../../../src/include/lib/simplehash.h:305
#4  tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:379
#5  0x00000000005ece35 in tuplehash_insert (tb=0x163ecc8, key=key@entry=0x0, found=found@entry=0x7ffdaff3eb77 "") at ../../../src/include/lib/simplehash.h:504
#6  0x00000000005ed3ea in LookupTupleHashEntry (hashtable=0x163ec38, slot=slot@entry=0x163e220, isnew=isnew@entry=0x7ffdaff3ebd7 "") at execGrouping.c:387
#7  0x00000000005fae62 in lookup_hash_entry (aggstate=0x163ce88) at nodeAgg.c:2075
#8  lookup_hash_entries (aggstate=aggstate@entry=0x163ce88) at nodeAgg.c:2106
#9  0x00000000005fc5da in agg_fill_hash_table (aggstate=0x163ce88) at nodeAgg.c:2536
#10 ExecAgg (pstate=0x163ce88) at nodeAgg.c:2140
#11 0x00000000005eec32 in ExecProcNode (node=0x163ce88) at ../../../src/include/executor/executor.h:250
#12 ExecutePlan (execute_once=<optimized out>, dest=0x163bbf8, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x163ce88, estate=0x163cc28) at execMain.c:1722
#13 standard_ExecutorRun (queryDesc=0x163c818, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#14 0x0000000000718e3b in PortalRunSelect (portal=portal@entry=0x158a568, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x163bbf8) at pquery.c:932
#15 0x000000000071a1ef in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x163bbf8, altdest=0x163bbf8, completionTag=0x7ffdaff3eed0 "") at pquery.c:773
#16 0x0000000000716163 in exec_simple_query (query_string=<optimized out>) at postgres.c:1099
#17 0x000000000071745c in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:4088
#18 0x000000000047ad1c in BackendRun (port=0x1592ab0) at postmaster.c:4357
#19 BackendStartup (port=0x1592ab0) at postmaster.c:4029
#20 ServerLoop () at postmaster.c:1753
#21 0x00000000006aea2f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x156bed0) at postmaster.c:1361
#22 0x000000000047bb4f in main (argc=3, argv=0x156bed0) at main.c:228

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
443             newentry = &newdata[curelem];
#0  0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
[frames #1 through #18 (tuplehash_insert ... main) identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
443             newentry = &newdata[curelem];
#0  0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
[frames #1 through #18 identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:450
450             curelem = SH_NEXT(tb, curelem, startelem);
#0  tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:450
[frames #1 through #18 identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
443             newentry = &newdata[curelem];
#0  0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
[frames #1 through #18 identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8b4 in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:445
445             if (newentry->status == SH_STATUS_EMPTY)
#0  0x00000000005ec8b4 in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:445
[frames #1 through #18 identical to the trace above]

select name, setting, unit, source, pending_restart
from pg_settings
where source <> 'default' and context <> 'internal'
order by lower(name) ;

             name             |      setting       | unit |        source        | pending_restart
------------------------------+--------------------+------+----------------------+-----------------
 application_name             | psql               |      | client               | f
 autovacuum_work_mem          | 131072             | kB   | configuration file   | f
 checkpoint_completion_target | 0.8                |      | configuration file   | f
 checkpoint_timeout           | 1800               | s    | configuration file   | f
 client_encoding              | SQL_ASCII          |      | client               | f
 cluster_name                 | PG 10              |      | configuration file   | f
 DateStyle                    | ISO, MDY           |      | configuration file   | f
 default_text_search_config   | pg_catalog.english |      | configuration file   | f
 dynamic_shared_memory_type   | posix              |      | configuration file   | f
 effective_cache_size         | 8388608            | 8kB  | configuration file   | f
 lc_messages                  | C                  |      | configuration file   | f
 lc_monetary                  | C                  |      | configuration file   | f
 lc_numeric                   | C                  |      | configuration file   | f
 lc_time                      | C                  |      | configuration file   | f
 listen_addresses             | *                  |      | configuration file   | f
 log_destination              | stderr             |      | configuration file   | f
 log_line_prefix              | %m [%p]            |      | configuration file   | f
 log_lock_waits               | on                 |      | configuration file   | f
 log_min_duration_statement   | 20000              | ms   | configuration file   | f
 log_rotation_age             | 1440               | min  | configuration file   | f
 log_rotation_size            | 0                  | kB   | configuration file   | f
 log_temp_files               | 1024               | kB   | configuration file   | f
 log_timezone                 | US/Eastern         |      | configuration file   | f
 log_truncate_on_rotation     | off                |      | configuration file   | f
 logging_collector            | on                 |      | configuration file   | f
 maintenance_work_mem         | 1048576            | kB   | configuration file   | f
 max_connections              | 100                |      | configuration file   | f
 max_stack_depth              | 2048               | kB   | environment variable | f
 max_wal_senders              | 0                  |      | configuration file   | f
 max_wal_size                 | 4096               | MB   | configuration file   | f
 port                         | 54310              |      | configuration file   | f
 shared_buffers               | 1048576            | 8kB  | configuration file   | f
 synchronous_commit           | off                |      | configuration file   | f
 TimeZone                     | US/Eastern         |      | configuration file   | f
 transaction_deferrable       | off                |      | override             | f
 transaction_isolation        | read committed     |      | override             | f
 transaction_read_only        | off                |      | override             | f
 wal_buffers                  | 2048               | 8kB  | override             | f
 wal_level                    | minimal            |      | configuration file   | f
 work_mem                     | 65536              | kB   | configuration file   | f

select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
Hi,

On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14932
> Logged by:          Todd Cook
> Email address:      tcook@blackducksoftware.com
> PostgreSQL version: 10.1
> Operating system:   CentOS Linux release 7.4.1708 (Core)
> Description:
>
> It hangs on a table with 167834 rows, though it works fine with only 167833
> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
> be killed to stop it.
>

Can you share the query and data, so that we can reproduce the issue?

Based on the stack traces this smells like a bug in the simplehash,
introduced in PostgreSQL 10.  Perhaps somewhere in tuplehash_grow(),
which gets triggered for 167834 rows (but not for 167833).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 14:17, Tomas Vondra wrote:
> Hi,
>
> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      14932
>> Logged by:          Todd Cook
>> Email address:      tcook@blackducksoftware.com
>> PostgreSQL version: 10.1
>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>> Description:
>>
>> It hangs on a table with 167834 rows, though it works fine with only 167833
>> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
>> be killed to stop it.
>>
>
> Can you share the query and data, so that we can reproduce the issue?
>
> Based on the stack traces this smells like a bug in the simplehash,
> introduced in PostgreSQL 10.  Perhaps somewhere in tuplehash_grow(),
> which gets triggered for 167834 rows (but not for 167833).

I've attached a reproducer using real data.  (FWIW, I wasn't able to
reproduce with fake data made with generate_series().)

Also, I forgot to mention that this problem is not present in 9.6 or 9.5.

-- todd
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi,

On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
> COPY reproducer (val) FROM stdin;
> 2976219712004784288
> -6429122065899879392
> -7471109962990387136
> -7471109962990387136
> -2895470491222113184
> -4083509061952565472
> 1019481548263425664
> 4639248884787347648
> -6999443831165647744
> -4199917803455020480
> -4110530183001439680

How are these values generated?  They awfully look like hash values
(~same length, full numerical range)...

I'm kinda neck deep in something else, but I'll try to have a look
afterwards.

Greetings,

Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Thomas Munro
Date:
On Tue, Nov 28, 2017 at 11:10 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>> COPY reproducer (val) FROM stdin;
>> 2976219712004784288
>> -6429122065899879392
>> -7471109962990387136
>> -7471109962990387136
>> -2895470491222113184
>> -4083509061952565472
>> 1019481548263425664
>> 4639248884787347648
>> -6999443831165647744
>> -4199917803455020480
>> -4110530183001439680
>
> How are these values generated?  They awfully look like hash values
> (~same length, full numerical range)...

When SH_INSERT tries to insert that final extra value, insertdist keeps
exceeding SH_GROW_MAX_DIB (25) no matter how many times we double the
size (at least until my computer gives up, somewhere around 11 doublings
and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB to 26 then it
succeeds, but I guess some other attack could be crafted for that.  What
is the theory behind this parameter?

--
Thomas Munro
http://www.enterprisedb.com
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> When SH_INSERT tries to insert that final extra value, insertdist
> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
> double the size (at least until my computer gives up, somewhere around
> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
> to 26 then it succeeds, but I guess some other attack could be crafted
> for that.  What is the theory behind this parameter?

You beat me to it --- after looking at simplehash.h I'd guessed that
either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
an infinite loop, but I'd not gotten to determining which one yet.

I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
of them are obviously loop-proof.

Note that the sample data has a lot of collisions:

regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
  hashint8   | count
-------------+-------
   441526644 |  2337
 -1117776826 |  1221
 -1202007016 |   935
 -2068831050 |   620
  1156644653 |   538
   553783815 |   510
   259780770 |   444
   371047036 |   394
   915722575 |   359
 ... etc etc ...

It's evidently more complicated than just that the code fails with
more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
by a lot.  There needs to be a safety valve that prevents letting
the hash fill factor approach zero, which is what's happening in
this test case.

(I wonder whether these loops oughtn't contain CHECK_FOR_INTERRUPTS, btw.)

			regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Thomas Munro
Date:
On Tue, Nov 28, 2017 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You beat me to it --- after looking at simplehash.h I'd guessed that
> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
> an infinite loop, but I'd not gotten to determining which one yet.
>
> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
> of them are obviously loop-proof.
>
> Note that the sample data has a lot of collisions:
>
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>   hashint8   | count
> -------------+-------
>    441526644 |  2337
>  -1117776826 |  1221
>  -1202007016 |   935
>  -2068831050 |   620
>   1156644653 |   538
>    553783815 |   510
>    259780770 |   444
>    371047036 |   394
>    915722575 |   359
> ... etc etc ...
>
> It's evidently more complicated than just that the code fails with
> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
> by a lot.  There needs to be a safety valve that prevents letting
> the hash fill factor approach zero, which is what's happening in
> this test case.

Yeah.  If you count distinct hashint8(val) of *distinct* values, you get:

postgres=# select hashint8(val), count(*) from (select distinct val from reproducer) ss group by 1 order by 2 desc limit 1;
  hashint8  | count
------------+-------
 1288181237 |    26
(1 row)

It doesn't matter how many bits of that you use, you'll get at least 26
collisions, so our loop can't terminate just by increasing the mask size.
My guess is that you'd either need a defence based on something like a
secondary hash function, or at least a dynamic SH_GROW_MAX_DIB that
wouldn't allow the fill factor to plummet as you say.  A dataset could be
found that would exceed any static value of SH_GROW_MAX_DIB by brute
force.

In this case, considering the definition of hashint8, there are more
straightforward ways to find distinct bigint values that hash to the same
value... just swap some bits between the high and low words, or something
like that.
-- Thomas Munro http://www.enterprisedb.com
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 11/28/2017 03:20 AM, Thomas Munro wrote:
> On Tue, Nov 28, 2017 at 11:10 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>>> COPY reproducer (val) FROM stdin;
>>> 2976219712004784288
>>> -6429122065899879392
>>> -7471109962990387136
>>> -7471109962990387136
>>> -2895470491222113184
>>> -4083509061952565472
>>> 1019481548263425664
>>> 4639248884787347648
>>> -6999443831165647744
>>> -4199917803455020480
>>> -4110530183001439680
>>
>> How are these values generated?  They awfully look like hash values
>> (~same length, full numerical range)...
>
> When SH_INSERT tries to insert that final extra value, insertdist
> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
> double the size (at least until my computer gives up, somewhere around
> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
> to 26 then it succeeds, but I guess some other attack could be crafted
> for that.  What is the theory behind this parameter?

Yeah, I came to the same hypothesis yesterday, but I see I failed to
keep pgsql-bugs on CC :-(

FWIW I believe the last doubling (from 2147483648 to 2*2147483648) is
doomed to fail due to the sizemask=0 bug.  I mean, if oldsize=2147483648,
then newsize=2*2147483648, which triggers this:

    if (tb->size == SH_MAX_SIZE)
        tb->sizemask = 0;

which would explain why the last grow did not complete even after 40
minutes (while the 1073741824->2147483648 one took 15 seconds).  Because
with sizemask=0, SH_NEXT/SH_PREV/... can only ever return 0.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 17:10, Andres Freund wrote:
> Hi,
>
> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>> COPY reproducer (val) FROM stdin;
>> 2976219712004784288
>> -6429122065899879392
>> -7471109962990387136
>> -7471109962990387136
>> -2895470491222113184
>> -4083509061952565472
>> 1019481548263425664
>> 4639248884787347648
>> -6999443831165647744
>> -4199917803455020480
>> -4110530183001439680
>
> How are these values generated?  They awfully look like hash values
> (~same length, full numerical range)...

They are biased hashes.  FWIW, the reproducer data is from one column
of a three-column primary key.

-- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 23:03, Tom Lane wrote:
>
> Note that the sample data has a lot of collisions:
>
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>   hashint8   | count
> -------------+-------
>    441526644 |  2337
>  -1117776826 |  1221
>  -1202007016 |   935
>  -2068831050 |   620
>   1156644653 |   538
>    553783815 |   510
>    259780770 |   444
>    371047036 |   394
>    915722575 |   359
> ... etc etc ...

In case it matters, the complete data set will have some outlier values
with 10k to 100k collisions in this column.

-- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 11/28/2017 03:55 PM, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
>>
>> Note that the sample data has a lot of collisions:
>>
>> regression=# select hashint8(val), count(*) from reproducer group by 1
>> order by 2 desc;
>>   hashint8   | count
>> -------------+-------
>>    441526644 |  2337
>>  -1117776826 |  1221
>>  -1202007016 |   935
>>  -2068831050 |   620
>>   1156644653 |   538
>>    553783815 |   510
>>    259780770 |   444
>>    371047036 |   394
>>    915722575 |   359
>> ... etc etc ...
>
> In case it matters, the complete data set will have some outlier values
> with 10k to 100k collisions in this column.
>

In the original values?  Not a big deal, I guess.  It's the hashint8
collisions that are causing the infinite loop, i.e. different values
with the same hashint8 result.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote: > On 11/27/17 23:03, Tom Lane wrote: > > > > Note that the sample data has a lot of collisions: > > > > regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; > > hashint8 | count > > -------------+------- > > 441526644 | 2337 > > -1117776826 | 1221 > > -1202007016 | 935 > > -2068831050 | 620 > > 1156644653 | 538 > > 553783815 | 510 > > 259780770 | 444 > > 371047036 | 394 > > 915722575 | 359 > > ... etc etc ... > > In case it matters, the complete data set will have some outlier values with 10k to 100k > collisions in this column. To make sure we're on the same page, this is data intentionally created to have a lot of hash collisions, is that right? Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/28/17 13:14, Andres Freund wrote: > On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote: >> On 11/27/17 23:03, Tom Lane wrote: >>> >>> Note that the sample data has a lot of collisions: >>> >>> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; >>> hashint8 | count >>> -------------+------- >>> 441526644 | 2337 >>> -1117776826 | 1221 >>> -1202007016 | 935 >>> -2068831050 | 620 >>> 1156644653 | 538 >>> 553783815 | 510 >>> 259780770 | 444 >>> 371047036 | 394 >>> 915722575 | 359 >>> ... etc etc ... >> >> In case it matters, the complete data set will have some outlier values with 10k to 100k >> collisions in this column. > > To make sure we're on the same page, this is data intentionally created > to have a lot of hash collisions, is that right? More or less. It might be more accurate to say that it's created in such a way that we expect to get lots of collisions. -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 23:03, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> When SH_INSERT tries to insert that final extra value, insertdist >> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >> double the size (at least until my computer gives up, somewhere around >> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >> to 26 then it succeeds, but I guess some other attack could be crafted >> for that. What is the theory behind this parameter? > > You beat me to it --- after looking at simplehash.h I'd guessed that > either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing > an infinite loop, but I'd not gotten to determining which one yet. > > I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither > of them are obviously loop-proof. > > Note that the sample data has a lot of collisions: > > regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; > hashint8 | count > -------------+------- > 441526644 | 2337 > -1117776826 | 1221 > -1202007016 | 935 > -2068831050 | 620 > 1156644653 | 538 > 553783815 | 510 > 259780770 | 444 > 371047036 | 394 > 915722575 | 359 > ... etc etc ... > > It's evidently more complicated than just that the code fails with > more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not > by a lot. There needs to be a safety valve that prevents letting > the hash fill factor approach zero, which is what's happening in > this test case. FWIW, I can also reproduce the infinite loop with 167834 unique values. It kinda looks like the problematic collisions arise from masking the computed hash values; e.g.: SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) { return hash & tb->sizemask; } (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.) -- todd
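Todd's point about masking can be sketched outside the C code. A minimal Python model of SH_INITIAL_BUCKET follows (simplehash.h itself is C; the three 32-bit hash values below are hypothetical, chosen to share their low bits):

```python
# Python model of simplehash.h's SH_INITIAL_BUCKET: the bucket index is
# the hash masked down to the (power-of-two) table size, so only the low
# bits of the hash ever matter.
def initial_bucket(h, table_size):
    # sizemask = table_size - 1, mirroring "hash & tb->sizemask"
    return h & (table_size - 1)

# Hypothetical 32-bit hashes differing only in their high bits:
hashes = [0x805FFFFE, 0x84DFFFFE, 0x9C3FFFFE]

# With a 2^16-slot table all three land in the same bucket, and doubling
# to 2^20 slots still does not separate them; only once the mask reaches
# a differing bit (2^24 here) do they spread out.
for size in (1 << 16, 1 << 20):
    assert len({initial_bucket(h, size) for h in hashes}) == 1
assert len({initial_bucket(h, 1 << 24) for h in hashes}) == 3
```

This is why hashes whose high bits vary but whose low bits coincide keep colliding across several doublings of the table.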
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/05/2017 10:23 PM, Todd A. Cook wrote: > On 11/27/17 23:03, Tom Lane wrote: >> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>> When SH_INSERT tries to insert that final extra value, insertdist >>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>> double the size (at least until my computer gives up, somewhere around >>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>> to 26 then it succeeds, but I guess some other attack could be crafted >>> for that. What is the theory behind this parameter? >> >> You beat me to it --- after looking at simplehash.h I'd guessed that >> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >> an infinite loop, but I'd not gotten to determining which one yet. >> >> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >> of them are obviously loop-proof. >> >> Note that the sample data has a lot of collisions: >> >> regression=# select hashint8(val), count(*) from reproducer group by 1 >> order by 2 desc; >> hashint8 | count >> -------------+------- >> 441526644 | 2337 >> -1117776826 | 1221 >> -1202007016 | 935 >> -2068831050 | 620 >> 1156644653 | 538 >> 553783815 | 510 >> 259780770 | 444 >> 371047036 | 394 >> 915722575 | 359 >> ... etc etc ... >> >> It's evidently more complicated than just that the code fails with >> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >> by a lot. There needs to be a safety valve that prevents letting >> the hash fill factor approach zero, which is what's happening in >> this test case. > > FWIW, I can also reproduce the infinite loop with 167834 unique values. > Unique values or unique *hash* values? Can you share the data, so that whoever fixes the bug can verify it also fixes your example? 
> It kinda looks like the problematic collisions arise from masking the > computed hash values; e.g.: > > SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) > { > return hash & tb->sizemask; > } > That's a pretty standard way to do modulo (when computing the index of the hash bucket), and certainly not the root cause. > (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.) > Based on the initial discussion, the problem here is twofold. Firstly, the code assumes that if it gets a certain number of bucket collisions (essentially, the initial bucket and a certain number of neighboring buckets already being full), making the table larger is guaranteed to reduce the number of collisions. Which is obviously untrue for duplicate hash values, but it may also happen for distinct hash values that form a chain (think a sequence of hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). This causes the insane growth to the largest possible hash table. Secondly, there's a bug that for the largest hash table we set sizemask=0, which means we never ever get a bucket index other than 0. This causes the infinite loop. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
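Both failure modes described above can be illustrated with a toy model (Python, standing in for the C in simplehash.h; the sizemask arithmetic follows the description in this message, not the actual code):

```python
# Toy model of the two problems: (1) a dense run of consecutive hash
# values stays dense no matter how large the table grows, and (2) a
# sizemask of 0 maps every hash to bucket 0 forever.
def bucket(h, sizemask):
    return h & sizemask

# (1) Hashes 0..999 map to buckets 0..999 at every table size that can
# hold them, so the probe distances never improve and growth never helps.
run = list(range(1000))
for size in (1 << 10, 1 << 12, 1 << 16):
    assert [bucket(h, size - 1) for h in run] == run

# (2) The reported bug: for the largest table, sizemask ends up 0, so
# the probe sequence can only ever revisit bucket 0 -- an infinite loop.
assert all(bucket(h, 0) == 0 for h in (1, 42, 2**31))
```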
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/05/17 17:01, Tomas Vondra wrote: > > > On 12/05/2017 10:23 PM, Todd A. Cook wrote: >> On 11/27/17 23:03, Tom Lane wrote: >>> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>>> When SH_INSERT tries to insert that final extra value, insertdist >>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>>> double the size (at least until my computer gives up, somewhere around >>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>>> to 26 then it succeeds, but I guess some other attack could be crafted >>>> for that. What is the theory behind this parameter? >>> >>> You beat me to it --- after looking at simplehash.h I'd guessed that >>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >>> an infinite loop, but I'd not gotten to determining which one yet. >>> >>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >>> of them are obviously loop-proof. >>> >>> Note that the sample data has a lot of collisions: >>> >>> regression=# select hashint8(val), count(*) from reproducer group by 1 >>> order by 2 desc; >>> hashint8 | count >>> -------------+------- >>> 441526644 | 2337 >>> -1117776826 | 1221 >>> -1202007016 | 935 >>> -2068831050 | 620 >>> 1156644653 | 538 >>> 553783815 | 510 >>> 259780770 | 444 >>> 371047036 | 394 >>> 915722575 | 359 >>> ... etc etc ... >>> >>> It's evidently more complicated than just that the code fails with >>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >>> by a lot. There needs to be a safety valve that prevents letting >>> the hash fill factor approach zero, which is what's happening in >>> this test case. >> >> FWIW, I can also reproduce the infinite loop with 167834 unique values. >> > > Unique values or unique *hash* values? Unique values. > Can you share the data, so that whoever fixes the bug can verify it also > fixes your example? Sure. It's attached. -- todd
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/05/2017 11:33 PM, Todd A. Cook wrote: > On 12/05/17 17:01, Tomas Vondra wrote: >> >> >> On 12/05/2017 10:23 PM, Todd A. Cook wrote: >>> On 11/27/17 23:03, Tom Lane wrote: >>>> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>>>> When SH_INSERT tries to insert that final extra value, insertdist >>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>>>> double the size (at least until my computer gives up, somewhere around >>>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>>>> to 26 then it succeeds, but I guess some other attack could be crafted >>>>> for that. What is the theory behind this parameter? >>>> >>>> You beat me to it --- after looking at simplehash.h I'd guessed that >>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >>>> an infinite loop, but I'd not gotten to determining which one yet. >>>> >>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >>>> of them are obviously loop-proof. >>>> >>>> Note that the sample data has a lot of collisions: >>>> >>>> regression=# select hashint8(val), count(*) from reproducer group by 1 >>>> order by 2 desc; >>>> hashint8 | count >>>> -------------+------- >>>> 441526644 | 2337 >>>> -1117776826 | 1221 >>>> -1202007016 | 935 >>>> -2068831050 | 620 >>>> 1156644653 | 538 >>>> 553783815 | 510 >>>> 259780770 | 444 >>>> 371047036 | 394 >>>> 915722575 | 359 >>>> ... etc etc ... >>>> >>>> It's evidently more complicated than just that the code fails with >>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >>>> by a lot. There needs to be a safety valve that prevents letting >>>> the hash fill factor approach zero, which is what's happening in >>>> this test case. >>> >>> FWIW, I can also reproduce the infinite loop with 167834 unique values. >>> >> >> Unique values or unique *hash* values? > > Unique values. > > >> Can you share the data, so that whoever fixes the bug can verify it also >> fixes your example? 
> > Sure. It's attached. > Seems the dataset has pretty much the same issue as the one reported before, that is select hashint8(val), count(distinct val), count(*) from temp_f_03 group by 1 order by 2 desc; hashint8 | count | count -------------+-------+------- -1971396144 | 45 | 45 2035896403 | 42 | 42 -1633843397 | 30 | 30 1425704662 | 29 | 29 -455482779 | 22 | 22 -300163565 | 17 | 17 -1803963420 | 17 | 17 -537082846 | 14 | 14 603707034 | 13 | 13 -176369887 | 12 | 12 1274957136 | 11 | 11 1465522632 | 11 | 11 -1589862230 | 10 | 10 -1145403239 | 10 | 10 i.e. there are many hash collisions (more than in the other data set). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/06/17 08:33, Tomas Vondra wrote: > >>> Can you share the data, so that whoever fixes the bug can verify it also >>> fixes your example? >> >> Sure. It's attached. >> > > Seems the dataset has pretty much the same issue as the one reported > before, that is > > select hashint8(val), count(distinct val), count(*) from temp_f_03 group > by 1 order by 2 desc; > > hashint8 | count | count > -------------+-------+------- > -1971396144 | 45 | 45 > 2035896403 | 42 | 42 > -1633843397 | 30 | 30 > 1425704662 | 29 | 29 > -455482779 | 22 | 22 > -300163565 | 17 | 17 > -1803963420 | 17 | 17 > -537082846 | 14 | 14 > 603707034 | 13 | 13 > -176369887 | 12 | 12 > 1274957136 | 11 | 11 > 1465522632 | 11 | 11 > -1589862230 | 10 | 10 > -1145403239 | 10 | 10 > > i.e. there are many hash collisions (more than in the other data set). If hashint8() is ultimately invoked by TupleHashTableHash() in execGrouping.c, it might be magnifying the difficulties here. The least significant bits, which are used as the bucket number in simplehash.h, are not very well distributed: select val, to_hex(val), to_hex(hashint8(val)) from temp_f_03 limit 15 ; val | to_hex | to_hex ----------------------+------------------+---------- 4444319256653758784 | 3dad64d121468140 | 805ffffe 554179993563924608 | 7b0d7c49a018880 | 84dffffb -3383965646518123872 | d109bd2c6b2982a0 | 9c3ffff7 -4706811054739454944 | beae0c48915f8420 | 191ffff6 618200668902031424 | 8944a3ba5d08040 | 2a3ffff0 5074043922812601024 | 466aa01079f982c0 | 7effffee -8783188184262212928 | 861bd8e1b9a482c0 | a6bfffea -4597800992953433792 | c031545b6b128140 | b1dfffea 8563040839807173408 | 76d608465dde8320 | 7d9fffe6 6092569112843158816 | 548d27540c888520 | 6f9fffe6 -7313351060369079936 | 9a81c1f558f98180 | 68ffffe5 -1786712428165627488 | e7345283536981a0 | 73ffffe5 -6153596242570280896 | aa9a08d20e6b8040 | ac3fffd8 88426174078092128 | 13a272306c58360 | b57fffd8 -5305589938458295680 | b65ec20faa4e8280 | ba9fffd3 -- todd
"Todd A. Cook" <tcook@blackducksoftware.com> writes: > If hashint8() is ultimately invoked by TupleHashTableHash() in execGroups.c, > it might be magnifying the difficulties here. The least significant bits, > which are used as the bucket number in simplehash.h, are not very well > distributed: > select val, to_hex(val), to_hex(hashint8(val)) from temp_f_03 limit 15 ; > val | to_hex | to_hex > ----------------------+------------------+---------- > 4444319256653758784 | 3dad64d121468140 | 805ffffe > 554179993563924608 | 7b0d7c49a018880 | 84dffffb > -3383965646518123872 | d109bd2c6b2982a0 | 9c3ffff7 > -4706811054739454944 | beae0c48915f8420 | 191ffff6 > 618200668902031424 | 8944a3ba5d08040 | 2a3ffff0 > 5074043922812601024 | 466aa01079f982c0 | 7effffee > -8783188184262212928 | 861bd8e1b9a482c0 | a6bfffea > -4597800992953433792 | c031545b6b128140 | b1dfffea > 8563040839807173408 | 76d608465dde8320 | 7d9fffe6 > 6092569112843158816 | 548d27540c888520 | 6f9fffe6 > -7313351060369079936 | 9a81c1f558f98180 | 68ffffe5 > -1786712428165627488 | e7345283536981a0 | 73ffffe5 > -6153596242570280896 | aa9a08d20e6b8040 | ac3fffd8 > 88426174078092128 | 13a272306c58360 | b57fffd8 > -5305589938458295680 | b65ec20faa4e8280 | ba9fffd3 Checking out hashint8() on random data shows no such obvious fault. You've apparently got a data set that exposes a weakness in hashint8, but it's not very clear what that is. In any case, the hashtable code needs to not fall over in the presence of a lot of collisions, regardless of the exact reason for there being a lot. regards, tom lane
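The kind of spot check Tom describes might look like the following sketch, with zlib.crc32 standing in for hashint8 (which is only callable from SQL/C): hash a batch of random 64-bit values and check how the low bits — the ones simplehash.h uses as the bucket number — are distributed.

```python
# Rough uniformity check on the low bits of a stand-in hash function.
import random
import zlib
from collections import Counter

random.seed(1)  # deterministic for the example
N, BUCKETS = 100_000, 256
counts = Counter(
    zlib.crc32(random.getrandbits(64).to_bytes(8, "little")) % BUCKETS
    for _ in range(N)
)

# Each bucket should sit near N / BUCKETS (about 390); a hash with skewed
# low bits, like the reported data suggests, would leave buckets nearly
# empty while piling others high.
assert len(counts) == BUCKETS
assert 200 < min(counts.values()) and max(counts.values()) < 600
```

On random input this passes comfortably, which matches Tom's observation that the weakness only shows up on the reporter's particular data.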
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 12:14:24 -0500, Tom Lane wrote: > Checking out hashint8() on random data shows no such obvious fault. > You've apparently got a data set that exposes a weakness in hashint8, > but it's not very clear what that is. It's intentionally designed to cause problems afaict: http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com > In any case, the hashtable code needs to not fall over in the > presence of a lot of collisions, regardless of the exact reason > for there being a lot. Yes, we need to be more resilient about it. Working on a patch. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/06/17 12:19, Andres Freund wrote: > On 2017-12-06 12:14:24 -0500, Tom Lane wrote: >> Checking out hashint8() on random data shows no such obvious fault. >> You've apparently got a data set that exposes a weakness in hashint8, >> but it's not very clear what that is. > > It's intentionally designed to cause problems afaict: > > http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com Definitely not intentionally. My wording there could have been better; perhaps "we expect to get lots of duplicates in that column and we then need to find the unique values". Eventually, a substantial number of the duplicates will be deleted, but that's further along in the process. -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/05/2017 11:01 PM, Tomas Vondra wrote: > > Based on the initial discussion, the problem here is twofold. > > Firstly, the code assumes that if it gets certain number of bucket > collisions (essentially, the initial bucket and certain number of > neighboring buckets already being full), making the table larger is > guaranteed to reduce the number of collisions. > > Which is obviously untrue for duplicate hash values, but it may also > happen for distinct hash values that form a chain (think a sequence of > hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). > I've managed to construct (by sheer brute force) an example that breaks simplehash in this way. It triggers the growth by having a sequence of distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and then trying to insert a value with duplicate hash (say, 4). Which will trigger the "emptydist" condition (as opposed to "insertdist", as in the other example). This means there are very few collisions (no hash value matching more than 2 distinct values), and all the hash values are "at the beginning of the range" so growing the hash table can't possibly break the hash sequences. The generated datasets are quite large (16MB each), so I've placed them here (along with the ugly C code to generate it): https://github.com/tvondra/simplehash-break Each CSV file contains ~1M strings with distinct hashes between 0 and 1048576 (essentially, covering about 97.5% of the range). There are "chains" of up to ~300 consecutive hash values. This works: CREATE TABLE hash_text (val text); COPY hash_text FROM '/tmp/data1.csv'; SET work_mem = '1GB'; SELECT DISTINCT val FROM hash_text; but this does not: CREATE TABLE hash_text (val text); COPY hash_text FROM '/tmp/data1.csv'; COPY hash_text FROM '/tmp/data2.csv'; SET work_mem = '1GB'; SELECT DISTINCT val FROM hash_text; In fact, only a small subset of the second data set should be needed.
FWIW I first tried to do the same thing with bigint values, but no matter what I did I've been unable to generate a dataset covering more than ~60% of the range (essentially hashint8 only ever produces 60% of values from [0,1000000], which likely increases collision rate). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
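The brute-force construction Tomas describes can be sketched like this (zlib.crc32 is a stand-in for the hash the real generator in the linked repository targets; the string format is arbitrary):

```python
# Brute-force sketch: keep hashing candidate strings until every slot in
# a small target range is covered, yielding a set of distinct keys whose
# hashes form one contiguous "chain" -- the shape that defeats the
# grow-on-collision heuristic.
import zlib

def find_chain(target_range, limit=2_000_000):
    found = {}
    i = 0
    while len(found) < target_range and i < limit:
        s = f"val-{i}".encode()
        found.setdefault(zlib.crc32(s) % target_range, s)
        i += 1
    return found

# Collect strings whose (stand-in) hashes form the consecutive run 0..63.
chain = find_chain(64)
assert sorted(chain) == list(range(64))
```

The real datasets do the same thing at a much larger scale, which is why they take ~16MB and "ugly C" to produce.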
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: > On 12/05/2017 11:01 PM, Tomas Vondra wrote: > > > > Based on the initial discussion, the problem here is twofold. > > > > Firstly, the code assumes that if it gets certain number of bucket > > collisions (essentially, the initial bucket and certain number of > > neighboring buckets already being full), making the table larger is > > guaranteed to reduce the number of collisions. > > > > Which is obviously untrue for duplicate hash values, but it may also > > happen for distinct hash values that form a chain (think a sequence of > > hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). > I've managed to construct (by sheer brute force) an example that breaks > simplehash in this way. It triggers the growth by having a sequence of > distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and > then trying to insert a value with duplicate hash (say, 4). I fail to be excited by this. That's possible for any sort of hashtable in some way. You might hit issues due to resizing or due to lookup performance, but you'll hit problems. That's a fairly fundamental issue of pure hashtables. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 08:35 PM, Andres Freund wrote: > On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: >> On 12/05/2017 11:01 PM, Tomas Vondra wrote: >>> >>> Based on the initial discussion, the problem here is twofold. >>> >>> Firstly, the code assumes that if it gets certain number of bucket >>> collisions (essentially, the initial bucket and certain number of >>> neighboring buckets already being full), making the table larger is >>> guaranteed to reduce the number of collisions. >>> >>> Which is obviously untrue for duplicate hash values, but it may also >>> happen for distinct hash values that form a chain (think a sequence of >>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). > >> I've managed to construct (by sheer brute force) an example that breaks >> simplehash in this way. It triggers the growth by having a sequence of >> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and >> then trying to insert a value with duplicate hash (say, 4). > > I fail to be excited by this. That's possible for any sort of hashtable > in some way. You might hit issues due to resizing or due to lookup > performance, but you'll hit problem. That's a fairly fundamental issue > of pure hashtables. > Eh? I think you misunderstood the issue this triggers. I'm perfectly fine with poor lookup performance - that's expected and reasonable. But it actually triggers the same behavior as the already presented example, i.e. the hash table grows indefinitely (so eventually gets OOM). I really doubt allocating 3GB of memory (which is when my laptop gives up and kills the query) to do distinct on a 50MB table is sane behavior. Even if the table is constructed in such an intentionally evil way. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > >On 12/06/2017 08:35 PM, Andres Freund wrote: >> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: >>> On 12/05/2017 11:01 PM, Tomas Vondra wrote: >>>> >>>> Based on the initial discussion, the problem here is twofold. >>>> >>>> Firstly, the code assumes that if it gets certain number of bucket >>>> collisions (essentially, the initial bucket and certain number of >>>> neighboring buckets already being full), making the table larger is >>>> guaranteed to reduce the number of collisions. >>>> >>>> Which is obviously untrue for duplicate hash values, but it may >also >>>> happen for distinct hash values that form a chain (think a sequence >of >>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). >> >>> I've managed to construct (by sheer brute force) an example that >breaks >>> simplehash in this way. It triggers the growth by having a sequence >of >>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, >and >>> then trying to insert a value with duplicate hash (say, 4). >> >> I fail to be excited by this. That's possible for any sort of >hashtable >> in some way. You might hit issues due to resizing or due to lookup >> performance, but you'll hit problem. That's a fairly fundamental >issue >> of pure hashtables. >> > >Eh? I think you misunderstood the issue this triggers. I'm perfectly >fine with poor lookup performance - that's expected and reasonable. But >it actually triggers the same behavior as the already presented >example, >i.e. the hash table grows indefinitely (so eventually gets OOM). > >I really doubt allocating 3GB of memory (which is when my laptop gives >up and kills the query) to do distinct on 50MB table is a sane >behavior. >Even if the table is constructed in so intentionally evil way. No, I got that. I just don't think time and memory are that different. Andres -- Sent from my Android device with K-9 Mail. 
Please excuse my brevity.
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 09:00 PM, Andres Freund wrote: > > > On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> >> On 12/06/2017 08:35 PM, Andres Freund wrote: >>> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: >>>> On 12/05/2017 11:01 PM, Tomas Vondra wrote: >>>>> >>>>> Based on the initial discussion, the problem here is twofold. >>>>> >>>>> Firstly, the code assumes that if it gets certain number of bucket >>>>> collisions (essentially, the initial bucket and certain number of >>>>> neighboring buckets already being full), making the table larger is >>>>> guaranteed to reduce the number of collisions. >>>>> >>>>> Which is obviously untrue for duplicate hash values, but it may >> also >>>>> happen for distinct hash values that form a chain (think a sequence >> of >>>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). >>> >>>> I've managed to construct (by sheer brute force) an example that >> breaks >>>> simplehash in this way. It triggers the growth by having a sequence >> of >>>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, >> and >>>> then trying to insert a value with duplicate hash (say, 4). >>> >>> I fail to be excited by this. That's possible for any sort of >> hashtable >>> in some way. You might hit issues due to resizing or due to lookup >>> performance, but you'll hit problem. That's a fairly fundamental >> issue >>> of pure hashtables. >>> >> >> Eh? I think you misunderstood the issue this triggers. I'm perfectly >> fine with poor lookup performance - that's expected and reasonable. But >> it actually triggers the same behavior as the already presented >> example, >> i.e. the hash table grows indefinitely (so eventually gets OOM). >> >> I really doubt allocating 3GB of memory (which is when my laptop gives >> up and kills the query) to do distinct on 50MB table is a sane >> behavior. >> Even if the table is constructed in so intentionally evil way. > > No, I got that. 
I just don't think time and memory are that different. > It's one thing when the hash table takes longer to lookup something or when it consumes a bit more memory. Say, ~2x more than needed, give or take. I'm perfectly fine with that, particularly when it's a worst-case evil data set like this one. But it's a very different situation when it consumes tens of GBs of memory (essentially building the largest possible hash table, despite the load factor dropping to 0), because the code grows the hash table assuming it will magically resolve the collisions. And even if the machine has enough RAM, it's guaranteed to fail once we try growing the table beyond that limit. I don't quite understand why that should be OK? It doesn't seem very sane to me. FWIW I've constructed the data sets for two reasons - to convince myself that my understanding of the simplehash code is correct, and to provide a data set triggering the other growth condition in simplehash code. My understanding is that if we stop growing the table after the load factor drops below some threshold (as TL proposed earlier in this thread), it should address both of these cases. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: > It's one thing when the hash table takes longer to lookup something or longer aka "forever". > when it consumes a bit more memory. Say, ~2x more than needed, give or > take. I'm perfectly fine with that, particularly when it's a worst-case > evil data set like this one. I think the way to prevent that kind of attack is to add randomization. > FWIW I've constructed the data sets for two reasons - to convince myself > that my understanding of the simplehash code is correct, and to provide > a data set triggering the other growth condition in simplehash code. My > understanding is that if we stop growing the table after the load factor > drops below some threshold (as TL proposed earlier in this thread), it > should address both of these cases. Yea, I'm not averse to adding a few stopgaps that break in a less annoying manner. All I'm saying is that I don't think we need to be super concerned about this specific way of breaking things. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Stephen Frost
Date:
Andres, all, * Andres Freund (andres@anarazel.de) wrote: > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: > > FWIW I've constructed the data sets for two reasons - to convince myself > > that my understanding of the simplehash code is correct, and to provide > > a data set triggering the other growth condition in simplehash code. My > > understanding is that if we stop growing the table after the load factor > > drops below some threshold (as TL proposed earlier in this thread), it > > should address both of these cases. > > Yea, I'm not adverse to adding a few stopgaps that break in a less > annoying manner. WAll I'm saying is that I don't think we need to be > super concerned about this specific way of breaking things. I think I'm agreeing with Tomas here when I say that we shouldn't have a system that breaks in the "tries to allocate infinite memory" way when the load factor drops far below some reasonable threshold and that we *should* be concerned that we have this problem in the existing implementation. It's bad, and we should care about that and fix it. Running forever in a loop isn't great either, but at least you only expend at most one CPU to it and can go kill it, assuming we have an appropriate CHECK_FOR_INTERRUPTS() placement. Running the box out of memory can lead to other things *failing* (even entirely reasonable things!), not just being slower. That's an important difference. Thanks! Stephen
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 09:46 PM, Andres Freund wrote: > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: >> It's one thing when the hash table takes longer to lookup something or > > longer aka "forever". > Not necessarily. The datasets I shared are somewhat extreme in the sense that there are many contiguous sequences of hash values, but it only takes one such sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So the hash table may still be perfectly fine for most keys, and only slightly slower for the keys in the sequence. > >> when it consumes a bit more memory. Say, ~2x more than needed, give or >> take. I'm perfectly fine with that, particularly when it's a worst-case >> evil data set like this one. > > I think the way to prevent that kind of attack is to add randomization. > By randomization you mean universal hashing [1], or something else? [1] https://en.wikipedia.org/wiki/Universal_hashing In any case, randomization should make it much harder (or pretty much impossible) to construct datasets that are guaranteed to cause failures of this kind. No problem with that (although if someone hits the issue, it will be more difficult to reproduce). It would still be possible to hit it by chance, but that should be very unlikely (OTOH the current code assumes the same thing about collisions and sequences, and here we are.) > >> FWIW I've constructed the data sets for two reasons - to convince myself >> that my understanding of the simplehash code is correct, and to provide >> a data set triggering the other growth condition in simplehash code. My >> understanding is that if we stop growing the table after the load factor >> drops below some threshold (as TL proposed earlier in this thread), it >> should address both of these cases. > > Yea, I'm not adverse to adding a few stopgaps that break in a less > annoying manner. WAll I'm saying is that I don't think we need to be > super concerned about this specific way of breaking things. > OK, understood. 
cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote: > On 12/06/2017 09:46 PM, Andres Freund wrote: > > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: > >> It's one thing when the hash table takes longer to lookup something or > > > > longer aka "forever". > > > > Not necessarily. > The datasets I shared are somewhat extreme in the sense that there are > many contiguous sequences of hash values, but it only takes one such > sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So > the hash table may still be perfectly fine for most keys, and only > slightly slower for the keys in the sequence. Meh, we're talking about adversarial attacks here. > >> when it consumes a bit more memory. Say, ~2x more than needed, give or > >> take. I'm perfectly fine with that, particularly when it's a worst-case > >> evil data set like this one. > > > > I think the way to prevent that kind of attack is to add randomization. > > > > By randomization you mean universal hashing [1], or something else? No, adding in a random seed to the hash. It'd be a lot better if we had a way to provide internal state to the hashfunction, but even just using hash_combine()ing a random number into a hash would be a lot better than what we're doing now. Greetings, Andres Freund
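[Editor's note: the seed-mixing idea Andres describes can be sketched as below. The combiner mirrors the Boost-style hash_combine() PostgreSQL carries in hashutils.h; the seeded_hash() wrapper and its name are illustrative assumptions, not the actual patch. The point is that a per-hashtable random seed remaps every bucket, so a dataset crafted against the unseeded hash no longer lines up into long collision runs.]

```c
#include <assert.h>
#include <stdint.h>

/* Boost-style combiner, same shape as PostgreSQL's hash_combine(). */
static inline uint32_t hash_combine(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/*
 * Hypothetical wrapper: mix a per-hashtable random seed into every raw
 * hash value before it is used to pick a bucket.  Deterministic for a
 * given seed, but different seeds produce different bucket mappings.
 */
static uint32_t seeded_hash(uint32_t seed, uint32_t raw_hash)
{
    return hash_combine(seed, raw_hash);
}
```

This is weaker than feeding internal state into the hash function itself (as Andres notes), but costs only a few instructions per lookup.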
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 10:12 PM, Andres Freund wrote:
> On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
>> On 12/06/2017 09:46 PM, Andres Freund wrote:
>>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>>>> It's one thing when the hash table takes longer to lookup something or
>>>
>>> longer aka "forever".
>>
>> Not necessarily.
>>
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
>
> Meh, we're talking about adversarial attacks here.
>
>>>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>>>> take. I'm perfectly fine with that, particularly when it's a worst-case
>>>> evil data set like this one.
>>>
>>> I think the way to prevent that kind of attack is to add randomization.
>>
>> By randomization you mean universal hashing [1], or something else?
>
> No, adding in a random seed to the hash. It'd be a lot better if we
> had a way to provide internal state to the hashfunction, but even
> just using hash_combine()ing a random number into a hash would be a
> lot better than what we're doing now.

The universal hashing (of "our" hash value) would have the benefit that we could change it when growing the hash table. I.e. do this

    h(x) = ((A * hash_any(x) + B) mod P) mod M

and generate new A, B, P whenever we grow the hashtable. That would further reduce the probability of either of the failures described in this thread (I mean, hitting it again after growing the table). I don't know if it's worth the additional CPU cycles, of course.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
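[Editor's note: the ((A * h + B) mod P) mod M scheme above can be sketched as follows. The Mersenne prime P = 2^61 - 1, the use of rand() as the randomness source, and the struct/function names are all illustrative assumptions, not the actual patch; the essential property is that regenerating A and B on every table growth reshuffles any accidental chain of hash values.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UH_P ((uint64_t) 0x1FFFFFFFFFFFFFFF)    /* 2^61 - 1, a Mersenne prime */

typedef struct { uint64_t a; uint64_t b; } uh_params;

/* Regenerate the (a, b) pair; in the sketched scheme this would happen
 * on every hashtable growth.  rand() is a weak source, fine for a sketch. */
static uh_params uh_new_params(void)
{
    uh_params p;
    p.a = ((uint64_t) rand() % (UH_P - 1)) + 1;  /* a in [1, P-1] */
    p.b = (uint64_t) rand() % UH_P;              /* b in [0, P-1] */
    return p;
}

/* Map "our" 32-bit hash to a bucket in [0, nbuckets). */
static uint64_t uh_bucket(uh_params p, uint32_t h, uint64_t nbuckets)
{
    /* a * h can exceed 64 bits (a < 2^61, h < 2^32), so widen to 128 bits;
     * __int128 is a GCC/Clang extension. */
    unsigned __int128 t = (unsigned __int128) p.a * h + p.b;

    return (uint64_t) (t % UH_P) % nbuckets;
}
```

With a = 1, b = 0 this degenerates to plain modulo, which makes the mapping easy to check; any other (a, b) pair permutes the hash values before the final reduction.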
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/06/17 16:12, Andres Freund wrote: > >> The datasets I shared are somewhat extreme in the sense that there are >> many contiguous sequences of hash values, but it only takes one such >> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So >> the hash table may still be perfectly fine for most keys, and only >> slightly slower for the keys in the sequence. > > Meh, we're talking about adversarial attacks here. Hmmmmm... I found this problem when I dropped 10.1 into a test environment to see what would happen. There was no deliberate attempt to break anything. -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote: > On 12/06/17 16:12, Andres Freund wrote: > > > > > The datasets I shared are somewhat extreme in the sense that there are > > > many contiguous sequences of hash values, but it only takes one such > > > sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So > > > the hash table may still be perfectly fine for most keys, and only > > > slightly slower for the keys in the sequence. > > > > Meh, we're talking about adversarial attacks here. > > Hmmmmm... > > I found this problem when I dropped 10.1 into a test environment to see > what would happen. There was no deliberate attempt to break anything. Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>> I found this problem when I dropped 10.1 into a test environment to see
>> what would happen. There was no deliberate attempt to break anything.

> Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com

I'm confused by Tomas' claim that

>> (essentially hashint8 only ever produces 60% of
>> values from [0,1000000], which likely increases collision rate).

This is directly contradicted by the simple experiments I've done, eg

regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
 count
--------
 999879
(1 row)

regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
 count
--------
 644157
(1 row)

regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
  count
---------
 1048514
(1 row)

It's certainly not perfect, but I'm not observing any major failure to span the output space.

regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 11:55 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>>> I found this problem when I dropped 10.1 into a test environment to see
>>> what would happen. There was no deliberate attempt to break anything.
>
>> Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com
>
> I'm confused by Tomas' claim that
>
>>> (essentially hashint8 only ever produces 60% of
>>> values from [0,1000000], which likely increases collision rate).
>
> This is directly contradicted by the simple experiments I've done, eg
>
> regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
>  count
> --------
>  999879
> (1 row)
>
> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
>  count
> --------
>  644157
> (1 row)
>
> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
>   count
> ---------
>  1048514
> (1 row)
>
> It's certainly not perfect, but I'm not observing any major failure to
> span the output space.

That is not the claim I was making, though. When generating data sets I've been considering only values where hashint8(v) is between 0 and 1000000 *without* the masking. Otherwise growing the hash table would break the "chain" of values in the hash table, which would resolve the issue with SH_GROW_MAX_MOVE.

So you'd have to do something like this:

    select count(distinct hashint8(v))
    from generate_series(0,1000000::int8) v
    where hashint8(v) between 0 and 1024*1024;

but to get some numbers you'd have to also increase the value passed to generate_series (because it'll filter most of the values).

Which is why I generated the data by an ugly C program, similar to the attached one. It keeps a small bitmap for generated hashes in the [0,1M] range, and prints the number of bits set after every 1e9 values processed.
What I do see is this:

    i=1000000000 nset=217666
    i=2000000000 nset=389526
    i=3000000000 nset=525135
    i=4000000000 nset=632305
    i=5000000000 nset=659574
    i=6000000000 nset=659574
    i=7000000000 nset=659574
    i=8000000000 nset=659574
    i=9000000000 nset=659574
    ...

So somewhere between 4e9 and 5e9 we generate 659574 hashes in the range, and then never increase it further.

I don't know if this an issue in practice, but it seems that for large hash tables (on bigint), growing the hash table may not be that great improvement because we're not really doubling the number of slots we can hit directly. We'll probably find an empty slot nearby, though.

It was just an observation - I only noticed that because I tried to construct the adversarial data set on bigint, and couldn't make it work no matter what.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
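[Editor's note: the bitmap bookkeeping in the C program described above can be sketched like this. The real program fed every uint64 key through hashint8's logic; here the hash source is left out and only the "count each hash value in [0,1M) once" machinery is shown, with names invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RANGE 1000000           /* track hash values in [0, 1M) */

typedef struct
{
    uint8_t  *bits;             /* one bit per possible value in the range */
    uint64_t  nset;             /* number of distinct values seen so far */
} bitmap;

static bitmap bitmap_create(void)
{
    bitmap bm;

    bm.bits = calloc((RANGE + 7) / 8, 1);   /* zeroed: nothing seen yet */
    bm.nset = 0;
    return bm;
}

/* Record one hash value; count it only the first time it appears. */
static void bitmap_add(bitmap *bm, uint32_t h)
{
    if (h >= RANGE)
        return;                             /* outside the tracked range */
    if (!(bm->bits[h / 8] & (1u << (h % 8))))
    {
        bm->bits[h / 8] |= 1u << (h % 8);
        bm->nset++;                         /* first occurrence */
    }
}
```

The nset counter printed every 1e9 iterations is exactly this first-occurrence count, which is why it plateaus once the hash function has produced every value it will ever produce in the range.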
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 06:19 PM, Andres Freund wrote:
> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>> Checking out hashint8() on random data shows no such obvious fault.
>> You've apparently got a data set that exposes a weakness in hashint8,
>> but it's not very clear what that is.
>
> It's intentionally designed to cause problems afaict:
>
> http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com
>
>> In any case, the hashtable code needs to not fall over in the
>> presence of a lot of collisions, regardless of the exact reason
>> for there being a lot.
>
> Yes, we need to be more resilient about it. Working on a patch.

I've been playing with this a bit today (sorry for trespassing on your turf a bit, but curiosity killed the cat), particularly with the three basic ideas mentioned in this thread. Let me share some numbers.

1) randomized hashing, i.e. using the extended hash functions and a random seed

2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p) formula, where h is "our" hash, and (a,b) are random and p is a prime. This can't possibly fix the collisions reported by Todd, because the result will remain the same. But it should fix "my" chains of values by spreading them a bit (and the a,b,p values are generated on each grow, so in the unlikely case where we get a chain, it should disappear after a growth)

[1] https://en.wikipedia.org/wiki/Universal_hashing

3) disabling growth of the hash table when the fillfactor would drop too much (e.g. below 0.2)

Very rough patches for these three approaches are attached.

Now, some very rough numbers comparing the impact of those approaches (and their combinations). The following two tables present timings for a couple of datasets - five cases without any collisions (allowing us to quantify impact on regular datasets), the two datasets generated by me (one without collisions, one with collisions), and the two issues reported by Todd.
RH - randomized hashing (seed + extended)
UH - universal hashing
ST - stop growing

             master     RH     UH     ST      RH    UH    ST
    -------------------------------------------------------
    Ok/1        542    540    552    537      0%    2%   -1%
    Ok/2        232    221    242    237     -5%    4%    2%
    Ok/3        161    167    180    161      4%   12%    0%
    Ok/4        276    270    285    274     -2%    3%   -1%
    Ok/5        174    179    193    174      3%   11%    0%
    Tomas/1     543    555    551    533      2%    1%   -2%
    Tomas/2     OOM   2322   3251      -
    Todd/1      OOM    OOM     63     63
    Todd/2      OOM    OOM    OOM     72

This is mostly as expected, although I'm surprised that the universal hashing seems to fix Todd's first reproducer. The "stop growing" approach technically fixes "my" dataset, but it takes so much time I haven't timed it.

In terms of slowdown (compared to master), I'd say RH and ST are mostly neutral. Universal hashing has significant impact in some cases, but I suppose that might be fixed by using a slightly different scheme (instead of using simple modulo arithmetic).

Now, let's look at combinations of the approaches (broken into two tables, so that it doesn't get mangled by mail clients):

             master  RH+UH  RH+ST  UH+ST  RH+UH+ST
    Ok/1        542    553    545    549       558
    Ok/2        232    241    234    241       250
    Ok/3        161    180    170    180       187
    Ok/4        276    274    270    285       290
    Ok/5        174    187    182    193       198
    Tomas/1     543    571    561    554       570
    Tomas/2       -   2380   2340   3130      2380
    Todd/1        -     61     65     63        64
    Todd/2        -      -     90     91        93

             RH+UH  RH+ST  UH+ST  RH+UH+ST
    Ok/1        2%     1%     1%        3%
    Ok/2        4%     1%     4%        8%
    Ok/3       12%     6%    12%       16%
    Ok/4       -1%    -2%     3%        5%
    Ok/5        7%     5%    11%       14%
    Tomas/1     5%     3%     2%        5%

It seems only the "stop growth" has to be part of the solution, as it's included in any combination fixing all cases shared in this thread. I'd say (randomized hashing + stop growing) seems like the best option.

FWIW I do agree the data sets shared in this thread are pretty extreme and it doesn't make much sense to slow the regular cases. I'll be perfectly happy if we stop the OOM, making those cases fast is a bonus.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
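[Editor's note: the "stop growing" (ST) variant can be sketched as a guard on the collision-triggered growth path. The 0.2 fillfactor floor matches the threshold mentioned upthread; the function name, field names, and exact placement are made up for illustration — the real (rough) patch is attached to this message.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_FILLFACTOR 0.2      /* don't grow below this load factor */

/*
 * Hypothetical check used where a long collision run (SH_GROW_MAX_MOVE in
 * simplehash terms) would otherwise force a grow: doubling is allowed only
 * while the doubled table would stay reasonably full.  An adversarial chain
 * of hash values then cannot balloon the table (and memory) without bound;
 * lookups in the chain get a bit slower instead of the backend OOMing.
 */
static bool grow_allowed(uint64_t members, uint64_t cursize)
{
    uint64_t newsize = cursize * 2;

    return (double) members / (double) newsize >= MIN_FILLFACTOR;
}
```

For example, a table with 1024 buckets and 600 entries may still double (load factor ~0.29 afterwards), while one with only 300 entries may not (~0.15).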
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/07/2017 12:45 AM, Tomas Vondra wrote: > > > On 12/06/2017 11:55 PM, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote: >>>> I found this problem when I dropped 10.1 into a test environment to see >>>> what would happen. There was no deliberate attempt to break anything. >> >>> Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com >> >> I'm confused by Tomas' claim that >> >>>> (essentially hashint8 only ever produces 60% of >>>> values from [0,1000000], which likely increases collision rate). >> >> This is directly contradicted by the simple experiments I've done, eg >> >> regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v; >> count >> -------- >> 999879 >> (1 row) >> >> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v; >> count >> -------- >> 644157 >> (1 row) >> >> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v; >> count >> --------- >> 1048514 >> (1 row) >> >> It's certainly not perfect, but I'm not observing any major failure to >> span the output space. >> > > That is not the claim I was making, though. When generating data sets > I've been considering only values where hashint8(v) is between 0 and > 1000000 *without* the masking. Otherwise growing the hash table would > break the "chain" of values in the hash table, which would resolve the > issue with SH_GROW_MAX_MOVE. > > So you'd have to do something like this: > > select count(distinct hashint8(v)) > from generate_series(0,1000000::int8) v > where hashint8(v) between 0 and 1024*1024; > > but to get some numbers you'd have to also increase the value passed to > generate_series (because it'll filter most of the values). > > Which is why I generated the data by an ugly C program, similar to the > attached one. 
It keeps a small bitmap for generated hashes in the [0,1M] > range, and prints the number of bits set after every 1e9 values processed. > > What I do see is this: > > i=1000000000 nset=217666 > i=2000000000 nset=389526 > i=3000000000 nset=525135 > i=4000000000 nset=632305 > i=5000000000 nset=659574 > i=6000000000 nset=659574 > i=7000000000 nset=659574 > i=8000000000 nset=659574 > i=9000000000 nset=659574 > ... > > So somewhere between 4e9 and 5e9 we generate 659574 hashes in the range, > and then never increase it further. > > I don't know if this an issue in practice, but it seems that for large > hash tables (on bigint), growing the hash table may not be that great > improvement because we're not really doubling the number of slots we can > hit directly. We'll probably find an empty slot nearby, though. > > It was just an observation - I only noticed that because I tried to > construct the adversarial data set on bigint, and couldn't make it work > no matter what. > I've done a simple experiment today - computing a hash for every uint32 value, and checking how many distinct hash values that produces. 
For the 4.2 billion values the results look like this:

     second   key             ndistinct         ndistinct/nkeys
      3.380   i=100000000     nset=98829159     0.99
      6.240   i=200000000     nset=195351181    0.98
      9.106   i=300000000     nset=289623103    0.97
     11.932   i=400000000     nset=381695003    0.95
     14.814   i=500000000     nset=471621355    0.94
     17.706   i=600000000     nset=559452278    0.93
     20.577   i=700000000     nset=645242762    0.92
     23.496   i=800000000     nset=729036217    0.91
     26.460   i=900000000     nset=810879821    0.90
     29.380   i=1000000000    nset=890818778    0.89
     32.331   i=1100000000    nset=968898478    0.88
     35.282   i=1200000000    nset=1045164189   0.87
     38.262   i=1300000000    nset=1119654810   0.86
     41.251   i=1400000000    nset=1192424766   0.85
     44.258   i=1500000000    nset=1263482593   0.84
     47.268   i=1600000000    nset=1332897449   0.83
     50.305   i=1700000000    nset=1400717923   0.82
     53.356   i=1800000000    nset=1466962823   0.81
     56.425   i=1900000000    nset=1531660191   0.81
     59.489   i=2000000000    nset=1594856205   0.80
     62.593   i=2100000000    nset=1656588855   0.79
     65.706   i=2200000000    nset=1716895530   0.78
     68.829   i=2300000000    nset=1775809288   0.77
     71.966   i=2400000000    nset=1833353377   0.76
     75.123   i=2500000000    nset=1889558599   0.76
     78.271   i=2600000000    nset=1944463039   0.75
     81.418   i=2700000000    nset=1998096496   0.74
     84.574   i=2800000000    nset=2050490745   0.73
     87.711   i=2900000000    nset=2101666393   0.72
     90.852   i=3000000000    nset=2151669155   0.72
     93.986   i=3100000000    nset=2200517580   0.71
     97.084   i=3200000000    nset=2248232772   0.70
    100.172   i=3300000000    nset=2294849331   0.70
    103.232   i=3400000000    nset=2340389131   0.69
    106.273   i=3500000000    nset=2384875319   0.68
    109.272   i=3600000000    nset=2428339639   0.67
    112.260   i=3700000000    nset=2470798655   0.67
    115.199   i=3800000000    nset=2512271730   0.66
    118.125   i=3900000000    nset=2552788321   0.65
    121.029   i=4000000000    nset=2592379529   0.65
    123.895   i=4100000000    nset=2631056297   0.64
    126.726   i=4200000000    nset=2668843329   0.64
    129.397   loops 4294967295 set 2703915179  (0.63)

That means we only ever generate about 64% of the possible hash keys.
That doesn't seem particularly healthy I guess, but for small hash tables (with fewer than 2^32 buckets) that may not be an issue, as some of the values will wrap because of the modulo. For comparison, murmurhash3 and xxhash (both fairly popular hash functions) end with 1:1 ratio of values and hashes, that is not having any collisions at all. Of course, both are somewhat slower than the simple hash functions we use for int/bigint values. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/10/17 18:14, Tomas Vondra wrote: > > I've done a simple experiment today - computing a hash for every uint32 > value, and checking how many distinct hash values that produces. For the > 4.2 billion values the results look like this: > > second key ndistinct ndistinct/nkeys > 3.380 i=100000000 nset=98829159 0.99 > 6.240 i=200000000 nset=195351181 0.98 > 9.106 i=300000000 nset=289623103 0.97 > 11.932 i=400000000 nset=381695003 0.95 > 14.814 i=500000000 nset=471621355 0.94 > 17.706 i=600000000 nset=559452278 0.93 > 20.577 i=700000000 nset=645242762 0.92 > 23.496 i=800000000 nset=729036217 0.91 > 26.460 i=900000000 nset=810879821 0.90 > 29.380 i=1000000000 nset=890818778 0.89 > 32.331 i=1100000000 nset=968898478 0.88 > 35.282 i=1200000000 nset=1045164189 0.87 > 38.262 i=1300000000 nset=1119654810 0.86 > 41.251 i=1400000000 nset=1192424766 0.85 > 44.258 i=1500000000 nset=1263482593 0.84 > 47.268 i=1600000000 nset=1332897449 0.83 > 50.305 i=1700000000 nset=1400717923 0.82 > 53.356 i=1800000000 nset=1466962823 0.81 > 56.425 i=1900000000 nset=1531660191 0.81 > 59.489 i=2000000000 nset=1594856205 0.80 > 62.593 i=2100000000 nset=1656588855 0.79 > 65.706 i=2200000000 nset=1716895530 0.78 > 68.829 i=2300000000 nset=1775809288 0.77 > 71.966 i=2400000000 nset=1833353377 0.76 > 75.123 i=2500000000 nset=1889558599 0.76 > 78.271 i=2600000000 nset=1944463039 0.75 > 81.418 i=2700000000 nset=1998096496 0.74 > 84.574 i=2800000000 nset=2050490745 0.73 > 87.711 i=2900000000 nset=2101666393 0.72 > 90.852 i=3000000000 nset=2151669155 0.72 > 93.986 i=3100000000 nset=2200517580 0.71 > 97.084 i=3200000000 nset=2248232772 0.70 > 100.172 i=3300000000 nset=2294849331 0.70 > 103.232 i=3400000000 nset=2340389131 0.69 > 106.273 i=3500000000 nset=2384875319 0.68 > 109.272 i=3600000000 nset=2428339639 0.67 > 112.260 i=3700000000 nset=2470798655 0.67 > 115.199 i=3800000000 nset=2512271730 0.66 > 118.125 i=3900000000 nset=2552788321 0.65 > 121.029 i=4000000000 nset=2592379529 0.65 > 123.895 
i=4100000000 nset=2631056297 0.64
> 126.726 i=4200000000 nset=2668843329 0.64
> 129.397 loops 4294967295 set 2703915179 (0.63)
>
> That means we only ever generate about 64% of the possible hash keys.
> That doesn't seem particularly healthy I guess, but for small hash
> tables (with fewer than 2^32 buckets) that may not be an issue, as some
> of the values will wrap because of the modulo.

FWIW, changing hashint8() to

    int64 val = PG_GETARG_INT64(0);

    if (val <= UINT32_MAX && val >= 0)
        return hash_uint32((uint32) val);
    else
        return hash_any((unsigned char *) &val, sizeof(val));

fixes the problem for me on all of my data sets.

My conjecture is that the existing implementation

    uint32 lohalf = (uint32) val;
    uint32 hihalf = (uint32) (val >> 32);

    lohalf ^= (val >= 0) ? hihalf : ~hihalf;

biases lohalf when hashing collections of large-magnitude negative numbers (like my data sets) because the individual bits of lohalf do not have equal probability of being toggled by the exclusive-or. For example, given a value of -9223261146507558080 (0x8000770b48a68340), 15 of the 16 most significant bits in lohalf will toggle, but only about half of the 16 least significant bits will toggle.

-- todd
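[Editor's note: Todd's worked example can be checked mechanically. This sketch reproduces the lohalf/hihalf folding step from the existing hashint8() (the part before hash_uint32 is applied) and counts how many of lohalf's upper 16 bits the xor toggles for the quoted hex value; the helper names are invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>

/* The fold step from the current hashint8(), before hashing the result. */
static uint32_t fold_int8(int64_t val)
{
    uint32_t lohalf = (uint32_t) val;
    uint32_t hihalf = (uint32_t) ((uint64_t) val >> 32);

    /* Negative values xor with the complement so that small negative ints
     * hash like their int4 counterparts. */
    lohalf ^= (val >= 0) ? hihalf : ~hihalf;
    return lohalf;
}

/* Count set bits in the upper 16 bits of a 32-bit mask; each set bit is a
 * lohalf bit that the xor toggles. */
static int high16_popcount(uint32_t x)
{
    int n = 0;

    for (x >>= 16; x != 0; x >>= 1)
        n += x & 1;
    return n;
}
```

For val = 0x8000770b48a68340 the xor mask is ~hihalf = 0x7fff88f4: 15 of its top 16 bits are set (so those lohalf bits always flip), while the low half is much sparser, which is the asymmetry Todd describes.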
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 15:59, Todd A. Cook wrote: > On 11/27/17 14:17, Tomas Vondra wrote: >> Hi, >> >> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 14932 >>> Logged by: Todd Cook >>> Email address: tcook@blackducksoftware.com >>> PostgreSQL version: 10.1 >>> Operating system: CentOS Linux release 7.4.1708 (Core) >>> Description: >>> >>> It hangs on a table with 167834 rows, though it works fine with only 167833 >>> rows. When it hangs, CTRL-C does not interrupt it, and the backend has to >>> be killed to stop it. >>> >> >> Can you share the query and data, so that we can reproduce the issue? >> >> Based on the stack traces this smells like a bug in the simplehash, >> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(), >> which gets triggered for 167834 rows (but not for 167833). > > I've attached a reproducer using real data. (FWIW, I wasn't able to reproduce with > fake data made with generate_series().) > > Also, I forgot to mention that this problem is not present in 9.6 or 9.5. Is this issue likely to be addressed in 10.2? FWIW, we try to avoid running with local patches, but we are unable to move past 9.6 without a fix... -- todd
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 12/06/2017 06:19 PM, Andres Freund wrote: >> On 2017-12-06 12:14:24 -0500, Tom Lane wrote: >>> In any case, the hashtable code needs to not fall over in the >>> presence of a lot of collisions, regardless of the exact reason >>> for there being a lot. >> Yes, we need to be more resilient about it. Working on a patch. This seems to have fallen through a crack over Christmas holidays. Let's see if we can't push it to a conclusion. > [ Tomas presents some draft patches and performance analysis ] > It seems only the "stop growth" has to be part of the solution, as it's > included in any combination fixing all cases shared in this thread. I'd > say (randomized hashing + stop growing) seems like the best option. > FWIW I do agree the data sets shared in this thread are pretty extreme > and it doesn't make much sense to slow the regular cases. I'll be > perfectly happy if we stop the OOM, making those cases fast is a bonus. Personally I'd say let's just adopt the "stop growing" patch and call it good. In particular, I'm nervous about the idea of backpatching your RH patch because of the ABI hazard from changing struct TupleHashTableData. But perhaps Andres has some different idea in mind ... regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-09 10:36:48 -0500, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > > On 12/06/2017 06:19 PM, Andres Freund wrote: > >> On 2017-12-06 12:14:24 -0500, Tom Lane wrote: > >>> In any case, the hashtable code needs to not fall over in the > >>> presence of a lot of collisions, regardless of the exact reason > >>> for there being a lot. > > >> Yes, we need to be more resilient about it. Working on a patch. > > This seems to have fallen through a crack over Christmas holidays. > Let's see if we can't push it to a conclusion. Yea, I'll try to get to it over the weekend. Sorry for the delay. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 14:17, Tomas Vondra wrote:
> Hi,
>
> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 14932
>> Logged by: Todd Cook
>> Email address: tcook@blackducksoftware.com
>> PostgreSQL version: 10.1
>> Operating system: CentOS Linux release 7.4.1708 (Core)
>> Description:
>>
>> It hangs on a table with 167834 rows, though it works fine with only 167833
>> rows. When it hangs, CTRL-C does not interrupt it, and the backend has to
>> be killed to stop it.
>
> Can you share the query and data, so that we can reproduce the issue?
>
> Based on the stack traces this smells like a bug in the simplehash,
> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
> which gets triggered for 167834 rows (but not for 167833).

FWIW, changing the guts of hashint8() to

+    if (val >= INT32_MIN && val <= INT32_MAX)
+        return hash_uint32((uint32) val);
+    else
+        return hash_any((unsigned char *) &val, sizeof(val));

allows us to process a full-sized data set of around 900 million rows. However, memory usage seemed to be rather excessive (we can only run 7 of these jobs in parallel on a 128GB system before the OOM killer kicked in, rather than the usual 24); if there's any interest, I can try to measure exactly how excessive.

-- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/25/2018 11:31 PM, Todd A. Cook wrote: > On 11/27/17 14:17, Tomas Vondra wrote: >> Hi, >> >> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 14932 >>> Logged by: Todd Cook >>> Email address: tcook@blackducksoftware.com >>> PostgreSQL version: 10.1 >>> Operating system: CentOS Linux release 7.4.1708 (Core) >>> Description: >>> >>> It hangs on a table with 167834 rows, though it works fine with only >>> 167833 >>> rows. When it hangs, CTRL-C does not interrupt it, and the backend >>> has to >>> be killed to stop it. >>> >> >> Can you share the query and data, so that we can reproduce the issue? >> >> Based on the stack traces this smells like a bug in the simplehash, >> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(), >> which gets triggered for 167834 rows (but not for 167833). > > FWIW, changing the guts of hashint8() to > > + if (val >= INT32_MIN && val <= INT32_MAX) > + return hash_uint32((uint32) val); > + else > + return hash_any((unsigned char *) &val, sizeof(val)); > > allows us to process a full-sized data set of around 900 million rows. > However, > memory usage seemed to be rather excessive (we can only run 7 of these > jobs in parallel > on a 128GB system before the OOM killer kicked in, rather than the usual > 24); if there's > any interest, I can try to measure exactly how excessive. > I suspect you're right the hash is biased to lohalf bits, as you wrote in the 19/12 message. In fact, I think it's a direct consequence of the requirement that hashint8() needs to produce the same hash for logically equivalent int2 and int4 values. Out of curiosity, could you try replacing the hash_any call in hashint8 with a hash function like murmur3, and see if it improves the behavior? That obviously breaks the hashint8 for cross-type hash joins, but it would be interesting bit of information I think. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
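[Editor's note: for reference, the murmur3 finalizer Tomas alludes to (the "fmix64" step of MurmurHash3, public domain, by Austin Appleby) looks like this. Every step — xor-shift and multiplication by an odd constant — is invertible, so the whole function is a bijection on 64-bit values, which is why hashing all inputs produces no collisions at all.]

```c
#include <assert.h>
#include <stdint.h>

/* MurmurHash3 fmix64 finalizer: a 1:1 mapping on 64-bit values. */
static uint64_t fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;     /* odd constant: invertible mod 2^64 */
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}
```

Note that zero is a fixed point of this construction (every step maps 0 to 0), and that bijectivity alone says nothing about cross-type hashing, which is the compatibility constraint that makes replacing hashint8's hash_any call hard.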
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > I suspect you're right the hash is biased to lohalf bits, as you wrote > in the 19/12 message. I don't see any bias in what it's doing, which is basically xoring the two halves and hashing the result. It's possible though that Todd's data set contains values in which corresponding bits of the high and low halves are correlated somehow, in which case the xor would produce a lot of cancellation and a relatively small number of distinct outputs. If we weren't bound by backwards compatibility, we could consider changing to logic more like "if the value is within the int4 range, apply int4hash, otherwise hash all 8 bytes normally". But I don't see how we can change that now that hash indexes are first-class citizens. In any case, we still need a fix for the behavior that the hash table size is blown out by lots of collisions, because that can happen no matter what the hash function is. Andres seems to have dropped the ball on doing something about that. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/27/2018 12:22 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I suspect you're right the hash is biased to lohalf bits, as you wrote
>> in the 19/12 message.
>
> I don't see any bias in what it's doing, which is basically xoring the
> two halves and hashing the result. It's possible though that Todd's
> data set contains values in which corresponding bits of the high and
> low halves are correlated somehow, in which case the xor would produce
> a lot of cancellation and a relatively small number of distinct outputs.

Hmm, that makes more sense than what I wrote. Probably time to get some sleep or drink more coffee, I guess.

BTW what do you think about the fact that we only really generate ~63% of the possible hash values (see my message from 11/12)? That seems a bit unfortunate, although not unexpected for a simple hash function.

> If we weren't bound by backwards compatibility, we could consider
> changing to logic more like "if the value is within the int4 range,
> apply int4hash, otherwise hash all 8 bytes normally". But I don't see
> how we can change that now that hash indexes are first-class
> citizens.

Yeah, I've been thinking about that too. But I think it's an issue only for pg_upgraded clusters, which may have hash indexes (and also hash-partitioned tables). So couldn't we use new hash functions in fresh clusters and use the backwards-compatible ones in pg_upgraded ones?

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi, On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: > 1) randomized hashing, i.e. using the extended hash functions and a > random seed Given the lack of extended hash functions for everything this doesn't seem quite viable. But I think we can make it work quite easily nonetheless - we can just do the combining with a random seed outside of the hashtable. We kinda already do so (with a really crappy combiner) in TupleHashTableHash(), except that we'd actually have to use a random seed. This seems like something we should do, just out of defensiveness. It'll not be as good as persisting the hashfunction's state across calls, but well, ... Let me write a patch for that. > 2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p) > formula, where h is "our" hash, and (a,b) are random and p is a prime. > This can't possibly fix the collisions reported by Todd, because the > result will remain the same. But it should fix "my" chains of values by > spreading them a bit (and the a,b,p values are generated on each grow, > so in the unlikely case where we get a chain, it should disappear after > a growth) This seems on the too expensive side of things for my taste to do unconditionally. > 3) disabling growth of the hash table when the fillfactor would drop too > much (e.g. below 0.2) Yea, I think we should just apply something like this. > FWIW I do agree the data sets shared in this thread are pretty extreme > and it doesn't make much sense to slow the regular cases. I'll be > perfectly happy if we stop the OOM, making those cases fast is a bonus. Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, and do something better in 11. And then later possibly backpatch if we've grown some confidence? Greetings, Andres Freund
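[Editor's note] Option (3) is straightforward to sketch. The idea (names and the 0.2 threshold here are illustrative, not the exact simplehash.h patch) is to refuse doubling when the resulting table would be mostly empty, since a grow triggered by one long collision chain just doubles memory without shortening the chain much:

```python
MIN_FILLFACTOR = 0.2  # illustrative floor, not necessarily the patched value

def grow_allowed(nmembers: int, cursize: int) -> bool:
    """Permit doubling only if the doubled table would still be at
    least MIN_FILLFACTOR full.  A long collision chain in a nearly
    empty table can otherwise request grow after grow, blowing out
    memory (the symptom in this bug's stack trace)."""
    return nmembers / (cursize * 2) >= MIN_FILLFACTOR

# A reasonably full table may still double ...
print(grow_allowed(500, 1024))   # True  (500/2048 ~ 0.24)
# ... but a near-empty table hitting long chains may not.
print(grow_allowed(100, 1024))   # False (100/2048 ~ 0.05)
```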
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-26 18:22:26 -0500, Tom Lane wrote: > In any case, we still need a fix for the behavior that the hash table size > is blown out by lots of collisions, because that can happen no matter what > the hash function is. Andres seems to have dropped the ball on doing > something about that. Didn't have spare brain cycles :(, and the next backbranch release wasn't yet close, reducing immediate urgency a bit. As written nearby I think we should make execGrouping.c users of hashtables use a more random IV, and apply something similar to the growth limit patch from Tomas. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: >> FWIW I do agree the data sets shared in this thread are pretty extreme >> and it doesn't make much sense to slow the regular cases. I'll be >> perfectly happy if we stop the OOM, making those cases fast is a bonus. > Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, > and so something better in 11. And then later possibly backpatch if > we've grown some confidence? +1. I'd like to see some response to this included in 10.2, and time grows short for that. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-26 18:48:35 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: > >> FWIW I do agree the data sets shared in this thread are pretty extreme > >> and it doesn't make much sense to slow the regular cases. I'll be > >> perfectly happy if we stop the OOM, making those cases fast is a bonus. > > > Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, > > and so something better in 11. And then later possibly backpatch if > > we've grown some confidence? > > +1. I'd like to see some response to this included in 10.2, and time > grows short for that. Here are two patches that I think we want for 10.2, and the start of one that I think we want for master. 0002 is needed because otherwise the lack of extra growth leads to noticeably worse performance when filling an underestimated coordinator hash table from the workers - turns out our hash combine (and most hash combines) let a lot of clustering survive. By adding a final hashing round the bit perturbation is near perfect. The commit messages need to be polished a bit, but other than that I think these are reasonable fixes. Plan to push by Monday evening at the latest. The third patch is a version of the random IV discussed in this thread. I do think we want to add usage of the extended hash functions, as prototyped by Tomas, as that actually helps to fix issues with actual hash conflicts. But we additionally need a fallback path for types without extended hashtables, and the random IV is a good idea nonetheless. There's no ABI difference in my patch, so I think this is actually something we could backpatch. But I don't think it's urgent, so I'm not planning to do that for 10.2. The one thing that could confuse people is that it can lead to output order changes from run to run - I think that's actually good, nobody should rely on hashagg etc output being stable, but it might be a bit much in a stable release?
In my tests this solves the worst performance issues in Todd's case, Tomas's, Thomas's and still does ok performance-wise with a TPC-H Q18 (which showcases the underestimated worker hashtable into leader hashtable issue). Greetings, Andres Freund
Attachment
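[Editor's note] The "final hashing round" Andres describes can be illustrated with a murmur3-style 32-bit finalizer (the constants below are the standard murmur3 fmix32 constants; treat the surrounding code as a sketch, not the committed patch). It is a bijection on 32-bit values with strong avalanche, so clustering that survives a weak hash combine gets dispersed:

```python
def fmix32(h: int) -> int:
    """murmur3-style 32-bit finalizer: an invertible mixing step with
    strong avalanche, so clustered inputs come out well spread."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

# Hash values sharing their low byte (a worst case: they all map to
# the same bucket in a 256-bucket table) get spread across buckets
# once the final round is applied.
crowded = [i << 8 for i in range(256)]          # low byte always 0
before = {h & 0xFF for h in crowded}            # one bucket used
after = {fmix32(h) & 0xFF for h in crowded}     # many buckets used
print(len(before), len(after))
```

Because the finalizer is a bijection, it never introduces new collisions; it only redistributes where existing hash values land.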
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/29/2018 02:02 AM, Andres Freund wrote: > On 2018-01-26 18:48:35 -0500, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: >>>> FWIW I do agree the data sets shared in this thread are pretty extreme >>>> and it doesn't make much sense to slow the regular cases. I'll be >>>> perfectly happy if we stop the OOM, making those cases fast is a bonus. >> >>> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, >>> and so something better in 11. And then later possibly backpatch if >>> we've grown some confidence? >> >> +1. I'd like to see some response to this included in 10.2, and time >> grows short for that. > > Here are two patches that I think we want for 10.2, and the start of one > that I think we want for master. 0002 is needed because otherwise the > lack of extra growth leads to noticeably worse performance when filling > an underestimated a coordinator hash table from the workers - turns out > our hash combine (and most hash combines) let a lot of clustering > survive. By adding a final hashing round the bit perturbation is near > perfect. The commit messages need to be polished a bit, but other than > that I think these are reasonable fixes. Plan to push by Monday evening > at the latest. > > The third patch is a version of the random IV discussed in this > thread. I do think we want to add usage of the extended hash functions, > as prototyped by Tomas, as that actually helps to fix issues with actual > hash conflicts. But we additionally need a fallback path for types > without extended hashtables, and the random IV is a good idea > nonetheless. There's no ABI difference in my patch, so I think this is > actually something we could backpatch. But I don't think it's urgent, so > I'm not planning to do that for 10.2. 
The one thing that could confuse > people is that it can lead to output order changes from run to run - I > think that's actually good, nobody should rely on hashagg etc output > being stable, but it might be a bit much in a stable release? > > > In my tests this solves the worst performance issues in Todd's case, > Tomas's, Thomas's and still does ok performancwith with a TPC-H Q18 > (which showcases the underestimated worker hashtable into leader > hashtable issue). > Sounds reasonable. Do you think repeating the performance testing is necessary? Not sure when you plan to push the changes, I may not have time for the tests until after FOSDEM. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi, On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote: > Do you think repeating the performance testing is necessary? Not sure > when do you plan to push the changes, I may not have time for the tests > until after FOSDEM. Hm, it surely wouldn't hurt, but your previous result does give a lot of confidence - there's not a lot of fundamental difference, leaving aside one more mitigation that should never reduce hashtable "quality". So I think going ahead with pushing, and then benchmarking if possible, is reasonable? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote: >> Do you think repeating the performance testing is necessary? Not sure >> when do you plan to push the changes, I may not have time for the tests >> until after FOSDEM. > Hm, it surely wouldn't hurt, but your previous result does give a lot of > confidence - there's not a lot of fundamental difference leaving one > more mitigation that should never reduce hashtable "quality" aside. So > I think going ahead with pushing and then benchmarking if possible, is > reasonable? Since the release wrap is this coming Monday, there's not really time to wait and do something after FOSDEM ... regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On January 29, 2018 10:11:38 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@anarazel.de> writes: >> On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote: >>> Do you think repeating the performance testing is necessary? Not sure >>> when do you plan to push the changes, I may not have time for the tests >>> until after FOSDEM. >> Hm, it surely wouldn't hurt, but your previous result does give a lot of >> confidence - there's not a lot of fundamental difference leaving one >> more mitigation that should never reduce hashtable "quality" aside. So >> I think going ahead with pushing and then benchmarking if possible, is >> reasonable? >Since the release wrap is this coming Monday, there's not really time to >wait and do something after FOSDEM ... I'm not sure what you're arguing for? I have measured the performance impacts, and am happy. That's why I do not want to wait till then. Doing additional benchmarks after commit can't hurt tho... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/28/18 20:02, Andres Freund wrote: > On 2018-01-26 18:48:35 -0500, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: >>>> FWIW I do agree the data sets shared in this thread are pretty extreme >>>> and it doesn't make much sense to slow the regular cases. I'll be >>>> perfectly happy if we stop the OOM, making those cases fast is a bonus. >> >>> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, >>> and so something better in 11. And then later possibly backpatch if >>> we've grown some confidence? >> >> +1. I'd like to see some response to this included in 10.2, and time >> grows short for that. > > Here are two patches that I think we want for 10.2, and the start of one > that I think we want for master. 0002 is needed because otherwise the > lack of extra growth leads to noticeably worse performance when filling > an underestimated a coordinator hash table from the workers - turns out > our hash combine (and most hash combines) let a lot of clustering > survive. By adding a final hashing round the bit perturbation is near > perfect. The commit messages need to be polished a bit, but other than > that I think these are reasonable fixes. Plan to push by Monday evening > at the latest. I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1 and got the following build error: execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory #include "utils/hashutils.h" My build sequence was git pull git status (showing clean tree) make distclean ./configure ... make -- todd
Andres Freund <andres@anarazel.de> writes: > Here are two patches that I think we want for 10.2, and the start of one > that I think we want for master. 0002 is needed because otherwise the > lack of extra growth leads to noticeably worse performance when filling > an underestimated a coordinator hash table from the workers - turns out > our hash combine (and most hash combines) let a lot of clustering > survive. By adding a final hashing round the bit perturbation is near > perfect. The commit messages need to be polished a bit, but other than > that I think these are reasonable fixes. Plan to push by Monday evening > at the latest. The first 2 seem OK to me by eyeball, though I've not done performance testing. > The third patch is a version of the random IV discussed in this > thread. I do think we want to add usage of the extended hash functions, > as prototyped by Tomas, as that actually helps to fix issues with actual > hash conflicts. But we additionally need a fallback path for types > without extended hashtables, and the random IV is a good idea > nonetheless. There's no ABI difference in my patch, so I think this is > actually something we could backpatch. But I don't think it's urgent, so > I'm not planning to do that for 10.2. The one thing that could confuse > people is that it can lead to output order changes from run to run - I > think that's actually good, nobody should rely on hashagg etc output > being stable, but it might be a bit much in a stable release? I disagree: people should reasonably expect the same query and same data and same plan to give consistent results. When we stuck in the "synchronous seqscans" feature, which broke that property, we were very quickly forced by user complaints to provide a way to shut it off. I'm also concerned that we'd have to lobotomize a bunch of regression test cases to ensure stable results there. IOW, I think randomizing hashkeys is unacceptable for HEAD, let alone back-patching. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On January 29, 2018 10:11:38 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Since the release wrap is this coming Monday, there's not really time >> to wait and do something after FOSDEM ... > I'm not sure what you're arguing for? I have measured the performance impacts, and am happy. That's why I do not want towait till then. Doing additional benchmarks after commit can't hurt tho... I'm arguing for not waiting. If Tomas does some more benchmarking and finds something else we can improve, that's fine, but the stop-growing patch is already an important fix compared to 10.1. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 13:25:35 -0500, Tom Lane wrote: > The first 2 seem OK to me by eyeball, though I've not done performance > testing. Thanks for the look! > > The third patch is a version of the random IV discussed in this > > thread. I do think we want to add usage of the extended hash functions, > > as prototyped by Tomas, as that actually helps to fix issues with actual > > hash conflicts. But we additionally need a fallback path for types > > without extended hashtables, and the random IV is a good idea > > nonetheless. There's no ABI difference in my patch, so I think this is > > actually something we could backpatch. But I don't think it's urgent, so > > I'm not planning to do that for 10.2. The one thing that could confuse > > people is that it can lead to output order changes from run to run - I > > think that's actually good, nobody should rely on hashagg etc output > > being stable, but it might be a bit much in a stable release? > > I disagree: people should reasonably expect the same query and same > data and same plan to give consistent results. When we stuck in the > "synchronous seqscans" feature, which broke that property, we were very > quickly forced by user complaints to provide a way to shut it off. > I'm also concerned that we'd have to lobotomize a bunch of regression > test cases to ensure stable results there. I don't think the cases really compare. For one there's no "natural order" like insertion order for seqscans (in simplistic cases). For another hashaggs/joins/whatnot are *already* architecture dependent, so there's ORDER BYs for pretty much everything already - I needed to add a single ORDER BY to make things stable over ~50 regression runs. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 13:20:25 -0500, Todd A. Cook wrote: > I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1 > and got the following build error: > > execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory > #include "utils/hashutils.h" Yea, there's some minor differences from master to the 10 branch, needing to backport one commit beforehand. Attached are three patches applying to 10. Greetings, Andres Freund
Attachment
Andres Freund <andres@anarazel.de> writes: > On 2018-01-29 13:25:35 -0500, Tom Lane wrote: >> I disagree: people should reasonably expect the same query and same >> data and same plan to give consistent results. When we stuck in the >> "synchronous seqscans" feature, which broke that property, we were very >> quickly forced by user complaints to provide a way to shut it off. >> I'm also concerned that we'd have to lobotomize a bunch of regression >> test cases to ensure stable results there. > I don't think the cases really compare. For one there's no "natural > order" like insertion order for seqscans (in simplistic cases). For > another hashaggs/joins/whatnot are *already* architecture dependent, so > there's ORDER BYs for pretty much everything already - I needed to add a > single ORDER BY to make things stable over ~50 regression runs. As developers who see test results from a variety of architectures, we're aware that hash ops don't give stable results across machines. Most end users aren't going to know or care about that: they're running on one architecture and up to now they've gotten stable results from hash ops. I think we *will* get pushback if we change that. Also, the fact that you needed one more ORDER BY in testing on a single machine is already proof of my worry about the regression tests, and it'll only get worse when testing across a variety of machines. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/29/18 13:31, Andres Freund wrote: > On 2018-01-29 13:20:25 -0500, Todd A. Cook wrote: >> I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1 >> and got the following build error: >> >> execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory >> #include "utils/hashutils.h" > > Yea, there's some minor differences from master to the 10 branch, > needing to backport one commit beforehand. Attached are three patches > applying to 10. Thanks. With those patches applied, I'm able to process a 175 million row data set without any problems. I'll try the 900 million row set next. -- todd
One other point here is that it's not really clear to me what a randomly varying IV is supposed to accomplish. Surely we're not intending that it prevents somebody from crafting a data set that causes bad hash performance. If a user with DB access wants to cause a performance problem, there are and always will be plenty of other avenues to making that happen. If the idea is that for a data set that otherwise would have bad hash performance, choosing a different IV would (almost always) fix it, that sounds good but you're ignoring the inverse case: for a data set that works fine, there would be some choices of IV that create a problem where there was none before. I see no reason to think that the probability of the former kind of situation is higher than the latter. So I'm on board with using the extended hash functions when available, but I'm not convinced that a varying IV buys us anything but trouble. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 13:56:37 -0500, Todd A. Cook wrote: > With those patches applied, I'm able to process a 175 million row data set > without any problems. Cool! > I'll try the 900 million row set next. Thanks! - Andres
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi, On 2018-01-29 13:45:10 -0500, Tom Lane wrote: > Also, the fact that you needed one more ORDER BY in testing on a > single machine is already proof of my worry about the regression > tests, and it'll only get worse when testing across a variety > of machines. That's an entirely unconvincing argument. A lot of changes to hashtable code will result in ordering changes, and has done so in the past. Sure there occasionally was need for a followup ORDER BY addition, but that's not that bad. The fact that only a single order by was needed when entirely changing the hash function output (by randomizing it) doesn't at all seem to suggest that it's a huge problem. On 2018-01-29 14:11:26 -0500, Tom Lane wrote: > One other point here is that it's not really clear to me what a randomly > varying IV is supposed to accomplish. Surely we're not intending that > it prevents somebody from crafting a data set that causes bad hash > performance. I do think we should make that harder. > If a user with DB access wants to cause a performance > problem, there are and always will be plenty of other avenues to making > that happen. Sure, but that's different from somebody triggering, e.g. by website input, such behaviour. > If the idea is that for a data set that otherwise would have > bad hash performance, choosing a different IV would (almost always) fix > it, that sounds good but you're ignoring the inverse case: for a data set > that works fine, there would be some choices of IV that create a problem > where there was none before. I see no reason to think that the > probability of the former kind of situation is higher than the latter. It's easily intentionally triggerable vs. not. > So I'm on board with using the extended hash functions when available, > but I'm not convinced that a varying IV buys us anything but trouble. I think we really need both. A constant IV of e.g. 
0 for a single-column hashtable doesn't really fix much, even though it does improve hash quality a bit for multi-column indexes. A random IV does address issues due to a lot of tuples falling into the same bucket, but does not address actual collisions. Both together are pretty convincing tho. Greetings, Andres Freund
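[Editor's note] Andres's distinction between what the IV fixes and what it cannot fix can be demonstrated directly. The combine step below is hash_combine-style mixing with the familiar boost constant (illustrative; the in-tree helper lives in hashutils.h): a per-hashtable random IV re-scrambles where each hash value lands, but two keys whose datatype hashes are identical still collide under every possible IV:

```python
def combine(iv: int, h: int) -> int:
    """hash_combine-style mixing of a per-hashtable IV with a datatype
    hash (boost-style constant; a sketch, not the exact PG code)."""
    iv &= 0xFFFFFFFF
    mixed = (h + 0x9E3779B9 + (iv << 6) + (iv >> 2)) & 0xFFFFFFFF
    return (iv ^ mixed) & 0xFFFFFFFF

# A true hash collision (identical datatype hashes) survives every IV:
h1 = h2 = 0xDEADBEEF
for iv in (0, 1, 0x12345678):
    assert combine(iv, h1) == combine(iv, h2)

# Distinct-but-clustered hashes, by contrast, get laid out differently
# under different IVs, defeating precomputed bucket clustering.
print(combine(1, 100) != combine(2, 100))  # True for these values
```

Hence the need for both pieces: extended (seeded) hash functions to attack real collisions, plus the IV to break up bucket clustering.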
On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One other point here is that it's not really clear to me what a randomly > varying IV is supposed to accomplish. Surely we're not intending that > it prevents somebody from crafting a data set that causes bad hash > performance. I actually think that is a real live issue that we will be forced to deal with one day. And I think that day is coming soon. It's not hard to imagine a user of a web site intentionally naming their objects such that they all hash to the same value. On most systems the worst case is probably a query that takes a few seconds or even tens of seconds, but if you get lucky you could run a server out of memory. I'm actually thinking we should replace all the hashes with https://en.wikipedia.org/wiki/SipHash with a randomly generated key. -- greg
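[Editor's note] Greg's suggestion amounts to a keyed hash: the function takes a secret key chosen once (e.g. at server start), so an attacker who cannot learn the key cannot precompute colliding inputs. A sketch using Python's keyed BLAKE2 as a stand-in for SipHash (purely illustrative; SipHash itself uses a 16-byte key and is much cheaper per call):

```python
import os
from hashlib import blake2b

def keyed_hash(key: bytes, data: bytes) -> int:
    """Keyed hash truncated to 32 bits.  With SipHash the key would be
    a 16-byte secret generated once at server startup."""
    return int.from_bytes(blake2b(data, key=key, digest_size=4).digest(), "big")

key = os.urandom(16)                      # per-server secret
a = keyed_hash(key, b"payload")
# With the same key the hash is stable; under a different key an
# attacker's precomputed collision set is (almost certainly) useless.
other = keyed_hash(os.urandom(16), b"payload")
print(a == keyed_hash(key, b"payload"))   # True
```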
Greg Stark <stark@mit.edu> writes: > On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One other point here is that it's not really clear to me what a randomly >> varying IV is supposed to accomplish. Surely we're not intending that >> it prevents somebody from crafting a data set that causes bad hash >> performance. > I actually think that is a real live issue that we will be forced to > deal with one day. And I think that day is coming soon. > It's not hard to imagine a user of a web site intentionally naming > their objects such that they all hash to the same value. Probably most > systems the worst case is a query that takes a few seconds or even > tens of seconds but if you get lucky you could run a server out of > memory. By their very nature, hash algorithms have weak spots. Pretending that they do not, or that you can 100% remove them, is a fool's errand. You could always "set enable_hashjoin = off", and deal with mergejoin's weak spots instead; but that just requires a different data set to expose its performance shortcomings ... regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 17:54:42 -0500, Tom Lane wrote: > Greg Stark <stark@mit.edu> writes: > > On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> One other point here is that it's not really clear to me what a randomly > >> varying IV is supposed to accomplish. Surely we're not intending that > >> it prevents somebody from crafting a data set that causes bad hash > >> performance. > > > I actually think that is a real live issue that we will be forced to > > deal with one day. And I think that day is coming soon. > > > It's not hard to imagine a user of a web site intentionally naming > > their objects such that they all hash to the same value. Probably most > > systems the worst case is a query that takes a few seconds or even > > tens of seconds but if you get lucky you could run a server out of > > memory. > > By their very nature, hash algorithms have weak spots. Pretending that > they do not, or that you can 100% remove them, is a fool's errand. Having edge case weaknesses and having remotely triggerable weaknesses aren't the same. And the cost for preventing such misuse is an immeasurable runtime difference and returning queries that don't specify an order and don't have a "natural" order with a changing order. That's not particularly high. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/29/2018 08:11 PM, Tom Lane wrote: > One other point here is that it's not really clear to me what a > randomly varying IV is supposed to accomplish. Surely we're not > intending that it prevents somebody from crafting a data set that > causes bad hash performance. If a user with DB access wants to cause > a performance problem, there are and always will be plenty of other > avenues to making that happen. While I'm not sure the random IV is something we want/need, I don't think "There are other ways to harm the database," is a convincing argument. I agree it's fairly easy to set work_mem to an insane value, or craft a query that eats all memory (many sorts using work_mem each, hashagg used for data with many groups, ...). Basically, if you don't know how to crash the database, you're not a senior DBA/engineer. But not all attacks are equal. All those examples I named require direct database connection, ability to construct SQL queries, etc. The case discussed in this thread does not require that - it's enough to control the data input, say by uploading a CSV file with crafted data to some web application (as Greg mentions). Of course, you need to make some assumptions while crafting the data (that it's using Postgres, and that it'll use hash aggregate on the data), but that's about it. > If the idea is that for a data set that otherwise would have bad hash > performance, choosing a different IV would (almost always) fix it, > that sounds good but you're ignoring the inverse case: for a data > set that works fine, there would be some choices of IV that create a > problem where there was none before. I see no reason to think that > the probability of the former kind of situation is higher than the > latter. I don't think this thread is really about randomly generated data sets, but about something that is generated in a way that increases the number of collisions - either intentionally or unintentionally. 
That's certainly the case of my data set, which was designed to trigger one of the issues in the code. I agree we need to check if the randomization might cause more harm than good by causing random failures. But my feeling is the probability will be very low. > So I'm on board with using the extended hash functions when > available, but I'm not convinced that a varying IV buys us anything > but trouble. I don't have a clear opinion on the random IV yet. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/29/18 14:26, Andres Freund wrote: > On 2018-01-29 13:56:37 -0500, Todd A. Cook wrote: >> With those patches applied, I'm able to process a 175 million row data set >> without any problems. > > Cool! > >> I'll try the 900 million row set next. We're also able to process those 900 million rows without problems. :) It took about 28 minutes (average of 3 runs). Out of curiosity, I then modified hashint8() as previously described. With that change, run time dropped to 11 minutes (also average of 3 runs). FWIW, the data values in these sets are sort-of random (where I can't explain the "sort-of" in a public forum), but strongly biased towards negative infinity. Starting again from scratch, we could probably remove the bias, but we have 28-30 billion of these things collected over the last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change directions at this point... -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-30 13:57:44 -0500, Todd A. Cook wrote: > Out of curiosity, I then modified hashint8() as previously described. > With that change, run time dropped to 11 minutes (also average of 3 > runs). > > FWIW, the data values in these sets are sort-of random (where I can't > explain the "sort-of" in a public forum), but strongly biased towards > negative infinity. Starting again from scratch, we could probably remove > the bias, but we have 28-30 billion of these things collected over the > last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change > directions at this point... FWIW, you could just create a different hash opclass and use it for those queries... Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/30/18 14:34, Andres Freund wrote: > On 2018-01-30 13:57:44 -0500, Todd A. Cook wrote: >> Out of curiosity, I then modified hashint8() as previously described. >> With that change, run time dropped to 11 minutes (also average of 3 >> runs). >> >> FWIW, the data values in these sets are sort-of random (where I can't >> explain the "sort-of" in a public forum), but strongly biased towards >> negative infinity. Starting again from scratch, we could probably remove >> the bias, but we have 28-30 billion of these things collected over the >> last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change >> directions at this point... > > FWIW, you could just create a different hash opclass and use it for > those queries... Hmmm, that's intriguing; I'll look into it. Thanks! :) -- todd