Thread: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
tcook@blackducksoftware.com
Date:
The following bug has been logged on the website:

Bug reference:      14932
Logged by:          Todd Cook
Email address:      tcook@blackducksoftware.com
PostgreSQL version: 10.1
Operating system:   CentOS Linux release 7.4.1708 (Core)
Description:

It hangs on a table with 167834 rows, though it works fine with only 167833
rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
be killed to stop it.

Some sample stack traces:

#0  0x00007f66f1ee9860 in __memset_sse2 () from /lib64/libc.so.6
#1  0x000000000083e085 in memset (__len=51539607552, __ch=0,
__dest=0x7f4cdf35c048) at /usr/include/bits/string3.h:84
#2  MemoryContextAllocExtended (context=<optimized out>, size=51539607552,
flags=flags@entry=5) at mcxt.c:843
#3  0x00000000005ec804 in tuplehash_allocate (type=0x163ecc8,
size=<optimized out>) at ../../../src/include/lib/simplehash.h:305
#4  tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at
../../../src/include/lib/simplehash.h:379
#5  0x00000000005ece35 in tuplehash_insert (tb=0x163ecc8, key=key@entry=0x0,
found=found@entry=0x7ffdaff3eb77 "") at
../../../src/include/lib/simplehash.h:504
#6  0x00000000005ed3ea in LookupTupleHashEntry (hashtable=0x163ec38,
slot=slot@entry=0x163e220, isnew=isnew@entry=0x7ffdaff3ebd7 "") at
execGrouping.c:387
#7  0x00000000005fae62 in lookup_hash_entry (aggstate=0x163ce88) at
nodeAgg.c:2075
#8  lookup_hash_entries (aggstate=aggstate@entry=0x163ce88) at
nodeAgg.c:2106
#9  0x00000000005fc5da in agg_fill_hash_table (aggstate=0x163ce88) at
nodeAgg.c:2536
#10 ExecAgg (pstate=0x163ce88) at nodeAgg.c:2140
#11 0x00000000005eec32 in ExecProcNode (node=0x163ce88) at
../../../src/include/executor/executor.h:250
#12 ExecutePlan (execute_once=<optimized out>, dest=0x163bbf8,
direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x163ce88, estate=0x163cc28) at execMain.c:1722
#13 standard_ExecutorRun (queryDesc=0x163c818, direction=<optimized out>,
count=0, execute_once=<optimized out>) at execMain.c:363
#14 0x0000000000718e3b in PortalRunSelect (portal=portal@entry=0x158a568,
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807,
dest=dest@entry=0x163bbf8) at pquery.c:932
#15 0x000000000071a1ef in PortalRun (portal=<optimized out>,
count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized
out>, dest=0x163bbf8, altdest=0x163bbf8, completionTag=0x7ffdaff3eed0 "") at
pquery.c:773
#16 0x0000000000716163 in exec_simple_query (query_string=<optimized out>)
at postgres.c:1099
#17 0x000000000071745c in PostgresMain (argc=<optimized out>,
argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at
postgres.c:4088
#18 0x000000000047ad1c in BackendRun (port=0x1592ab0) at postmaster.c:4357
#19 BackendStartup (port=0x1592ab0) at postmaster.c:4029
#20 ServerLoop () at postmaster.c:1753
#21 0x00000000006aea2f in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x156bed0) at postmaster.c:1361
#22 0x000000000047bb4f in main (argc=3, argv=0x156bed0) at main.c:228
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>)
at ../../../src/include/lib/simplehash.h:443
443                    newentry = &newdata[curelem];
(remaining frames, from tuplehash_insert down through main, are identical to the first trace)
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>)
at ../../../src/include/lib/simplehash.h:443
443                    newentry = &newdata[curelem];
(remaining frames, from tuplehash_insert down through main, are identical to the first trace)
Continuing.

Program received signal SIGINT, Interrupt.
tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at
../../../src/include/lib/simplehash.h:450
450                    curelem = SH_NEXT(tb, curelem, startelem);
(remaining frames, from tuplehash_insert down through main, are identical to the first trace)
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>)
at ../../../src/include/lib/simplehash.h:443
443                    newentry = &newdata[curelem];
(remaining frames, from tuplehash_insert down through main, are identical to the first trace)
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000005ec8b4 in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>)
at ../../../src/include/lib/simplehash.h:445
445                    if (newentry->status == SH_STATUS_EMPTY)
(remaining frames, from tuplehash_insert down through main, are identical to the first trace)


select name, setting, unit, source, pending_restart from pg_settings where
source <> 'default' and context <> 'internal' order by lower(name) ;
             name             |      setting       | unit |        source        | pending_restart
------------------------------+--------------------+------+----------------------+-----------------
 application_name             | psql               |      | client               | f
 autovacuum_work_mem          | 131072             | kB   | configuration file   | f
 checkpoint_completion_target | 0.8                |      | configuration file   | f
 checkpoint_timeout           | 1800               | s    | configuration file   | f
 client_encoding              | SQL_ASCII          |      | client               | f
 cluster_name                 | PG 10              |      | configuration file   | f
 DateStyle                    | ISO, MDY           |      | configuration file   | f
 default_text_search_config   | pg_catalog.english |      | configuration file   | f
 dynamic_shared_memory_type   | posix              |      | configuration file   | f
 effective_cache_size         | 8388608            | 8kB  | configuration file   | f
 lc_messages                  | C                  |      | configuration file   | f
 lc_monetary                  | C                  |      | configuration file   | f
 lc_numeric                   | C                  |      | configuration file   | f
 lc_time                      | C                  |      | configuration file   | f
 listen_addresses             | *                  |      | configuration file   | f
 log_destination              | stderr             |      | configuration file   | f
 log_line_prefix              | %m [%p]            |      | configuration file   | f
 log_lock_waits               | on                 |      | configuration file   | f
 log_min_duration_statement   | 20000              | ms   | configuration file   | f
 log_rotation_age             | 1440               | min  | configuration file   | f
 log_rotation_size            | 0                  | kB   | configuration file   | f
 log_temp_files               | 1024               | kB   | configuration file   | f
 log_timezone                 | US/Eastern         |      | configuration file   | f
 log_truncate_on_rotation     | off                |      | configuration file   | f
 logging_collector            | on                 |      | configuration file   | f
 maintenance_work_mem         | 1048576            | kB   | configuration file   | f
 max_connections              | 100                |      | configuration file   | f
 max_stack_depth              | 2048               | kB   | environment variable | f
 max_wal_senders              | 0                  |      | configuration file   | f
 max_wal_size                 | 4096               | MB   | configuration file   | f
 port                         | 54310              |      | configuration file   | f
 shared_buffers               | 1048576            | 8kB  | configuration file   | f
 synchronous_commit           | off                |      | configuration file   | f
 TimeZone                     | US/Eastern         |      | configuration file   | f
 transaction_deferrable       | off                |      | override             | f
 transaction_isolation        | read committed     |      | override             | f
 transaction_read_only        | off                |      | override             | f
 wal_buffers                  | 2048               | 8kB  | override             | f
 wal_level                    | minimal            |      | configuration file   | f
 work_mem                     | 65536              | kB   | configuration file   | f

select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit



Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:
Hi,

On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      14932
> Logged by:          Todd Cook
> Email address:      tcook@blackducksoftware.com
> PostgreSQL version: 10.1
> Operating system:   CentOS Linux release 7.4.1708 (Core)
> Description:        
> 
> It hangs on a table with 167834 rows, though it works fine with only 167833
> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
> be killed to stop it.
> 

Can you share the query and data, so that we can reproduce the issue?

Based on the stack traces, this smells like a bug in simplehash,
introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
which gets triggered for 167834 rows (but not for 167833).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/27/17 14:17, Tomas Vondra wrote:
> Hi,
> 
> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      14932
>> Logged by:          Todd Cook
>> Email address:      tcook@blackducksoftware.com
>> PostgreSQL version: 10.1
>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>> Description:
>>
>> It hangs on a table with 167834 rows, though it works fine with only 167833
>> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
>> be killed to stop it.
>>
> 
> Can you share the query and data, so that we can reproduce the issue?
> 
> Based on the stack traces, this smells like a bug in simplehash,
> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
> which gets triggered for 167834 rows (but not for 167833).

I've attached a reproducer using real data.  (FWIW, I wasn't able to reproduce with
fake data made with generate_series().)

Also, I forgot to mention that this problem is not present in 9.6 or 9.5.

-- todd

Attachment


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
Hi,

On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
> COPY reproducer (val) FROM stdin;
> 2976219712004784288
> -6429122065899879392
> -7471109962990387136
> -7471109962990387136
> -2895470491222113184
> -4083509061952565472
> 1019481548263425664
> 4639248884787347648
> -6999443831165647744
> -4199917803455020480
> -4110530183001439680

How are these values generated? They awfully look like hash values
(~same length, full numerical range)...

I'm kinda neck deep in something else, but I'll try to have a look
afterwards.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Thomas Munro
Date:
On Tue, Nov 28, 2017 at 11:10 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>> COPY reproducer (val) FROM stdin;
>> 2976219712004784288
>> -6429122065899879392
>> -7471109962990387136
>> -7471109962990387136
>> -2895470491222113184
>> -4083509061952565472
>> 1019481548263425664
>> 4639248884787347648
>> -6999443831165647744
>> -4199917803455020480
>> -4110530183001439680
>
> How are these values generated? They awfully look like hash values
> (~same length, full numerical range)...

When SH_INSERT tries to insert that final extra value, insertdist
keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
double the size (at least until my computer gives up, somewhere around
11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
to 26 then it succeeds, but I guess some other attack could be crafted
for that.  What is the theory behind this parameter?
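
For readers who haven't looked at simplehash.h, here is a minimal,
self-contained sketch of the probe-and-grow mechanism in question (the
names echo simplehash.h, but this is an illustration, not the actual
source).  With more than GROW_MAX_DIB entries sharing a single hash
value, no amount of doubling can shorten the probe chain, so insert()
and grow() keep re-entering each other -- the unbounded growth
described above:

    #include <stdint.h>
    #include <stdlib.h>

    #define GROW_MAX_DIB 25     /* cf. SH_GROW_MAX_DIB */

    typedef struct { uint32_t hash; int used; } entry;
    typedef struct { entry *data; uint32_t size, sizemask; } table;

    static void grow(table *tb);

    static void
    insert(table *tb, uint32_t hash)
    {
        uint32_t cur = hash & tb->sizemask;     /* initial bucket */
        uint32_t dist = 0;                      /* distance from it */

        while (tb->data[cur].used)
        {
            if (++dist > GROW_MAX_DIB)
            {
                grow(tb);           /* double the table and rehash... */
                insert(tb, hash);   /* ...then retry from the top */
                return;
            }
            cur = (cur + 1) & tb->sizemask;     /* linear probe */
        }
        tb->data[cur].hash = hash;
        tb->data[cur].used = 1;
    }

    static void
    grow(table *tb)
    {
        table old = *tb;

        tb->size *= 2;
        tb->sizemask = tb->size - 1;
        tb->data = calloc(tb->size, sizeof(entry));
        for (uint32_t i = 0; i < old.size; i++)
            if (old.data[i].used)
                insert(tb, old.data[i].hash);   /* re-insert survivors */
        free(old.data);
    }

    int
    main(void)
    {
        table tb = { calloc(1024, sizeof(entry)), 1024, 1023 };

        /* 20 equal hashes still fit under the cap; 26 of them would
         * make insert() and grow() chase each other until memory runs
         * out, which is the runaway behavior reported here */
        for (int i = 0; i < 20; i++)
            insert(&tb, 0xdeadbeefu);
        free(tb.data);
        return 0;
    }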

-- 
Thomas Munro
http://www.enterprisedb.com


Thomas Munro <thomas.munro@enterprisedb.com> writes:
> When SH_INSERT tries to insert that final extra value, insertdist
> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
> double the size (at least until my computer gives up, somewhere around
> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
> to 26 then it succeeds, but I guess some other attack could be crafted
> for that.  What is the theory behind this parameter?

You beat me to it --- after looking at simplehash.h I'd guessed that
either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
an infinite loop, but I'd not gotten to determining which one yet.

I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
of them are obviously loop-proof.

Note that the sample data has a lot of collisions:

regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
  hashint8   | count
-------------+-------
   441526644 |  2337
 -1117776826 |  1221
 -1202007016 |   935
 -2068831050 |   620
  1156644653 |   538
   553783815 |   510
   259780770 |   444
   371047036 |   394
   915722575 |   359
 ... etc etc ...

It's evidently more complicated than just that the code fails with
more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
by a lot.  There needs to be a safety valve that prevents letting
the hash fill factor approach zero, which is what's happening in
this test case.

(I wonder whether these loops oughtn't contain CHECK_FOR_INTERRUPTS,
btw.)
        regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Thomas Munro
Date:
On Tue, Nov 28, 2017 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You beat me to it --- after looking at simplehash.h I'd guessed that
> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
> an infinite loop, but I'd not gotten to determining which one yet.
>
> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
> of them are obviously loop-proof.
>
> Note that the sample data has a lot of collisions:
>
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>   hashint8   | count
> -------------+-------
>    441526644 |  2337
>  -1117776826 |  1221
>  -1202007016 |   935
>  -2068831050 |   620
>   1156644653 |   538
>    553783815 |   510
>    259780770 |   444
>    371047036 |   394
>    915722575 |   359
>  ... etc etc ...
>
> It's evidently more complicated than just that the code fails with
> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
> by a lot.  There needs to be a safety valve that prevents letting
> the hash fill factor approach zero, which is what's happening in
> this test case.

Yeah.  If you count distinct hashint8(val) of *distinct* values, you get:

postgres=# select hashint8(val), count(*) from (select distinct val
from reproducer) ss group by 1 order by 2 desc limit 1;
  hashint8  | count
------------+-------
 1288181237 |    26
(1 row)

It doesn't matter how many bits of that you use, you'll get at least
26 collisions, so our loop can't terminate just by increasing the mask
size.  My guess is that you'd either need a defence based on something
like a secondary hash function, or at least a dynamic SH_GROW_MAX_DIB
that wouldn't allow the fill factor to plummet as you say.  A dataset
could be found that would exceed any static value of SH_GROW_MAX_DIB
by brute force.  In this case, considering the definition of hashint8,
there are more straightforward ways to find distinct bigint values
that hash to the same value... just swap some bits between the high
and low words, or something like that.
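
To illustrate that "swap some bits" construction (assuming, per the
definition Thomas refers to, that hashint8 folds the high 32-bit half
of the int64 into the low half before hashing), flipping the same bits
in both halves leaves the fold -- and hence the hash -- unchanged; a
sketch:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t  val = 4444319256653758784LL;   /* a value from the thread */
        uint32_t m = 0x0000FFFFu;       /* any mask below the sign bit */

        /* flip the same bits in both halves: lo ^ hi is unchanged */
        int64_t twin = val ^ ((int64_t) m << 32) ^ (int64_t) m;

        uint32_t fold1 = (uint32_t) val  ^ (uint32_t) (val  >> 32);
        uint32_t fold2 = (uint32_t) twin ^ (uint32_t) (twin >> 32);

        /* identical folds feed the same input to the final 32-bit
         * hash, so the two distinct bigints collide */
        printf("%lld vs %lld -> %08x vs %08x\n",
               (long long) val, (long long) twin, fold1, fold2);
        return 0;
    }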

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 11/28/2017 03:20 AM, Thomas Munro wrote:
> On Tue, Nov 28, 2017 at 11:10 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>>> COPY reproducer (val) FROM stdin;
>>> 2976219712004784288
>>> -6429122065899879392
>>> -7471109962990387136
>>> -7471109962990387136
>>> -2895470491222113184
>>> -4083509061952565472
>>> 1019481548263425664
>>> 4639248884787347648
>>> -6999443831165647744
>>> -4199917803455020480
>>> -4110530183001439680
>>
>> How are these values generated? They awfully look like hash values
>> (~same length, full numerical range)...
> 
> When SH_INSERT tries to insert that final extra value, insertdist
> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
> double the size (at least until my computer gives up, somewhere around
> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
> to 26 then it succeeds, but I guess some other attack could be crafted
> for that.  What is the theory behind this parameter?
> 

Yeah, I came to the same hypothesis yesterday, but I see I failed to
keep pgsql-bugs on CC :-(

FWIW I believe the last doubling (from 2147483648 to 2*2147483648) is
doomed to fail due to the sizemask=0 bug. I mean, if oldsize=2147483648,
then newsize=2*2147483648, which triggers this:
    if (tb->size == SH_MAX_SIZE)
        tb->sizemask = 0;

which would explain why the last grow did not complete even after 40
minutes (while the 1073741824->2147483648 grow took 15 seconds): with
sizemask=0, SH_NEXT/SH_PREV/... can only ever return 0.
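
A tiny illustration of why that pins the probe loop (mirroring
simplehash.h's arithmetic rather than quoting it): with sizemask=0,
the masked advance can only ever yield bucket 0, so a probe that needs
to move on never does:

    #include <stdint.h>
    #include <stdio.h>

    /* simplified form of the masked "advance to next bucket" step */
    static uint32_t
    sh_next(uint32_t cur, uint32_t sizemask)
    {
        return (cur + 1) & sizemask;
    }

    int
    main(void)
    {
        uint32_t cur = 0;

        /* normal case: sizemask = size - 1 walks the buckets, wrapping */
        for (int i = 0; i < 4; i++)
            printf("%u ", cur = sh_next(cur, 7));   /* 1 2 3 4 */

        /* buggy case: sizemask = 0 maps every successor back to 0 */
        cur = 0;
        for (int i = 0; i < 4; i++)
            printf("%u ", cur = sh_next(cur, 0));   /* 0 0 0 0 */
        printf("\n");
        return 0;
    }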

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/27/17 17:10, Andres Freund wrote:
> Hi,
> 
> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>> COPY reproducer (val) FROM stdin;
>> 2976219712004784288
>> -6429122065899879392
>> -7471109962990387136
>> -7471109962990387136
>> -2895470491222113184
>> -4083509061952565472
>> 1019481548263425664
>> 4639248884787347648
>> -6999443831165647744
>> -4199917803455020480
>> -4110530183001439680
> 
> How are these values generated? They awfully look like hash values
> (~same length, full numerical range)...

They are biased hashes.  FWIW, the reproducer data is from one column of a
three-column primary key.

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/27/17 23:03, Tom Lane wrote:
> 
> Note that the sample data has a lot of collisions:
> 
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>    hashint8   | count
> -------------+-------
>     441526644 |  2337
>   -1117776826 |  1221
>   -1202007016 |   935
>   -2068831050 |   620
>    1156644653 |   538
>     553783815 |   510
>     259780770 |   444
>     371047036 |   394
>     915722575 |   359
>   ... etc etc ...

In case it matters, the complete data set will have some outlier values with 10k to 100k
collisions in this column.

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 11/28/2017 03:55 PM, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
>>
>> Note that the sample data has a lot of collisions:
>>
>> regression=# select hashint8(val), count(*) from reproducer group by 1
>> order by 2 desc;
>>    hashint8   | count
>> -------------+-------
>>     441526644 |  2337
>>   -1117776826 |  1221
>>   -1202007016 |   935
>>   -2068831050 |   620
>>    1156644653 |   538
>>     553783815 |   510
>>     259780770 |   444
>>     371047036 |   394
>>     915722575 |   359
>>   ... etc etc ...
> 
> In case it matters, the complete data set will have some outlier values
> with 10k to 100k collisions in this column.
> 

In the original values? Not a big deal, I guess. It's the hashint8
collisions that are causing the infinite loop, i.e. different values with
the same hashint8 result.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
> > 
> > Note that the sample data has a lot of collisions:
> > 
> > regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
> >    hashint8   | count
> > -------------+-------
> >     441526644 |  2337
> >   -1117776826 |  1221
> >   -1202007016 |   935
> >   -2068831050 |   620
> >    1156644653 |   538
> >     553783815 |   510
> >     259780770 |   444
> >     371047036 |   394
> >     915722575 |   359
> >   ... etc etc ...
> 
> In case it matters, the complete data set will have some outlier values with 10k to 100k
> collisions in this column.

To make sure we're on the same page, this is data intentionally created
to have a lot of hash collisions, is that right?

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/28/17 13:14, Andres Freund wrote:
> On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote:
>> On 11/27/17 23:03, Tom Lane wrote:
>>>
>>> Note that the sample data has a lot of collisions:
>>>
>>> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>>>     hashint8   | count
>>> -------------+-------
>>>      441526644 |  2337
>>>    -1117776826 |  1221
>>>    -1202007016 |   935
>>>    -2068831050 |   620
>>>     1156644653 |   538
>>>      553783815 |   510
>>>      259780770 |   444
>>>      371047036 |   394
>>>      915722575 |   359
>>>    ... etc etc ...
>>
>> In case it matters, the complete data set will have some outlier values with 10k to 100k
>> collisions in this column.
> 
> To make sure we're on the same page, this is data intentionally created
> to have a lot of hash collisions, is that right?

More or less.  It might be more accurate to say that it's created in such a way that
we expect to get lots of collisions.

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/27/17 23:03, Tom Lane wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> When SH_INSERT tries to insert that final extra value, insertdist
>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>> double the size (at least until my computer gives up, somewhere around
>> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
>> to 26 then it succeeds, but I guess some other attack could be crafted
>> for that.  What is the theory behind this parameter?
> 
> You beat me to it --- after looking at simplehash.h I'd guessed that
> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
> an infinite loop, but I'd not gotten to determining which one yet.
> 
> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
> of them are obviously loop-proof.
> 
> Note that the sample data has a lot of collisions:
> 
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>    hashint8   | count
> -------------+-------
>     441526644 |  2337
>   -1117776826 |  1221
>   -1202007016 |   935
>   -2068831050 |   620
>    1156644653 |   538
>     553783815 |   510
>     259780770 |   444
>     371047036 |   394
>     915722575 |   359
>   ... etc etc ...
> 
> It's evidently more complicated than just that the code fails with
> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
> by a lot.  There needs to be a safety valve that prevents letting
> the hash fill factor approach zero, which is what's happening in
> this test case.

FWIW, I can also reproduce the infinite loop with 167834 unique values.

It kinda looks like the problematic collisions arise from masking the
computed hash values; e.g.:

    SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash)
    {
        return hash & tb->sizemask;
    }

(Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.)

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/05/2017 10:23 PM, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
>> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>>> When SH_INSERT tries to insert that final extra value, insertdist
>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>> double the size (at least until my computer gives up, somewhere around
>>> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>> for that.  What is the theory behind this parameter?
>>
>> You beat me to it --- after looking at simplehash.h I'd guessed that
>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>> an infinite loop, but I'd not gotten to determining which one yet.
>>
>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
>> of them are obviously loop-proof.
>>
>> Note that the sample data has a lot of collisions:
>>
>> regression=# select hashint8(val), count(*) from reproducer group by 1
>> order by 2 desc;
>>    hashint8   | count
>> -------------+-------
>>     441526644 |  2337
>>   -1117776826 |  1221
>>   -1202007016 |   935
>>   -2068831050 |   620
>>    1156644653 |   538
>>     553783815 |   510
>>     259780770 |   444
>>     371047036 |   394
>>     915722575 |   359
>>   ... etc etc ...
>>
>> It's evidently more complicated than just that the code fails with
>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>> by a lot.  There needs to be a safety valve that prevents letting
>> the hash fill factor approach zero, which is what's happening in
>> this test case.
> 
> FWIW, I can also reproduce the infinite loop with 167834 unique values.
> 

Unique values or unique *hash* values?

Can you share the data, so that whoever fixes the bug can verify it also
fixes your example?

> It kinda looks like the problematic collisions arise from masking the
> computed hash values; e.g.:
> 
>     SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash)
>     {
>         return hash & tb->sizemask;
>     }
> 

That's a pretty standard way to compute the modulo (when computing the
index of the hash bucket), and certainly not the root cause.
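
A quick check of that equivalence, for anyone who wants to convince
themselves (a standalone sketch, not from the source):

    #include <assert.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint32_t size = 1024;           /* any power of two */
        uint32_t sizemask = size - 1;   /* 0x3FF: keeps the low 10 bits */

        /* hash & (size - 1) and hash % size agree for every hash, so
         * the bucket index depends only on the hash's low-order bits */
        for (uint32_t hash = 0; hash < 1000000; hash++)
            assert((hash & sizemask) == (hash % size));
        return 0;
    }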

> (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.)
> 

Based on the initial discussion, the problem here is twofold.

Firstly, the code assumes that if it gets certain number of bucket
collisions (essentially, the initial bucket and certain number of
neighboring buckets already being full), making the table larger is
guaranteed to reduce the number of collisions.

Which is obviously untrue for duplicate hash values, but it may also
happen for distinct hash values that form a chain (think a sequence of
hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).

This causes the insane growth to the largest possible hash table.

Secondly, there's a bug that for the largest hash table we set
sizemask=0, which means we never ever get bucket index other than 0.

This causes the infinite loop.
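
To see concretely why such a chain survives growth: any hash value
smaller than the old table size keeps exactly the same bucket index
after the mask doubles, so a dense run stays a dense run. A small
check (illustrative, not from the source):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t oldmask = (1u << 10) - 1;  /* 1024-bucket table  */
        uint32_t newmask = (1u << 11) - 1;  /* after one doubling */

        /* for hashes 1..1000 the extra mask bit is always zero, so
         * no entry moves and the cluster is exactly as long as before */
        for (uint32_t hash = 1; hash <= 1000; hash++)
            if ((hash & oldmask) != (hash & newmask))
                printf("moved: %u\n", hash);    /* never prints */

        printf("the run of 1000 consecutive hashes stays dense\n");
        return 0;
    }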


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 12/05/17 17:01, Tomas Vondra wrote:
> 
> 
> On 12/05/2017 10:23 PM, Todd A. Cook wrote:
>> On 11/27/17 23:03, Tom Lane wrote:
>>> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>>>> When SH_INSERT tries to insert that final extra value, insertdist
>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>>> double the size (at least until my computer gives up, somewhere around
>>>> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
>>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>>> for that.  What is the theory behind this parameter?
>>>
>>> You beat me to it --- after looking at simplehash.h I'd guessed that
>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>>> an infinite loop, but I'd not gotten to determining which one yet.
>>>
>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
>>> of them are obviously loop-proof.
>>>
>>> Note that the sample data has a lot of collisions:
>>>
>>> regression=# select hashint8(val), count(*) from reproducer group by 1
>>> order by 2 desc;
>>>     hashint8   | count
>>> -------------+-------
>>>      441526644 |  2337
>>>    -1117776826 |  1221
>>>    -1202007016 |   935
>>>    -2068831050 |   620
>>>     1156644653 |   538
>>>      553783815 |   510
>>>      259780770 |   444
>>>      371047036 |   394
>>>      915722575 |   359
>>>    ... etc etc ...
>>>
>>> It's evidently more complicated than just that the code fails with
>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>>> by a lot.  There needs to be a safety valve that prevents letting
>>> the hash fill factor approach zero, which is what's happening in
>>> this test case.
>>
>> FWIW, I can also reproduce the infinite loop with 167834 unique values.
>>
> 
> Unique values or unique *hash* values?

Unique values.


> Can you share the data, so that whoever fixes the bug can verify it also
> fixes your example?

Sure.  It's attached.

-- todd

Attachment

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/05/2017 11:33 PM, Todd A. Cook wrote:
> On 12/05/17 17:01, Tomas Vondra wrote:
>>
>>
>> On 12/05/2017 10:23 PM, Todd A. Cook wrote:
>>> On 11/27/17 23:03, Tom Lane wrote:
>>>> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>>>>> When SH_INSERT tries to insert that final extra value, insertdist
>>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>>>> double the size (at least until my computer gives up, somewhere around
>>>>> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
>>>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>>>> for that.  What is the theory behind this parameter?
>>>>
>>>> You beat me to it --- after looking at simplehash.h I'd guessed that
>>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>>>> an infinite loop, but I'd not gotten to determining which one yet.
>>>>
>>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
>>>> of them are obviously loop-proof.
>>>>
>>>> Note that the sample data has a lot of collisions:
>>>>
>>>> regression=# select hashint8(val), count(*) from reproducer group by 1
>>>> order by 2 desc;
>>>>     hashint8   | count
>>>> -------------+-------
>>>>      441526644 |  2337
>>>>    -1117776826 |  1221
>>>>    -1202007016 |   935
>>>>    -2068831050 |   620
>>>>     1156644653 |   538
>>>>      553783815 |   510
>>>>      259780770 |   444
>>>>      371047036 |   394
>>>>      915722575 |   359
>>>>    ... etc etc ...
>>>>
>>>> It's evidently more complicated than just that the code fails with
>>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>>>> by a lot.  There needs to be a safety valve that prevents letting
>>>> the hash fill factor approach zero, which is what's happening in
>>>> this test case.
>>>
>>> FWIW, I can also reproduce the infinite loop with 167834 unique values.
>>>
>>
>> Unique values or unique *hash* values?
> 
> Unique values.
> 
> 
>> Can you share the data, so that whoever fixes the bug can verify it also
>> fixes your example?
> 
> Sure.  It's attached.
> 

Seems the dataset has pretty much the same issue as the one reported
before, that is

select hashint8(val), count(distinct val), count(*) from temp_f_03 group
by 1 order by 2 desc;

      hashint8   | count | count
    -------------+-------+-------
     -1971396144 |    45 |    45
      2035896403 |    42 |    42
     -1633843397 |    30 |    30
      1425704662 |    29 |    29
      -455482779 |    22 |    22
      -300163565 |    17 |    17
     -1803963420 |    17 |    17
      -537082846 |    14 |    14
       603707034 |    13 |    13
      -176369887 |    12 |    12
      1274957136 |    11 |    11
      1465522632 |    11 |    11
     -1589862230 |    10 |    10
     -1145403239 |    10 |    10

i.e. there are many hash collisions (more than in the other data set).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 12/06/17 08:33, Tomas Vondra wrote:
>
>>> Can you share the data, so that whoever fixes the bug can verify it also
>>> fixes your example?
>>
>> Sure.  It's attached.
>>
> 
> Seems the dataset has pretty much the same issue as the one reported
> before, that is
> 
> select hashint8(val), count(distinct val), count(*) from temp_f_03 group
> by 1 order by 2 desc;
> 
>        hashint8   | count | count
>      -------------+-------+-------
>       -1971396144 |    45 |    45
>        2035896403 |    42 |    42
>       -1633843397 |    30 |    30
>        1425704662 |    29 |    29
>        -455482779 |    22 |    22
>        -300163565 |    17 |    17
>       -1803963420 |    17 |    17
>        -537082846 |    14 |    14
>         603707034 |    13 |    13
>        -176369887 |    12 |    12
>        1274957136 |    11 |    11
>        1465522632 |    11 |    11
>       -1589862230 |    10 |    10
>       -1145403239 |    10 |    10
> 
> i.e. there are many hash collisions (more than in the other data set).

If hashint8() is ultimately invoked by TupleHashTableHash() in execGrouping.c,
it might be magnifying the difficulties here.  The least significant bits,
which are used as the bucket number in simplehash.h, are not very well
distributed:

select val, to_hex(val), to_hex(hashint8(val)) from temp_f_03 limit 15 ;
          val          |      to_hex      |  to_hex
----------------------+------------------+----------
   4444319256653758784 | 3dad64d121468140 | 805ffffe
    554179993563924608 | 7b0d7c49a018880  | 84dffffb
  -3383965646518123872 | d109bd2c6b2982a0 | 9c3ffff7
  -4706811054739454944 | beae0c48915f8420 | 191ffff6
    618200668902031424 | 8944a3ba5d08040  | 2a3ffff0
   5074043922812601024 | 466aa01079f982c0 | 7effffee
  -8783188184262212928 | 861bd8e1b9a482c0 | a6bfffea
  -4597800992953433792 | c031545b6b128140 | b1dfffea
   8563040839807173408 | 76d608465dde8320 | 7d9fffe6
   6092569112843158816 | 548d27540c888520 | 6f9fffe6
  -7313351060369079936 | 9a81c1f558f98180 | 68ffffe5
  -1786712428165627488 | e7345283536981a0 | 73ffffe5
  -6153596242570280896 | aa9a08d20e6b8040 | ac3fffd8
     88426174078092128 | 13a272306c58360  | b57fffd8
  -5305589938458295680 | b65ec20faa4e8280 | ba9fffd3
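
Masking those values shows the effect directly: at a table size of
2^20, all of the hashes above land within a few dozen buckets of each
other at the top of the table (a quick check, assuming the bucket
index is hash & sizemask as above):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t hashes[] = {0x805ffffe, 0x84dffffb, 0x9c3ffff7,
                             0x191ffff6, 0x2a3ffff0, 0x7effffee,
                             0xa6bfffea, 0xb1dfffea};
        uint32_t sizemask = (1u << 20) - 1;     /* 2^20-bucket table */

        /* every index comes out near 1048575, i.e. they all cluster
         * in the last few dozen buckets (two even coincide exactly) */
        for (int i = 0; i < 8; i++)
            printf("%08x -> bucket %u\n", hashes[i],
                   hashes[i] & sizemask);
        return 0;
    }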

-- todd


"Todd A. Cook" <tcook@blackducksoftware.com> writes:
> If hashint8() is ultimately invoked by TupleHashTableHash() in execGrouping.c,
> it might be magnifying the difficulties here.  The least significant bits,
> which are used as the bucket number in simplehash.h, are not very well
> distributed:

> select val, to_hex(val), to_hex(hashint8(val)) from temp_f_03 limit 15 ;
>           val          |      to_hex      |  to_hex
> ----------------------+------------------+----------
>    4444319256653758784 | 3dad64d121468140 | 805ffffe
>     554179993563924608 | 7b0d7c49a018880  | 84dffffb
>   -3383965646518123872 | d109bd2c6b2982a0 | 9c3ffff7
>   -4706811054739454944 | beae0c48915f8420 | 191ffff6
>     618200668902031424 | 8944a3ba5d08040  | 2a3ffff0
>    5074043922812601024 | 466aa01079f982c0 | 7effffee
>   -8783188184262212928 | 861bd8e1b9a482c0 | a6bfffea
>   -4597800992953433792 | c031545b6b128140 | b1dfffea
>    8563040839807173408 | 76d608465dde8320 | 7d9fffe6
>    6092569112843158816 | 548d27540c888520 | 6f9fffe6
>   -7313351060369079936 | 9a81c1f558f98180 | 68ffffe5
>   -1786712428165627488 | e7345283536981a0 | 73ffffe5
>   -6153596242570280896 | aa9a08d20e6b8040 | ac3fffd8
>      88426174078092128 | 13a272306c58360  | b57fffd8
>   -5305589938458295680 | b65ec20faa4e8280 | ba9fffd3


Checking out hashint8() on random data shows no such obvious fault.
You've apparently got a data set that exposes a weakness in hashint8,
but it's not very clear what that is.

In any case, the hashtable code needs to not fall over in the
presence of a lot of collisions, regardless of the exact reason
for there being a lot.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
> Checking out hashint8() on random data shows no such obvious fault.
> You've apparently got a data set that exposes a weakness in hashint8,
> but it's not very clear what that is.

It's intentionally designed to cause problems afaict:

http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com


> In any case, the hashtable code needs to not fall over in the
> presence of a lot of collisions, regardless of the exact reason
> for there being a lot.

Yes, we need to be more resilient about it. Working on a patch.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 12/06/17 12:19, Andres Freund wrote:
> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>> Checking out hashint8() on random data shows no such obvious fault.
>> You've apparently got a data set that exposes a weakness in hashint8,
>> but it's not very clear what that is.
> 
> It's intentionally designed to cause problems afaict:
> 
> http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com

Definitely not intentionally.  My wording there could have been better;
perhaps "we expect to get lots of duplicates in that column and we then
need to find the unique values".  Eventually, a substantial number of
the duplicates will be deleted, but that's further along in the process.

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/05/2017 11:01 PM, Tomas Vondra wrote:
> 
> Based on the initial discussion, the problem here is twofold.
> 
> Firstly, the code assumes that if it gets certain number of bucket
> collisions (essentially, the initial bucket and certain number of
> neighboring buckets already being full), making the table larger is
> guaranteed to reduce the number of collisions.
> 
> Which is obviously untrue for duplicate hash values, but it may also
> happen for distinct hash values that form a chain (think a sequence of
> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
> 

I've managed to construct (by sheer brute force) an example that breaks
simplehash in this way. It triggers the growth by having a sequence of
distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
then trying to insert a value with duplicate hash (say, 4).

Which will trigger the "emptydist" condition (as opposed to
"insertdist", as in the other example).

This means there are very few collisions (no hash value matching more
than 2 distinct values), and all the hash values are "at the beginning
of the range" so growing the hash table can't possible break the hash
sequences.

The generated datasets are quite large (16MB each), so I've placed them
here (along with the ugly C code to generate it):

    https://github.com/tvondra/simplehash-break

Each CSV file contains ~1M strings with distinct hashes between 0 and
1048576 (essentially, covering about 97.5% of the range). There are
"chains" of up to ~300 consecutive hash values. This works:

    CREATE TABLE hash_text (val text);
    COPY hash_text FROM '/tmp/data1.csv';
    SET work_mem = '1GB';
    SELECT DISTINCT val FROM hash_text;

but this does not:

    CREATE TABLE hash_text (val text);
    COPY hash_text FROM '/tmp/data1.csv';
    COPY hash_text FROM '/tmp/data2.csv';
    SET work_mem = '1GB';
    SELECT DISTINCT val FROM hash_text;

In fact, only a small subset of the second data set should be needed.

FWIW I first tried to do the same thing with bigint values, but no
matter what I did I've been unable to generate a dataset covering more
than ~60% of the range (essentially hashint8 only ever produces 60% of
values from [0,1000000], which likely increases the collision rate).
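
For reference, the construction amounts to a brute-force sieve; a
sketch (with a stand-in 32-bit string hash -- the actual generator in
the repo above targets PostgreSQL's text hashing, so the fnv1a()
placeholder here is an assumption for illustration only):

    #include <stdint.h>
    #include <stdio.h>

    #define RANGE (1u << 20)        /* target hash range [0, 2^20) */

    /* stand-in 32-bit string hash (FNV-1a), not PostgreSQL's */
    static uint32_t
    fnv1a(const char *s)
    {
        uint32_t h = 2166136261u;
        for (; *s; s++)
        {
            h ^= (uint8_t) *s;
            h *= 16777619u;
        }
        return h;
    }

    int
    main(void)
    {
        static char seen[RANGE];    /* one flag per claimed hash value */
        char        buf[32];
        uint32_t    found = 0;

        /* enumerate candidates, keeping each string whose hash lands
         * on an unclaimed value in the range; coverage climbs slowly,
         * which is why "sheer brute force" was needed */
        for (uint64_t i = 0; i < 500000000ULL && found < RANGE; i++)
        {
            snprintf(buf, sizeof(buf), "v%llu", (unsigned long long) i);
            uint32_t h = fnv1a(buf);
            if (h < RANGE && !seen[h])
            {
                seen[h] = 1;
                printf("%s\n", buf);    /* one line per kept string */
                found++;
            }
        }
        return 0;
    }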

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
> > 
> > Based on the initial discussion, the problem here is twofold.
> > 
> > Firstly, the code assumes that if it gets certain number of bucket
> > collisions (essentially, the initial bucket and certain number of
> > neighboring buckets already being full), making the table larger is
> > guaranteed to reduce the number of collisions.
> > 
> > Which is obviously untrue for duplicate hash values, but it may also
> > happen for distinct hash values that form a chain (think a sequence of
> > hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).

> I've managed to construct (by sheer brute force) an example that breaks
> simplehash in this way. It triggers the growth by having a sequence of
> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
> then trying to insert a value with duplicate hash (say, 4).

I fail to be excited by this. That's possible for any sort of hashtable
in some way. You might hit issues due to resizing or due to lookup
performance, but you'll hit problems. That's a fairly fundamental issue
of pure hashtables.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/06/2017 08:35 PM, Andres Freund wrote:
> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>
>>> Based on the initial discussion, the problem here is twofold.
>>>
>>> Firstly, the code assumes that if it gets certain number of bucket
>>> collisions (essentially, the initial bucket and certain number of
>>> neighboring buckets already being full), making the table larger is
>>> guaranteed to reduce the number of collisions.
>>>
>>> Which is obviously untrue for duplicate hash values, but it may also
>>> happen for distinct hash values that form a chain (think a sequence of
>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
> 
>> I've managed to construct (by sheer brute force) an example that breaks
>> simplehash in this way. It triggers the growth by having a sequence of
>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
>> then trying to insert a value with duplicate hash (say, 4).
> 
> I fail to be excited by this. That's possible for any sort of hashtable
> in some way. You might hit issues due to resizing or due to lookup
> performance, but you'll hit problems. That's a fairly fundamental issue
> of pure hashtables.
> 

Eh? I think you misunderstood the issue this triggers. I'm perfectly
fine with poor lookup performance - that's expected and reasonable. But
it actually triggers the same behavior as the already presented example,
i.e. the hash table grows indefinitely (so eventually gets OOM).

I really doubt allocating 3GB of memory (which is when my laptop gives
up and kills the query) to do a DISTINCT on a 50MB table is sane
behavior, even if the table is constructed in such an intentionally
evil way.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:

On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>
>On 12/06/2017 08:35 PM, Andres Freund wrote:
>> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>>
>>>> Based on the initial discussion, the problem here is twofold.
>>>>
>>>> Firstly, the code assumes that if it gets certain number of bucket
>>>> collisions (essentially, the initial bucket and certain number of
>>>> neighboring buckets already being full), making the table larger is
>>>> guaranteed to reduce the number of collisions.
>>>>
>>>> Which is obviously untrue for duplicate hash values, but it may also
>>>> happen for distinct hash values that form a chain (think a sequence of
>>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
>>
>>> I've managed to construct (by sheer brute force) an example that breaks
>>> simplehash in this way. It triggers the growth by having a sequence of
>>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
>>> then trying to insert a value with duplicate hash (say, 4).
>>
>> I fail to be excited by this. That's possible for any sort of hashtable
>> in some way. You might hit issues due to resizing or due to lookup
>> performance, but you'll hit problems. That's a fairly fundamental issue
>> of pure hashtables.
>>
>
>Eh? I think you misunderstood the issue this triggers. I'm perfectly
>fine with poor lookup performance - that's expected and reasonable. But
>it actually triggers the same behavior as the already presented example,
>i.e. the hash table grows indefinitely (so eventually gets OOM).
>
>I really doubt allocating 3GB of memory (which is when my laptop gives
>up and kills the query) to do a DISTINCT on a 50MB table is sane
>behavior, even if the table is constructed in such an intentionally
>evil way.

No, I got that. I just don't think time and memory are that different.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/06/2017 09:00 PM, Andres Freund wrote:
> 
> 
> On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> On 12/06/2017 08:35 PM, Andres Freund wrote:
>>> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>>>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>>>
>>>>> Based on the initial discussion, the problem here is twofold.
>>>>>
>>>>> Firstly, the code assumes that if it gets certain number of bucket
>>>>> collisions (essentially, the initial bucket and certain number of
>>>>> neighboring buckets already being full), making the table larger is
>>>>> guaranteed to reduce the number of collisions.
>>>>>
>>>>> Which is obviously untrue for duplicate hash values, but it may also
>>>>> happen for distinct hash values that form a chain (think a sequence of
>>>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
>>>
>>>> I've managed to construct (by sheer brute force) an example that breaks
>>>> simplehash in this way. It triggers the growth by having a sequence of
>>>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
>>>> then trying to insert a value with duplicate hash (say, 4).
>>>
>>> I fail to be excited by this. That's possible for any sort of hashtable
>>> in some way. You might hit issues due to resizing or due to lookup
>>> performance, but you'll hit problems. That's a fairly fundamental issue
>>> of pure hashtables.
>>>
>>
>> Eh? I think you misunderstood the issue this triggers. I'm perfectly
>> fine with poor lookup performance - that's expected and reasonable. But
>> it actually triggers the same behavior as the already presented example,
>> i.e. the hash table grows indefinitely (so eventually gets OOM).
>>
>> I really doubt allocating 3GB of memory (which is when my laptop gives
>> up and kills the query) to do a DISTINCT on a 50MB table is sane
>> behavior, even if the table is constructed in such an intentionally
>> evil way.
> 
> No, I got that. I just don't think time and memory are that different.
>
It's one thing when the hash table takes longer to lookup something or
when it consumes a bit more memory. Say, ~2x more than needed, give or
take. I'm perfectly fine with that, particularly when it's a worst-case
evil data set like this one.

But it's a very different situation when it consumes tens of GBs of
memory (essentially building the largest possible hash table, despite
the load factor dropping to 0), because the code grows the hash table
assuming it will magically resolve the collisions. And even if the
machine has enough RAM, it's guaranteed to fail once we try growing the
table beyond that limit.

I don't quite understand why that should be OK. It doesn't seem very
sane to me.

FWIW I've constructed the data sets for two reasons - to convince myself
that my understanding of the simplehash code is correct, and to provide
a data set triggering the other growth condition in simplehash code. My
understanding is that if we stop growing the table after the load factor
drops below some threshold (as TL proposed earlier in this thread), it
should address both of these cases.
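
To make the "stop growing" idea concrete, here is a minimal sketch of
the guard I have in mind (this is not the actual simplehash.h code, and
SH_MIN_FILLFACTOR is a made-up name for the threshold):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* hypothetical threshold; not an actual simplehash.h symbol */
    #define SH_MIN_FILLFACTOR 0.2

    typedef struct HashTable
    {
        uint64_t    size;       /* number of buckets */
        uint64_t    members;    /* number of entries stored */
    } HashTable;

    /*
     * Refuse to grow once the table is so sparse that more buckets can't
     * plausibly reduce collisions - duplicate hashes and long chains of
     * consecutive hashes survive any number of doublings.
     */
    static bool
    should_grow(const HashTable *tb)
    {
        return ((double) tb->members / (double) tb->size) >= SH_MIN_FILLFACTOR;
    }

    int
    main(void)
    {
        HashTable   tb = {1024 * 1024, 100000};     /* load factor ~0.1 */

        printf("grow? %s\n", should_grow(&tb) ? "yes" : "no");
        return 0;
    }

The collision-triggered grow path would consult such a check instead of
growing unconditionally.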

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
> It's one thing when the hash table takes longer to lookup something or

longer aka "forever".


> when it consumes a bit more memory. Say, ~2x more than needed, give or
> take. I'm perfectly fine with that, particularly when it's a worst-case
> evil data set like this one.

I think the way to prevent that kind of attack is to add randomization.


> FWIW I've constructed the data sets for two reasons - to convince myself
> that my understanding of the simplehash code is correct, and to provide
> a data set triggering the other growth condition in simplehash code. My
> understanding is that if we stop growing the table after the load factor
> drops below some threshold (as TL proposed earlier in this thread), it
> should address both of these cases.

Yea, I'm not averse to adding a few stopgaps that break in a less
annoying manner. All I'm saying is that I don't think we need to be
super concerned about this specific way of breaking things.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Stephen Frost
Date:
Andres, all,

* Andres Freund (andres@anarazel.de) wrote:
> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
> > FWIW I've constructed the data sets for two reasons - to convince myself
> > that my understanding of the simplehash code is correct, and to provide
> > a data set triggering the other growth condition in simplehash code. My
> > understanding is that if we stop growing the table after the load factor
> > drops below some threshold (as TL proposed earlier in this thread), it
> > should address both of these cases.
>
> Yea, I'm not averse to adding a few stopgaps that break in a less
> annoying manner. All I'm saying is that I don't think we need to be
> super concerned about this specific way of breaking things.

I think I'm agreeing with Tomas here when I say that we shouldn't have a
system that breaks in the "tries to allocate infinite memory" way when
the load factor drops far below some reasonable threshold and that we
*should* be concerned that we have this problem in the existing
implementation.  It's bad, and we should care about that and fix it.

Running forever in a loop isn't great either, but at least you only
expend at most one CPU on it and can go kill it, assuming we have an
appropriate CHECK_FOR_INTERRUPTS() placement.  Running the box out of
memory can lead to other things *failing* (even entirely reasonable
things!), not just being slower.  That's an important difference.

Thanks!

Stephen

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/06/2017 09:46 PM, Andres Freund wrote:
> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>> It's one thing when the hash table takes longer to lookup something or
> 
> longer aka "forever".
> 

Not necessarily.

The datasets I shared are somewhat extreme in the sense that there are
many contiguous sequences of hash values, but it only takes one such
sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
the hash table may still be perfectly fine for most keys, and only
slightly slower for the keys in the sequence.

> 
>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>> take. I'm perfectly fine with that, particularly when it's a worst-case
>> evil data set like this one.
> 
> I think the way to prevent that kind of attack is to add randomization.
> 

By randomization you mean universal hashing [1], or something else?

[1] https://en.wikipedia.org/wiki/Universal_hashing

In any case, randomization should make it much harder (or pretty much
impossible) to construct datasets that are guaranteed to cause failures
of this kind. No problem with that (although if someone hits the issue,
it will be more difficult to reproduce).

It would still be possible to hit it by chance, but that should be very
unlikely (OTOH the current code assumes the same thing about collisions
and sequences, and here we are).

> 
>> FWIW I've constructed the data sets for two reasons - to convince myself
>> that my understanding of the simplehash code is correct, and to provide
>> a data set triggering the other growth condition in simplehash code. My
>> understanding is that if we stop growing the table after the load factor
>> drops below some threshold (as TL proposed earlier in this thread), it
>> should address both of these cases.
> 
> Yea, I'm not averse to adding a few stopgaps that break in a less
> annoying manner. All I'm saying is that I don't think we need to be
> super concerned about this specific way of breaking things.
> 

OK, understood.


cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
> On 12/06/2017 09:46 PM, Andres Freund wrote:
> > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
> >> It's one thing when the hash table takes longer to lookup something or
> > 
> > longer aka "forever".
> > 
> 
> Not necessarily.

> The datasets I shared are somewhat extreme in the sense that there are
> many contiguous sequences of hash values, but it only takes one such
> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
> the hash table may still be perfectly fine for most keys, and only
> slightly slower for the keys in the sequence.

Meh, we're talking about adversarial attacks here.


> >> when it consumes a bit more memory. Say, ~2x more than needed, give or
> >> take. I'm perfectly fine with that, particularly when it's a worst-case
> >> evil data set like this one.
> > 
> > I think the way to prevent that kind of attack is to add randomization.
> > 
> 
> By randomization you mean universal hashing [1], or something else?

No, adding in a random seed to the hash. It'd be a lot better if we had
a way to provide internal state to the hashfunction, but even just
hash_combine()ing a random number into the hash would be a lot better than
what we're doing now.
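
To be concrete, a boost-style combiner (which is the shape
hash_combine() uses) makes the mitigation as cheap as the sketch below;
the IV handling here is a made-up illustration, not actual executor
code:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* boost-style combiner, the shape hash_combine() uses */
    static inline uint32_t
    hash_combine(uint32_t a, uint32_t b)
    {
        a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
        return a;
    }

    /* hypothetical per-hashtable IV, drawn once at table creation */
    static uint32_t hash_iv;

    static void
    init_hash_iv(void)
    {
        hash_iv = (uint32_t) random();  /* stand-in for a proper RNG */
    }

    /* mix the IV into every key hash before bucket mapping */
    static uint32_t
    seeded_hash(uint32_t key_hash)
    {
        return hash_combine(hash_iv, key_hash);
    }

    int
    main(void)
    {
        init_hash_iv();
        printf("seeded hash of 42: %u\n", seeded_hash(42));
        return 0;
    }

That's a couple of integer ops per tuple, while making the bucket
mapping unpredictable to anyone who only knows the datatype's hash
function.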

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/06/2017 10:12 PM, Andres Freund wrote:
> On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
>> On 12/06/2017 09:46 PM, Andres Freund wrote:
>>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>>>> It's one thing when the hash table takes longer to lookup something or
>>>
>>> longer aka "forever".
>>>
>>
>> Not necessarily.
> 
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
> 
> Meh, we're talking about adversarial attacks here.
> 
> 
>>>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>>>> take. I'm perfectly fine with that, particularly when it's a worst-case
>>>> evil data set like this one.
>>>
>>> I think the way to prevent that kind of attack is to add randomization.
>>>
>>
>> By randomization you mean universal hashing [1], or something else?
> 
> No, adding in a random seed to the hash. It'd be a lot better if we
> had a way to provide internal state to the hashfunction, but even
> just hash_combine()ing a random number into the hash would be a
> lot better than what we're doing now.
> 

The universal hashing (of "our" hash value) would have the benefit that
we could change it when growing the hash table. I.e. do this

    h(x) = ((A * hash_any(x) + B) mod P) mod M

and generate new A, B, P whenever we grow the hashtable. That would
further reduce the probability of either of the failures described in
this thread (I mean, hitting it again after growing the table).

I don't know if it's worth the additional CPU cycles, of course.
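
For illustration, a self-contained sketch of that scheme (p = 2^31 - 1
keeps the multiplication within 64 bits; all names here are mine, not
from any patch):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define UH_PRIME    2147483647u     /* 2^31 - 1, a Mersenne prime */

    typedef struct UHParams
    {
        uint64_t    a;      /* 1 <= a < p, regenerated on every grow */
        uint64_t    b;      /* 0 <= b < p, regenerated on every grow */
    } UHParams;

    static void
    uh_regenerate(UHParams *uh)
    {
        uh->a = 1 + (uint64_t) random() % (UH_PRIME - 1);
        uh->b = (uint64_t) random() % UH_PRIME;
    }

    /* ((a * h + b) mod p) mod m, with m the current bucket count */
    static uint64_t
    uh_bucket(const UHParams *uh, uint32_t h, uint64_t m)
    {
        /* h may exceed p; good enough for a sketch */
        return ((uh->a * h + uh->b) % UH_PRIME) % m;
    }

    int
    main(void)
    {
        UHParams    uh;

        uh_regenerate(&uh);
        printf("bucket: %llu\n",
               (unsigned long long) uh_bucket(&uh, 12345, 1024 * 1024));
        return 0;
    }

Since (A, B) change on every grow, a chain of consecutive hash values
that defeats one table geometry gets scattered by the next one.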


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 12/06/17 16:12, Andres Freund wrote:
> 
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
> 
> Meh, we're talking about adversarial attacks here.

Hmmmmm...

I found this problem when I dropped 10.1 into a test environment to see
what would happen.  There was no deliberate attempt to break anything.

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
> On 12/06/17 16:12, Andres Freund wrote:
> > 
> > > The datasets I shared are somewhat extreme in the sense that there are
> > > many contiguous sequences of hash values, but it only takes one such
> > > sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
> > > the hash table may still be perfectly fine for most keys, and only
> > > slightly slower for the keys in the sequence.
> > 
> > Meh, we're talking about adversarial attacks here.
> 
> Hmmmmm...
> 
> I found this problem when I dropped 10.1 into a test environment to see
> what would happen.  There was no deliberate attempt to break anything.

Read Thomas' message at:
http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com

Greetings,

Andres Freund


Andres Freund <andres@anarazel.de> writes:
> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>> I found this problem when I dropped 10.1 into a test environment to see
>> what would happen.  There was no deliberate attempt to break anything.

> Read Thomas' message at:
http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com

I'm confused by Tomas' claim that

>> (essentially hashint8 only ever produces 60% of
>> values from [0,1000000], which likely increases collision rate).

This is directly contradicted by the simple experiments I've done, eg

regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
 count
--------
 999879
(1 row)

regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
 count
--------
 644157
(1 row)

regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
  count
---------
 1048514
(1 row)

It's certainly not perfect, but I'm not observing any major failure to
span the output space.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/06/2017 11:55 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>>> I found this problem when I dropped 10.1 into a test environment to see
>>> what would happen.  There was no deliberate attempt to break anything.
> 
>> Read Thomas' message at:
http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com
> 
> I'm confused by Tomas' claim that
> 
>>> (essentially hashint8 only ever produces 60% of
>>> values from [0,1000000], which likely increases collision rate).
> 
> This is directly contradicted by the simple experiments I've done, eg
> 
> regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
>  count  
> --------
>  999879
> (1 row)
> 
> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
>  count  
> --------
>  644157
> (1 row)
> 
> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
>   count  
> ---------
>  1048514
> (1 row)
> 
> It's certainly not perfect, but I'm not observing any major failure to
> span the output space.
> 

That is not the claim I was making, though. When generating data sets
I've been considering only values where hashint8(v) is between 0 and
1000000 *without* the masking. Otherwise growing the hash table would
break the "chain" of values in the hash table, which would resolve the
issue with SH_GROW_MAX_MOVE.

So you'd have to do something like this:

    select count(distinct hashint8(v))
      from generate_series(0,1000000::int8) v
     where hashint8(v) between 0 and 1024*1024;

but to get some numbers you'd have to also increase the value passed to
generate_series (because it'll filter most of the values).

Which is why I generated the data by an ugly C program, similar to the
attached one. It keeps a small bitmap for generated hashes in the [0,1M]
range, and prints the number of bits set after every 1e9 values processed.
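
In case the attachment doesn't come through, the program is essentially
the sketch below; FNV-1a stands in for the hashint8 computation purely
so that it compiles outside the backend (the real program uses the
backend's hash, so the exact numbers differ):

    #include <stdint.h>
    #include <stdio.h>

    #define RANGE   (1024 * 1024)       /* track hashes in [0, 2^20) */

    static uint8_t bitmap[RANGE / 8];

    /* FNV-1a over the 8 value bytes; a stand-in for hashint8 */
    static uint32_t
    hash64(uint64_t v)
    {
        uint32_t    h = 2166136261u;

        for (int i = 0; i < 8; i++)
        {
            h ^= (uint8_t) (v >> (i * 8));
            h *= 16777619u;
        }
        return h;
    }

    int
    main(void)
    {
        uint64_t    nset = 0;

        for (uint64_t i = 1; i <= 10000000000ULL; i++)
        {
            uint32_t    h = hash64(i);

            /* remember which hashes in the range we've seen */
            if (h < RANGE && !(bitmap[h / 8] & (1 << (h % 8))))
            {
                bitmap[h / 8] |= (uint8_t) (1 << (h % 8));
                nset++;
            }
            if (i % 1000000000ULL == 0)
                printf("i=%llu nset=%llu\n",
                       (unsigned long long) i, (unsigned long long) nset);
        }
        return 0;
    }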

What I do see is this:

    i=1000000000 nset=217666
    i=2000000000 nset=389526
    i=3000000000 nset=525135
    i=4000000000 nset=632305
    i=5000000000 nset=659574
    i=6000000000 nset=659574
    i=7000000000 nset=659574
    i=8000000000 nset=659574
    i=9000000000 nset=659574
    ...

So somewhere between 4e9 and 5e9 we generate 659574 hashes in the range,
and then never increase it further.

I don't know if this is an issue in practice, but it seems that for large
hash tables (on bigint), growing the hash table may not be that great an
improvement, because we're not really doubling the number of slots we can
hit directly. We'll probably find an empty slot nearby, though.

It was just an observation - I only noticed that because I tried to
construct the adversarial data set on bigint, and couldn't make it work
no matter what.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:
On 12/06/2017 06:19 PM, Andres Freund wrote:
> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>> Checking out hashint8() on random data shows no such obvious fault.
>> You've apparently got a data set that exposes a weakness in hashint8,
>> but it's not very clear what that is.
> 
> It's intentionally designed to cause problems afaict:
> 
> http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com
> 
> 
>> In any case, the hashtable code needs to not fall over in the
>> presence of a lot of collisions, regardless of the exact reason
>> for there being a lot.
> 
> Yes, we need to be more resilient about it. Working on a patch.
> 

I've been playing with this a bit today (sorry for trespassing on your
turf a bit, but curiosity killed the cat), particularly with the three
basic ideas mentioned in this thread. Let me share some numbers.

1) randomized hashing, i.e. using the extended hash functions and a
random seed

2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p)
formula, where h is "our" hash, and (a,b) are random and p is a prime.
This can't possibly fix the collisions reported by Todd, because the
result will remain the same. But it should fix "my" chains of values by
spreading them a bit (and the a,b,p values are generated on each grow,
so in the unlikely case where we get a chain, it should disappear after
a growth)

[1] https://en.wikipedia.org/wiki/Universal_hashing

3) disabling growth of the hash table when the fillfactor would drop too
much (e.g. below 0.2)

Very rough patches for these three approaches are attached.


Now, some very rough numbers comparing the impact of those approaches
(and their combinations). The following two tables present timings
for a couple of datasets - five cases without any collisions (allowing
us to quantify impact on regular datasets), the two datasets generated
by me (one without collisions, one with collisions), and the two issues
reported by Todd.

RH - randomized hashing (seed + extended)
UH - universal hashing
ST - stop growing

             master     RH     UH     ST    RH     UH    ST
    -------------------------------------------------------
    Ok/1        542    540    552    537    0%     2%   -1%
    Ok/2        232    221    242    237   -5%     4%    2%
    Ok/3        161    167    180    161    4%    12%    0%
    Ok/4        276    270    285    274   -2%     3%   -1%
    Ok/5        174    179    193    174    3%    11%    0%
    Tomas/1     543    555    551    533    2%     1%   -2%
    Tomas/2     OOM   2322   3251      -
    Todd/1      OOM    OOM     63     63
    Todd/2      OOM    OOM    OOM     72


This is mostly as expected, although I'm surprised that the universal
hashing seems to fix Todd's first reproducer. The "stop growing"
approach technically fixes "my" dataset, but it takes so much time I
haven't timed it.

In terms of slowdown (compared to master), I'd say RH and ST are mostly
neutral. Universal hashing has significant impact in some cases, but I
suppose that might be fixed by using a slightly different scheme
(instead of using simple modulo arithmetic).
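
One such cheaper scheme would be classic multiply-shift hashing, which
needs no modulo at all; a sketch (textbook material, not code from the
attached patches):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* random odd multiplier, regenerated whenever the table grows */
    static uint64_t ms_a;

    static void
    ms_regenerate(void)
    {
        ms_a = ((((uint64_t) random()) << 32) | (uint64_t) random()) | 1;
    }

    /*
     * Multiply-shift: the multiplication wraps mod 2^64 (intentional),
     * and the top log2(nbuckets) bits pick the bucket, so this relies
     * on the bucket count being a power of two - which it already is
     * in simplehash.
     */
    static uint32_t
    ms_bucket(uint32_t h, int log2_buckets)
    {
        return (uint32_t) ((ms_a * (uint64_t) h) >> (64 - log2_buckets));
    }

    int
    main(void)
    {
        ms_regenerate();
        printf("bucket: %u\n", ms_bucket(12345, 20));
        return 0;
    }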

Now, let's look at combinations of the approaches (broken into two
tables, so that it doesn't get mangled by mail clients):

             master  RH+UH  RH+ST  UH+ST  RH+UH+ST
    Ok/1        542    553    545    549       558
    Ok/2        232    241    234    241       250
    Ok/3        161    180    170    180       187
    Ok/4        276    274    270    285       290
    Ok/5        174    187    182    193       198
    Tomas/1     543    571    561    554       570
    Tomas/2       -   2380   2340   3130      2380
    Todd/1        -     61     65     63        64
    Todd/2        -      -     90     91        93

              RH+UH   RH+ST  UH+ST  RH+UH+ST
    Ok/1         2%      1%     1%        3%
    Ok/2         4%      1%     4%        8%
    Ok/3        12%      6%    12%       16%
    Ok/4        -1%     -2%     3%        5%
    Ok/5         7%      5%    11%       14%
    Tomas/1      5%      3%     2%        5%

It seems only the "stop growth" approach has to be part of the solution, as
it's included in any combination fixing all cases shared in this thread. I'd
say (randomized hashing + stop growing) seems like the best option.

FWIW I do agree the data sets shared in this thread are pretty extreme
and it doesn't make much sense to slow the regular cases. I'll be
perfectly happy if we stop the OOM, making those cases fast is a bonus.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 12/07/2017 12:45 AM, Tomas Vondra wrote:
> 
> 
> On 12/06/2017 11:55 PM, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>>>> I found this problem when I dropped 10.1 into a test environment to see
>>>> what would happen.  There was no deliberate attempt to break anything.
>>
>>> Read Thomas' message at:
http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com
>>
>> I'm confused by Tomas' claim that
>>
>>>> (essentially hashint8 only ever produces 60% of
>>>> values from [0,1000000], which likely increases collision rate).
>>
>> This is directly contradicted by the simple experiments I've done, eg
>>
>> regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
>>  count  
>> --------
>>  999879
>> (1 row)
>>
>> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
>>  count  
>> --------
>>  644157
>> (1 row)
>>
>> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
>>   count  
>> ---------
>>  1048514
>> (1 row)
>>
>> It's certainly not perfect, but I'm not observing any major failure to
>> span the output space.
>>
> 
> That is not the claim I was making, though. When generating data sets
> I've been considering only values where hashint8(v) is between 0 and
> 1000000 *without* the masking. Otherwise growing the hash table would
> break the "chain" of values in the hash table, which would resolve the
> issue with SH_GROW_MAX_MOVE.
> 
> So you'd have to do something like this:
> 
>     select count(distinct hashint8(v))
>       from generate_series(0,1000000::int8) v
>      where hashint8(v) between 0 and 1024*1024;
> 
> but to get some numbers you'd have to also increase the value passed to
> generate_series (because it'll filter most of the values).
> 
> Which is why I generated the data by an ugly C program, similar to the
> attached one. It keeps a small bitmap for generated hashes in the [0,1M]
> range, and prints the number of bits set after every 1e9 values processed.
> 
> What I do see is this:
> 
>     i=1000000000 nset=217666
>     i=2000000000 nset=389526
>     i=3000000000 nset=525135
>     i=4000000000 nset=632305
>     i=5000000000 nset=659574
>     i=6000000000 nset=659574
>     i=7000000000 nset=659574
>     i=8000000000 nset=659574
>     i=9000000000 nset=659574
>     ...
> 
> So somewhere between 4e9 and 5e9 we generate 659574 hashes in the range,
> and then never increase it further.
> 
> I don't know if this is an issue in practice, but it seems that for large
> hash tables (on bigint), growing the hash table may not be that great an
> improvement, because we're not really doubling the number of slots we can
> hit directly. We'll probably find an empty slot nearby, though.
> 
> It was just an observation - I only noticed that because I tried to
> construct the adversarial data set on bigint, and couldn't make it work
> no matter what.
> 

I've done a simple experiment today - computing a hash for every uint32
value, and checking how many distinct hash values that produces. For the
4.2 billion values the results look like this:

    second key        ndistinct     ndistinct/nkeys
    3.380 i=100000000 nset=98829159 0.99
    6.240 i=200000000 nset=195351181 0.98
    9.106 i=300000000 nset=289623103 0.97
    11.932 i=400000000 nset=381695003 0.95
    14.814 i=500000000 nset=471621355 0.94
    17.706 i=600000000 nset=559452278 0.93
    20.577 i=700000000 nset=645242762 0.92
    23.496 i=800000000 nset=729036217 0.91
    26.460 i=900000000 nset=810879821 0.90
    29.380 i=1000000000 nset=890818778 0.89
    32.331 i=1100000000 nset=968898478 0.88
    35.282 i=1200000000 nset=1045164189 0.87
    38.262 i=1300000000 nset=1119654810 0.86
    41.251 i=1400000000 nset=1192424766 0.85
    44.258 i=1500000000 nset=1263482593 0.84
    47.268 i=1600000000 nset=1332897449 0.83
    50.305 i=1700000000 nset=1400717923 0.82
    53.356 i=1800000000 nset=1466962823 0.81
    56.425 i=1900000000 nset=1531660191 0.81
    59.489 i=2000000000 nset=1594856205 0.80
    62.593 i=2100000000 nset=1656588855 0.79
    65.706 i=2200000000 nset=1716895530 0.78
    68.829 i=2300000000 nset=1775809288 0.77
    71.966 i=2400000000 nset=1833353377 0.76
    75.123 i=2500000000 nset=1889558599 0.76
    78.271 i=2600000000 nset=1944463039 0.75
    81.418 i=2700000000 nset=1998096496 0.74
    84.574 i=2800000000 nset=2050490745 0.73
    87.711 i=2900000000 nset=2101666393 0.72
    90.852 i=3000000000 nset=2151669155 0.72
    93.986 i=3100000000 nset=2200517580 0.71
    97.084 i=3200000000 nset=2248232772 0.70
    100.172 i=3300000000 nset=2294849331 0.70
    103.232 i=3400000000 nset=2340389131 0.69
    106.273 i=3500000000 nset=2384875319 0.68
    109.272 i=3600000000 nset=2428339639 0.67
    112.260 i=3700000000 nset=2470798655 0.67
    115.199 i=3800000000 nset=2512271730 0.66
    118.125 i=3900000000 nset=2552788321 0.65
    121.029 i=4000000000 nset=2592379529 0.65
    123.895 i=4100000000 nset=2631056297 0.64
    126.726 i=4200000000 nset=2668843329 0.64
    129.397 loops 4294967295 set 2703915179 (0.63)

That means we only ever generate about 64% of the possible hash keys.
That doesn't seem particularly healthy I guess, but for small hash
tables (with fewer than 2^32 buckets) that may not be an issue, as some
of the values will wrap because of the modulo.

For comparison, murmurhash3 and xxhash (both fairly popular hash
functions) end up with a 1:1 ratio of values to hashes, i.e. no
collisions at all. Of course, both are somewhat slower than the
simple hash functions we use for int/bigint values.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 12/10/17 18:14, Tomas Vondra wrote:
> 
> I've done a simple experiment today - computing a hash for every uint32
> value, and checking how many distinct hash values that produces. For the
> 4.2 billion values the results look like this:
> 
>      second key        ndistinct     ndistinct/nkeys
>      3.380 i=100000000 nset=98829159 0.99
>      6.240 i=200000000 nset=195351181 0.98
>      9.106 i=300000000 nset=289623103 0.97
>      11.932 i=400000000 nset=381695003 0.95
>      14.814 i=500000000 nset=471621355 0.94
>      17.706 i=600000000 nset=559452278 0.93
>      20.577 i=700000000 nset=645242762 0.92
>      23.496 i=800000000 nset=729036217 0.91
>      26.460 i=900000000 nset=810879821 0.90
>      29.380 i=1000000000 nset=890818778 0.89
>      32.331 i=1100000000 nset=968898478 0.88
>      35.282 i=1200000000 nset=1045164189 0.87
>      38.262 i=1300000000 nset=1119654810 0.86
>      41.251 i=1400000000 nset=1192424766 0.85
>      44.258 i=1500000000 nset=1263482593 0.84
>      47.268 i=1600000000 nset=1332897449 0.83
>      50.305 i=1700000000 nset=1400717923 0.82
>      53.356 i=1800000000 nset=1466962823 0.81
>      56.425 i=1900000000 nset=1531660191 0.81
>      59.489 i=2000000000 nset=1594856205 0.80
>      62.593 i=2100000000 nset=1656588855 0.79
>      65.706 i=2200000000 nset=1716895530 0.78
>      68.829 i=2300000000 nset=1775809288 0.77
>      71.966 i=2400000000 nset=1833353377 0.76
>      75.123 i=2500000000 nset=1889558599 0.76
>      78.271 i=2600000000 nset=1944463039 0.75
>      81.418 i=2700000000 nset=1998096496 0.74
>      84.574 i=2800000000 nset=2050490745 0.73
>      87.711 i=2900000000 nset=2101666393 0.72
>      90.852 i=3000000000 nset=2151669155 0.72
>      93.986 i=3100000000 nset=2200517580 0.71
>      97.084 i=3200000000 nset=2248232772 0.70
>      100.172 i=3300000000 nset=2294849331 0.70
>      103.232 i=3400000000 nset=2340389131 0.69
>      106.273 i=3500000000 nset=2384875319 0.68
>      109.272 i=3600000000 nset=2428339639 0.67
>      112.260 i=3700000000 nset=2470798655 0.67
>      115.199 i=3800000000 nset=2512271730 0.66
>      118.125 i=3900000000 nset=2552788321 0.65
>      121.029 i=4000000000 nset=2592379529 0.65
>      123.895 i=4100000000 nset=2631056297 0.64
>      126.726 i=4200000000 nset=2668843329 0.64
>      129.397 loops 4294967295 set 2703915179 (0.63)
> 
> That means we only ever generate about 64% of the possible hash keys.
> That doesn't seem particularly healthy I guess, but for small hash
> tables (with fewer than 2^32 buckets) that may not be an issue, as some
> of the values will wrap because of the modulo.

FWIW, changing hashint8() to

    int64   val = PG_GETARG_INT64(0);

    if (val <= UINT32_MAX && val >= 0)
        return hash_uint32((uint32) val);
    else
        return hash_any((unsigned char *) &val, sizeof(val));

fixes the problem for me on all of my data sets.  My conjecture is that the
existing implementation

    uint32    lohalf = (uint32) val;
    uint32    hihalf = (uint32) (val >> 32);

    lohalf ^= (val >= 0) ? hihalf : ~hihalf;

biases lohalf when hashing collections of large-magnitude negative numbers
(like my data sets) because the individual bits of lohalf do not have equal
probability of being toggled by the exclusive-or.  For example, given a
value of -9223261146507558080 (0x8000770b48a68340), 15 of the 16 most
significant bits in lohalf will toggle, but only half of the least
significant bits will toggle.
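
Here's a quick self-contained check of that conjecture, reusing the
mixing step quoted above (the popcount builtin is gcc/clang-specific):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t     val = (int64_t) UINT64_C(0x8000770b48a68340);
        uint32_t    lohalf = (uint32_t) val;
        uint32_t    hihalf = (uint32_t) ((uint64_t) val >> 32);

        /* the hashint8 mixing step: val is negative, so xor ~hihalf */
        uint32_t    mixed = lohalf ^ ~hihalf;

        /* the bits that toggled are exactly the bits set in ~hihalf */
        uint32_t    flipped = lohalf ^ mixed;

        printf("toggled in upper 16 bits: %d\n",
               __builtin_popcount(flipped >> 16));
        printf("toggled in lower 16 bits: %d\n",
               __builtin_popcount(flipped & 0xffff));
        return 0;
    }

For this value it prints 15 and 7, matching the counts above.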

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/27/17 15:59, Todd A. Cook wrote:
> On 11/27/17 14:17, Tomas Vondra wrote:
>> Hi,
>>
>> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      14932
>>> Logged by:          Todd Cook
>>> Email address:      tcook@blackducksoftware.com
>>> PostgreSQL version: 10.1
>>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>>> Description:
>>>
>>> It hangs on a table with 167834 rows, though it works fine with only 167833
>>> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
>>> be killed to stop it.
>>>
>>
>> Can you share the query and data, so that we can reproduce the issue?
>>
>> Based on the stack traces this smells like a bug in the simplehash,
>> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
>> which gets triggered for 167834 rows (but not for 167833).
> 
> I've attached a reproducer using real data.  (FWIW, I wasn't able to reproduce with
> fake data made with generate_series().)
> 
> Also, I forgot to mention that this problem is not present in 9.6 or 9.5.

Is this issue likely to be addressed in 10.2?

FWIW, we try to avoid running with local patches, but we are unable to move
past 9.6 without a fix...

-- todd


Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 12/06/2017 06:19 PM, Andres Freund wrote:
>> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>>> In any case, the hashtable code needs to not fall over in the
>>> presence of a lot of collisions, regardless of the exact reason
>>> for there being a lot.

>> Yes, we need to be more resilient about it. Working on a patch.

This seems to have fallen through the cracks over the Christmas holidays.
Let's see if we can't push it to a conclusion.

> [ Tomas presents some draft patches and performance analysis ]

> It seems only the "stop growth" has to be part of the solution, as it's
> included in any combination fixing all cases shared in this thread. I'd
> say (randomized hashing + stop growing) seems like the best option.

> FWIW I do agree the data sets shared in this thread are pretty extreme
> and it doesn't make much sense to slow the regular cases. I'll be
> perfectly happy if we stop the OOM, making those cases fast is a bonus.

Personally I'd say let's just adopt the "stop growing" patch and call it
good.  In particular, I'm nervous about the idea of backpatching your RH
patch because of the ABI hazard from changing struct TupleHashTableData.

But perhaps Andres has some different idea in mind ...

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-09 10:36:48 -0500, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > On 12/06/2017 06:19 PM, Andres Freund wrote:
> >> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
> >>> In any case, the hashtable code needs to not fall over in the
> >>> presence of a lot of collisions, regardless of the exact reason
> >>> for there being a lot.
> 
> >> Yes, we need to be more resilient about it. Working on a patch.
> 
> This seems to have fallen through a crack over Christmas holidays.
> Let's see if we can't push it to a conclusion.

Yea, I'll try to get to it over the weekend. Sorry for the delay.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 11/27/17 14:17, Tomas Vondra wrote:
> Hi,
> 
> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      14932
>> Logged by:          Todd Cook
>> Email address:      tcook@blackducksoftware.com
>> PostgreSQL version: 10.1
>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>> Description:
>>
>> It hangs on a table with 167834 rows, though it works fine with only 167833
>> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
>> be killed to stop it.
>>
> 
> Can you share the query and data, so that we can reproduce the issue?
> 
> Based on the stack traces this smells like a bug in the simplehash,
> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
> which gets triggered for 167834 rows (but not for 167833).

FWIW, changing the guts of hashint8() to

+       if (val >= INT32_MIN && val <= INT32_MAX)
+               return hash_uint32((uint32) val);
+       else
+               return hash_any((unsigned char *) &val, sizeof(val));

allows us to process a full-sized data set of around 900 million rows.  However,
memory usage seemed to be rather excessive (we could only run 7 of these jobs in parallel
on a 128GB system before the OOM killer kicked in, rather than the usual 24); if there's
any interest, I can try to measure exactly how excessive.

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 01/25/2018 11:31 PM, Todd A. Cook wrote:
> On 11/27/17 14:17, Tomas Vondra wrote:
>> Hi,
>>
>> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      14932
>>> Logged by:          Todd Cook
>>> Email address:      tcook@blackducksoftware.com
>>> PostgreSQL version: 10.1
>>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>>> Description:
>>>
>>> It hangs on a table with 167834 rows, though it works fine with only
>>> 167833
>>> rows.  When it hangs, CTRL-C does not interrupt it, and the backend
>>> has to
>>> be killed to stop it.
>>>
>>
>> Can you share the query and data, so that we can reproduce the issue?
>>
>> Based on the stack traces this smells like a bug in the simplehash,
>> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
>> which gets triggered for 167834 rows (but not for 167833).
> 
> FWIW, changing the guts of hashint8() to
> 
> +       if (val >= INT32_MIN && val <= INT32_MAX)
> +               return hash_uint32((uint32) val);
> +       else
> +               return hash_any((unsigned char *) &val, sizeof(val));
> 
> allows us to process a full-sized data set of around 900 million rows.
> However, memory usage seemed to be rather excessive (we could only run
> 7 of these jobs in parallel on a 128GB system before the OOM killer
> kicked in, rather than the usual 24); if there's any interest, I can
> try to measure exactly how excessive.
> 

I suspect you're right that the hash is biased toward the lohalf bits,
as you wrote in the 19/12 message. In fact, I think it's a direct
consequence of the requirement that hashint8() needs to produce the same
hash for logically equivalent int2 and int4 values.

Out of curiosity, could you try replacing the hash_any call in hashint8
with a hash function like murmur3, and see if it improves the behavior?

That obviously breaks hashint8 for cross-type hash joins, but it
would be an interesting bit of information, I think.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I suspect you're right that the hash is biased toward the lohalf bits,
> as you wrote in the 19/12 message.

I don't see any bias in what it's doing, which is basically xoring the
two halves and hashing the result.  It's possible though that Todd's
data set contains values in which corresponding bits of the high and
low halves are correlated somehow, in which case the xor would produce
a lot of cancellation and a relatively small number of distinct outputs.

If we weren't bound by backwards compatibility, we could consider changing
to logic more like "if the value is within the int4 range, apply int4hash,
otherwise hash all 8 bytes normally".  But I don't see how we can change
that now that hash indexes are first-class citizens.

In any case, we still need a fix for the behavior that the hash table size
is blown out by lots of collisions, because that can happen no matter what
the hash function is.  Andres seems to have dropped the ball on doing
something about that.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:
On 01/27/2018 12:22 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I suspect you're right that the hash is biased toward the lohalf bits,
>> as you wrote in the 19/12 message.
> 
> I don't see any bias in what it's doing, which is basically xoring the
> two halves and hashing the result.  It's possible though that Todd's
> data set contains values in which corresponding bits of the high and
> low halves are correlated somehow, in which case the xor would produce
> a lot of cancellation and a relatively small number of distinct outputs.
> 

Hmm, that makes more sense than what I wrote. Probably time to get some
sleep or drink more coffee, I guess.

BTW what do you think about the fact that we only really generate ~63%
of the possible hash values (see my message from 11/12)? That seems a
bit unfortunate, although not unexpected for a simple hash function.

> If we weren't bound by backwards compatibility, we could consider
> changing to logic more like "if the value is within the int4 range,
> apply int4hash, otherwise hash all 8 bytes normally". But I don't see
> how we can change that now that hash indexes are first-class
> citizens.
> 

Yeah, I've been thinking about that too. But I think it's an issue only
for pg_upgraded clusters, which may have hash indexes (and also
hash-partitioned tables). So couldn't we use new hash functions in fresh
clusters and use the backwards-compatible ones in pg_upgraded ones?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
Hi,

On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
> 1) randomized hashing, i.e. using the extended hash functions and a
> random seed

Given the lack of extended hash functions for everything this doesn't
seem quite viable. But I think we can make it work quite easily
nonetheless - we can just do the combining with a random seed outside of
the hashtable. We kinda already do so (with a really crappy combiner) in
TupleHashTableHash(), except that we'd actually have to use a random
seed.

This seems like something we should do, just out of defensiveness. It'll
not be as good as persisting the hashfunction's state across calls, but
well, ...

Let me write a patch for that.


> 2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p)
> formula, where h is "our" hash, and (a,b) are random and p is a prime.
> This can't possibly fix the collisions reported by Todd, because the
> result will remain the same. But it should fix "my" chains of values by
> spreading them a bit (and the a,b,p values are generated on each grow,
> so in the unlikely case where we get a chain, it should disappear after
> a growth)

This seems on the too expensive side of things for my taste to do
unconditionally.


> 3) disabling growth of the hash table when the fillfactor would drop too
> much (e.g. below 0.2)

Yea, I think we should just apply something like this.


> FWIW I do agree the data sets shared in this thread are pretty extreme
> and it doesn't make much sense to slow the regular cases. I'll be
> perfectly happy if we stop the OOM, making those cases fast is a bonus.

Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
and do something better in 11. And then later possibly backpatch if
we've grown some confidence?

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-26 18:22:26 -0500, Tom Lane wrote:
> In any case, we still need a fix for the behavior that the hash table size
> is blown out by lots of collisions, because that can happen no matter what
> the hash function is.  Andres seems to have dropped the ball on doing
> something about that.

Didn't have spare brain cycles :(, and the next backbranch release
wasn't yet close, which reduced the immediate urgency a bit.  As written nearby I
think we should make execGrouping.c users of hashtables use a more
random IV, and apply something similar to the growth limit patch from
Tomas.

Greetings,

Andres Freund


Andres Freund <andres@anarazel.de> writes:
> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
>> FWIW I do agree the data sets shared in this thread are pretty extreme
>> and it doesn't make much sense to slow the regular cases. I'll be
>> perfectly happy if we stop the OOM, making those cases fast is a bonus.

> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
> and do something better in 11. And then later possibly backpatch if
> we've grown some confidence?

+1.  I'd like to see some response to this included in 10.2, and time
grows short for that.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-26 18:48:35 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
> >> FWIW I do agree the data sets shared in this thread are pretty extreme
> >> and it doesn't make much sense to slow the regular cases. I'll be
> >> perfectly happy if we stop the OOM, making those cases fast is a bonus.
> 
> > Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
> > and do something better in 11. And then later possibly backpatch if
> > we've grown some confidence?
> 
> +1.  I'd like to see some response to this included in 10.2, and time
> grows short for that.

Here are two patches that I think we want for 10.2, and the start of one
that I think we want for master.  0002 is needed because otherwise the
lack of extra growth leads to noticeably worse performance when filling
an underestimated a coordinator hash table from the workers - turns out
our hash combine (and most hash combines) let a lot of clustering
survive. By adding a final hashing round the bit perturbation is near
perfect.  The commit messages need to be polished a bit, but other than
that I think these are reasonable fixes. Plan to push by Monday evening
at the latest.
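
For context, the final hashing round is a murmur-style finalizer; the
standard murmur3 32-bit finalizer looks like this (whether the patch
uses exactly these constants isn't shown here):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * murmur3's 32-bit finalizer: a few shifts and multiplies, after
     * which every input bit affects every output bit - the "near
     * perfect" bit perturbation mentioned above.
     */
    static inline uint32_t
    murmur32_finalize(uint32_t h)
    {
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

    int
    main(void)
    {
        printf("%u\n", murmur32_finalize(1));
        return 0;
    }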

The third patch is a version of the random IV discussed in this
thread. I do think we want to add usage of the extended hash functions,
as prototyped by Tomas, as that actually helps to fix issues with actual
hash conflicts. But we additionally need a fallback path for types
without extended hash functions, and the random IV is a good idea
nonetheless.  There's no ABI difference in my patch, so I think this is
actually something we could backpatch. But I don't think it's urgent, so
I'm not planning to do that for 10.2.  The one thing that could confuse
people is that it can lead to output order changes from run to run - I
think that's actually good, nobody should rely on hashagg etc output
being stable, but it might be a bit much in a stable release?


In my tests this solves the worst performance issues in Todd's case,
Tomas's, Thomas's, and still does ok performance-wise with a TPC-H Q18
(which showcases the underestimated worker hashtable into leader
hashtable issue).

Greetings,

Andres Freund

Attachment

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 01/29/2018 02:02 AM, Andres Freund wrote:
> On 2018-01-26 18:48:35 -0500, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
>>>> FWIW I do agree the data sets shared in this thread are pretty extreme
>>>> and it doesn't make much sense to slow the regular cases. I'll be
>>>> perfectly happy if we stop the OOM, making those cases fast is a bonus.
>>
>>> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
>>> and do something better in 11. And then later possibly backpatch if
>>> we've grown some confidence?
>>
>> +1.  I'd like to see some response to this included in 10.2, and time
>> grows short for that.
> 
> Here are two patches that I think we want for 10.2, and the start of one
> that I think we want for master.  0002 is needed because otherwise the
> lack of extra growth leads to noticeably worse performance when filling
> an underestimated coordinator hash table from the workers - turns out
> our hash combine (and most hash combines) let a lot of clustering
> survive. By adding a final hashing round the bit perturbation is near
> perfect.  The commit messages need to be polished a bit, but other than
> that I think these are reasonable fixes. Plan to push by Monday evening
> at the latest.
> 
> The third patch is a version of the random IV discussed in this
> thread. I do think we want to add usage of the extended hash functions,
> as prototyped by Tomas, as that actually helps to fix issues with actual
> hash conflicts. But we additionally need a fallback path for types
> without extended hash functions, and the random IV is a good idea
> nonetheless.  There's no ABI difference in my patch, so I think this is
> actually something we could backpatch. But I don't think it's urgent, so
> I'm not planning to do that for 10.2.  The one thing that could confuse
> people is that it can lead to output order changes from run to run - I
> think that's actually good, nobody should rely on hashagg etc output
> being stable, but it might be a bit much in a stable release?
> 
> 
> In my tests this solves the worst performance issues in Todd's case,
> Tomas's, Thomas's, and still does ok performance-wise with a TPC-H Q18
> (which showcases the underestimated worker hashtable into leader
> hashtable issue).
> 

Sounds reasonable.

Do you think repeating the performance testing is necessary? Not sure
when you plan to push the changes; I may not have time for the tests
until after FOSDEM.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
Hi,

On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote:
> Do you think repeating the performance testing is necessary? Not sure
> when you plan to push the changes; I may not have time for the tests
> until after FOSDEM.

Hm, it surely wouldn't hurt, but your previous result does give a lot of
confidence - there's not a lot of fundamental difference, leaving aside
one more mitigation that should never reduce hashtable "quality".  So
I think going ahead with pushing, and then benchmarking if possible, is
reasonable?

Greetings,

Andres Freund


Andres Freund <andres@anarazel.de> writes:
> On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote:
>> Do you think repeating the performance testing is necessary? Not sure
>> when you plan to push the changes; I may not have time for the tests
>> until after FOSDEM.

> Hm, it surely wouldn't hurt, but your previous result does give a lot of
> confidence - there's not a lot of fundamental difference, leaving aside
> one more mitigation that should never reduce hashtable "quality".  So
> I think going ahead with pushing, and then benchmarking if possible, is
> reasonable?

Since the release wrap is this coming Monday, there's not really time to
wait and do something after FOSDEM ...

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:

On January 29, 2018 10:11:38 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote:
>>> Do you think repeating the performance testing is necessary? Not sure
>>> when you plan to push the changes; I may not have time for the tests
>>> until after FOSDEM.
>
>> Hm, it surely wouldn't hurt, but your previous result does give a lot of
>> confidence - there's not a lot of fundamental difference, leaving aside
>> one more mitigation that should never reduce hashtable "quality".  So
>> I think going ahead with pushing, and then benchmarking if possible, is
>> reasonable?
>
>Since the release wrap is this coming Monday, there's not really time to
>wait and do something after FOSDEM ...

I'm not sure what you're arguing for? I have measured the performance
impacts, and am happy. That's why I do not want to wait till then.
Doing additional benchmarks after commit can't hurt tho...

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 01/28/18 20:02, Andres Freund wrote:
> On 2018-01-26 18:48:35 -0500, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
>>>> FWIW I do agree the data sets shared in this thread are pretty extreme
>>>> and it doesn't make much sense to slow the regular cases. I'll be
>>>> perfectly happy if we stop the OOM, making those cases fast is a bonus.
>>
>>> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
>>> and do something better in 11. And then later possibly backpatch if
>>> we've grown some confidence?
>>
>> +1.  I'd like to see some response to this included in 10.2, and time
>> grows short for that.
> 
> Here are two patches that I think we want for 10.2, and the start of one
> that I think we want for master.  0002 is needed because otherwise the
> lack of extra growth leads to noticeably worse performance when filling
> an underestimated coordinator hash table from the workers - turns out
> our hash combine (and most hash combines) let a lot of clustering
> survive. By adding a final hashing round, the bit perturbation is near
> perfect.  The commit messages need to be polished a bit, but other than
> that I think these are reasonable fixes. Plan to push by Monday evening
> at the latest.

I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1
and got the following build error:

    execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory
     #include "utils/hashutils.h"

My build sequence was

    git pull
    git status  (showing clean tree)
    make distclean
    ./configure ...
    make

-- todd
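
A minimal sketch of the "final hashing round" idea Andres describes above,
assuming a boost-style combine step and the murmur3 32-bit finalizer;
function names are illustrative, and the committed patch may differ in
detail:

    #include <stdint.h>

    /* boost-style combine, as used for multi-column hashes; on its own
     * it lets clustering in the inputs survive into the result */
    static inline uint32_t
    hash_combine32(uint32_t a, uint32_t b)
    {
        a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
        return a;
    }

    /* murmur3's 32-bit finalizer: a strong avalanche mixer */
    static inline uint32_t
    murmur32_fmix(uint32_t h)
    {
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

    /* per-tuple hash: combine the column hashes, then perturb once at
     * the end so clustered inputs spread over all buckets */
    static uint32_t
    tuple_hash(const uint32_t *colhash, int ncols)
    {
        uint32_t    h = 0;

        for (int i = 0; i < ncols; i++)
            h = hash_combine32(h, colhash[i]);
        return murmur32_fmix(h);
    }

Without the final round, nearby hash values tend to land in nearby buckets
of the open-addressing table, which appears to be the clustering that made
growth so pathological in this thread.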


Andres Freund <andres@anarazel.de> writes:
> Here are two patches that I think we want for 10.2, and the start of one
> that I think we want for master.  0002 is needed because otherwise the
> lack of extra growth leads to noticeably worse performance when filling
> an underestimated coordinator hash table from the workers - turns out
> our hash combine (and most hash combines) let a lot of clustering
> survive. By adding a final hashing round, the bit perturbation is near
> perfect.  The commit messages need to be polished a bit, but other than
> that I think these are reasonable fixes. Plan to push by Monday evening
> at the latest.

The first 2 seem OK to me by eyeball, though I've not done performance
testing.

> The third patch is a version of the random IV discussed in this
> thread. I do think we want to add usage of the extended hash functions,
> as prototyped by Tomas, as that actually helps to fix issues with actual
> hash conflicts. But we additionally need a fallback path for types
> without extended hash functions, and the random IV is a good idea
> nonetheless.  There's no ABI difference in my patch, so I think this is
> actually something we could backpatch. But I don't think it's urgent, so
> I'm not planning to do that for 10.2.  The one thing that could confuse
> people is that it can lead to output order changes from run to run - I
> think that's actually good, nobody should rely on hashagg etc output
> being stable, but it might be a bit much in a stable release?

I disagree: people should reasonably expect the same query and same
data and same plan to give consistent results.  When we stuck in the
"synchronous seqscans" feature, which broke that property, we were very
quickly forced by user complaints to provide a way to shut it off.
I'm also concerned that we'd have to lobotomize a bunch of regression
test cases to ensure stable results there.

IOW, I think randomizing hashkeys is unacceptable for HEAD, let alone
back-patching.

            regards, tom lane
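
To make the behavior change under discussion concrete, here is a hedged
sketch of the random-IV variant; hash_iv, its initialization, and the
randomness source are illustrative, not the actual patch code:

    #include <stdint.h>
    #include <stdlib.h>

    /* chosen once per backend (or per hashtable); a real implementation
     * would use a stronger randomness source than random() */
    static uint32_t hash_iv;

    static void
    hash_iv_init(void)
    {
        hash_iv = (uint32_t) random();
    }

    /* hashing starts from the random IV instead of a constant, so bucket
     * assignment -- and thus unsorted output order -- varies run to run */
    static uint32_t
    tuple_hash_iv(const uint32_t *colhash, int ncols)
    {
        uint32_t    h = hash_iv;

        for (int i = 0; i < ncols; i++)
            h ^= colhash[i] + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }

That run-to-run variation is precisely what Tom objects to: the same query,
data, and plan would no longer produce the same (unordered) output order
twice.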


Andres Freund <andres@anarazel.de> writes:
> On January 29, 2018 10:11:38 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Since the release wrap is this coming Monday, there's not really time
>> to wait and do something after FOSDEM ...

> I'm not sure what you're arguing for? I have measured the performance impacts, and am happy. That's why I do not want
> to wait till then.  Doing additional benchmarks after commit can't hurt tho...

I'm arguing for not waiting.  If Tomas does some more benchmarking and
finds something else we can improve, that's fine, but the stop-growing
patch is already an important fix compared to 10.1.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-29 13:25:35 -0500, Tom Lane wrote:
> The first 2 seem OK to me by eyeball, though I've not done performance
> testing.

Thanks for the look!


> > The third patch is a version of the random IV discussed in this
> > thread. I do think we want to add usage of the extended hash functions,
> > as prototyped by Tomas, as that actually helps to fix issues with actual
> > hash conflicts. But we additionally need a fallback path for types
> > without extended hash functions, and the random IV is a good idea
> > nonetheless.  There's no ABI difference in my patch, so I think this is
> > actually something we could backpatch. But I don't think it's urgent, so
> > I'm not planning to do that for 10.2.  The one thing that could confuse
> > people is that it can lead to output order changes from run to run - I
> > think that's actually good, nobody should rely on hashagg etc output
> > being stable, but it might be a bit much in a stable release?
> 
> I disagree: people should reasonably expect the same query and same
> data and same plan to give consistent results.  When we stuck in the
> "synchronous seqscans" feature, which broke that property, we were very
> quickly forced by user complaints to provide a way to shut it off.
> I'm also concerned that we'd have to lobotomize a bunch of regression
> test cases to ensure stable results there.

I don't think the cases really compare. For one, there's no "natural
order" like insertion order for seqscans (in simplistic cases). For
another, hashaggs/joins/whatnot are *already* architecture dependent, so
there are ORDER BYs for pretty much everything already - I needed to add
a single ORDER BY to make things stable over ~50 regression runs.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-29 13:20:25 -0500, Todd A. Cook wrote:
> I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1
> and got the following build error:
> 
>     execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory
>      #include "utils/hashutils.h"

Yea, there are some minor differences between master and the 10 branch;
one commit needed to be backported beforehand. Attached are three patches
applying to 10.

Greetings,

Andres Freund

Andres Freund <andres@anarazel.de> writes:
> On 2018-01-29 13:25:35 -0500, Tom Lane wrote:
>> I disagree: people should reasonably expect the same query and same
>> data and same plan to give consistent results.  When we stuck in the
>> "synchronous seqscans" feature, which broke that property, we were very
>> quickly forced by user complaints to provide a way to shut it off.
>> I'm also concerned that we'd have to lobotomize a bunch of regression
>> test cases to ensure stable results there.

> I don't think the cases really compare. For one, there's no "natural
> order" like insertion order for seqscans (in simplistic cases). For
> another, hashaggs/joins/whatnot are *already* architecture dependent, so
> there are ORDER BYs for pretty much everything already - I needed to add
> a single ORDER BY to make things stable over ~50 regression runs.

As developers who see test results from a variety of architectures,
we're aware that hash ops don't give stable results across machines.
Most end users aren't going to know or care about that: they're running
on one architecture and up to now they've gotten stable results from
hash ops.  I think we *will* get pushback if we change that.

Also, the fact that you needed one more ORDER BY in testing on a
single machine is already proof of my worry about the regression
tests, and it'll only get worse when testing across a variety
of machines.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 01/29/18 13:31, Andres Freund wrote:
> On 2018-01-29 13:20:25 -0500, Todd A. Cook wrote:
>> I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1
>> and got the following build error:
>>
>>     execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory
>>      #include "utils/hashutils.h"
> 
> Yea, there are some minor differences between master and the 10 branch;
> one commit needed to be backported beforehand. Attached are three patches
> applying to 10.

Thanks.

With those patches applied, I'm able to process a 175 million row data set
without any problems.  I'll try the 900 million row set next.

-- todd


One other point here is that it's not really clear to me what a randomly
varying IV is supposed to accomplish.  Surely we're not intending that
it prevents somebody from crafting a data set that causes bad hash
performance.  If a user with DB access wants to cause a performance
problem, there are and always will be plenty of other avenues to making
that happen.  If the idea is that for a data set that otherwise would have
bad hash performance, choosing a different IV would (almost always) fix
it, that sounds good but you're ignoring the inverse case: for a data set
that works fine, there would be some choices of IV that create a problem
where there was none before.  I see no reason to think that the
probability of the former kind of situation is higher than the latter.

So I'm on board with using the extended hash functions when available,
but I'm not convinced that a varying IV buys us anything but trouble.

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-29 13:56:37 -0500, Todd A. Cook wrote:
> With those patches applied, I'm able to process a 175 million row data set
> without any problems.

Cool!

> I'll try the 900 million row set next.

Thanks!

- Andres


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
Hi,

On 2018-01-29 13:45:10 -0500, Tom Lane wrote:
> Also, the fact that you needed one more ORDER BY in testing on a
> single machine is already proof of my worry about the regression
> tests, and it'll only get worse when testing across a variety
> of machines.

That's an entirely unconvincing argument. A lot of changes to hashtable
code result in ordering changes, and have done so in the past. Sure,
there occasionally was need for a followup ORDER BY addition, but that's
not that bad. The fact that only a single ORDER BY was needed when
entirely changing the hash function output (by randomizing it) doesn't
at all seem to suggest that it's a huge problem.


On 2018-01-29 14:11:26 -0500, Tom Lane wrote:
> One other point here is that it's not really clear to me what a randomly
> varying IV is supposed to accomplish.  Surely we're not intending that
> it prevents somebody from crafting a data set that causes bad hash
> performance.

I do think we should make that harder.


> If a user with DB access wants to cause a performance
> problem, there are and always will be plenty of other avenues to making
> that happen.

Sure, but that's different from somebody triggering such behaviour
remotely, e.g. via website input.


> If the idea is that for a data set that otherwise would have
> bad hash performance, choosing a different IV would (almost always) fix
> it, that sounds good but you're ignoring the inverse case: for a data set
> that works fine, there would be some choices of IV that create a problem
> where there was none before.  I see no reason to think that the
> probability of the former kind of situation is higher than the latter.

The former is easily triggerable intentionally; the latter is not.


> So I'm on board with using the extended hash functions when available,
> but I'm not convinced that a varying IV buys us anything but trouble.

I think we really need both. A constant IV of e.g. 0 for a single-column
hashtable doesn't really fix much, even though it does improve hash
quality a bit for multi-column keys. A random IV does address issues
due to a lot of tuples falling into the same bucket, but does not
address actual collisions. Both together are pretty convincing tho.

Greetings,

Andres Freund
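
The distinction Andres draws can be shown in a short sketch: mixing an IV
into an existing 32-bit hash after the fact cannot separate values that
already collide, while a seeded 64-bit hash can. Both function bodies
below are illustrative stand-ins, not PostgreSQL's actual hash functions:

    #include <stdint.h>

    /* If h(v1) == h(v2) for distinct v1, v2, then f(iv, h(v1)) ==
     * f(iv, h(v2)) for every iv and every f: an IV mixed in after the
     * datatype hash can spread clustered buckets, but never fixes a
     * true collision. */
    static uint32_t
    perturb(uint32_t iv, uint32_t h)
    {
        return iv ^ h;
    }

    /* A seeded ("extended") hash feeds the seed into the hashing of the
     * value itself, so values colliding under one seed almost certainly
     * separate under another.  FNV-1a-style loop, for illustration only. */
    static uint64_t
    hash_bytes_seeded(const unsigned char *data, int len, uint64_t seed)
    {
        uint64_t    h = seed ^ UINT64_C(0xcbf29ce484222325);

        for (int i = 0; i < len; i++)
        {
            h ^= data[i];
            h *= UINT64_C(0x100000001b3);
        }
        return h;
    }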


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Greg Stark
Date:
On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One other point here is that it's not really clear to me what a randomly
> varying IV is supposed to accomplish.  Surely we're not intending that
> it prevents somebody from crafting a data set that causes bad hash
> performance.

I actually think that is a real live issue that we will be forced to
deal with one day. And I think that day is coming soon.

It's not hard to imagine a user of a web site intentionally naming
their objects such that they all hash to the same value. Probably on
most systems the worst case is a query that takes a few seconds or even
tens of seconds, but if you get lucky you could run a server out of
memory.

I'm actually thinking we should replace all the hashes with
https://en.wikipedia.org/wiki/SipHash using a randomly generated key.

-- 
greg
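
For context, the shape of what Greg is proposing: SipHash is a keyed hash,
so which inputs collide depends on a 128-bit secret generated at startup,
and an attacker who controls the input but cannot read the key cannot
precompute colliding values. The code below sketches only that interface;
it is deliberately not SipHash and makes no security claim:

    #include <stdint.h>

    typedef struct HashKey
    {
        uint64_t    k0;
        uint64_t    k1;
    } HashKey;

    /* placeholder keyed mixer: the secret key enters the state at the
     * start and the end, so outputs are unpredictable without it */
    static uint64_t
    keyed_hash(HashKey key, const unsigned char *data, int len)
    {
        uint64_t    h = key.k0;

        for (int i = 0; i < len; i++)
        {
            h ^= data[i];
            h *= UINT64_C(0xff51afd7ed558ccd);
            h ^= h >> 33;
        }
        return h ^ key.k1;
    }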


Greg Stark <stark@mit.edu> writes:
> On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One other point here is that it's not really clear to me what a randomly
>> varying IV is supposed to accomplish.  Surely we're not intending that
>> it prevents somebody from crafting a data set that causes bad hash
>> performance.

> I actually think that is a real live issue that we will be forced to
> deal with one day. And I think that day is coming soon.

> It's not hard to imagine a user of a web site intentionally naming
> their objects such that they all hash to the same value. Probably most
> systems the worst case is a query that takes a few seconds or even
> tens of seconds but if you get lucky you could run a server out of
> memory.

By their very nature, hash algorithms have weak spots.  Pretending that
they do not, or that you can 100% remove them, is a fool's errand.

You could always "set enable_hashjoin = off", and deal with mergejoin's
weak spots instead; but that just requires a different data set to
expose its performance shortcomings ...

            regards, tom lane


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-29 17:54:42 -0500, Tom Lane wrote:
> Greg Stark <stark@mit.edu> writes:
> > On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> One other point here is that it's not really clear to me what a randomly
> >> varying IV is supposed to accomplish.  Surely we're not intending that
> >> it prevents somebody from crafting a data set that causes bad hash
> >> performance.
> 
> > I actually think that is a real live issue that we will be forced to
> > deal with one day. And I think that day is coming soon.
> 
> > It's not hard to imagine a user of a web site intentionally naming
> > their objects such that they all hash to the same value. Probably on
> > most systems the worst case is a query that takes a few seconds or even
> > tens of seconds, but if you get lucky you could run a server out of
> > memory.
> 
> By their very nature, hash algorithms have weak spots.  Pretending that
> they do not, or that you can 100% remove them, is a fool's errand.

Having edge case weaknesses and having remotely triggerable weaknesses
aren't the same. And the cost of preventing such misuse is an
unmeasurable runtime difference, plus a changing output order for
queries that neither specify an order nor have a "natural" one. That's
not particularly high.

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Tomas Vondra
Date:

On 01/29/2018 08:11 PM, Tom Lane wrote:
> One other point here is that it's not really clear to me what a
> randomly varying IV is supposed to accomplish. Surely we're not
> intending that it prevents somebody from crafting a data set that
> causes bad hash performance. If a user with DB access wants to cause
> a performance problem, there are and always will be plenty of other
> avenues to making that happen.

While I'm not sure the random IV is something we want/need, I don't
think "there are other ways to harm the database" is a convincing argument.

I agree it's fairly easy to set work_mem to an insane value, or craft a
query that eats all memory (many sorts using work_mem each, hashagg used
for data with many groups, ...). Basically, if you don't know how to
crash the database, you're not a senior DBA/engineer.

But not all attacks are equal. All those examples I named require a direct
database connection, the ability to construct SQL queries, etc.

The case discussed in this thread does not require that - it's enough to
control the data input, say by uploading a CSV file with crafted data to
some web application (as Greg mentions). Of course, you need to make
some assumptions while crafting the data (that it's using Postgres, and
that it'll use hash aggregate on the data), but that's about it.


> If the idea is that for a data set that otherwise would have bad hash
> performance, choosing a different IV would (almost always) fix it,
> that sounds good but you're ignoring the inverse case: for a data
> set that works fine, there would be some choices of IV that create a
> problem where there was none before. I see no reason to think that
> the probability of the former kind of situation is higher than the
> latter.

I don't think this thread is really about randomly generated data sets,
but about something that is generated in a way that increases the number
of collisions - either intentionally or unintentionally. That's
certainly the case of my data set, which was designed to trigger one of
the issues in the code.

I agree we need to check if the randomization might cause more harm than
good by causing random failures. But my feeling is the probability will
be very low.

> So I'm on board with using the extended hash functions when
> available, but I'm not convinced that a varying IV buys us anything
> but trouble.

I don't have a clear opinion on the random IV yet.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 01/29/18 14:26, Andres Freund wrote:
> On 2018-01-29 13:56:37 -0500, Todd A. Cook wrote:
>> With those patches applied, I'm able to process a 175 million row data set
>> without any problems.
> 
> Cool!
> 
>> I'll try the 900 million row set next.

We're also able to process those 900 million rows without problems. :)
It took about 28 minutes (average of 3 runs).

Out of curiosity, I then modified hashint8() as previously described.
With that change, run time dropped to 11 minutes (also average of 3
runs).

FWIW, the data values in these sets are sort-of random (where I can't
explain the "sort-of" in a public forum), but strongly biased towards
negative infinity.  Starting again from scratch, we could probably remove
the bias, but we have 28-30 billion of these things collected over the
last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change
directions at this point...

-- todd


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
Andres Freund
Date:
On 2018-01-30 13:57:44 -0500, Todd A. Cook wrote:
> Out of curiosity, I then modified hashint8() as previously described.
> With that change, run time dropped to 11 minutes (also average of 3
> runs).
> 
> FWIW, the data values in these sets are sort-of random (where I can't
> explain the "sort-of" in a public forum), but strongly biased towards
> negative infinity.  Starting again from scratch, we could probably remove
> the bias, but we have 28-30 billion of these things collected over the
> last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change
> directions at this point...

FWIW, you could just create a different hash opclass and use it for
those queries...

Greetings,

Andres Freund


Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From
"Todd A. Cook"
Date:
On 01/30/18 14:34, Andres Freund wrote:
> On 2018-01-30 13:57:44 -0500, Todd A. Cook wrote:
>> Out of curiosity, I then modified hashint8() as previously described.
>> With that change, run time dropped to 11 minutes (also average of 3
>> runs).
>>
>> FWIW, the data values in these sets are sort-of random (where I can't
>> explain the "sort-of" in a public forum), but strongly biased towards
>> negative infinity.  Starting again from scratch, we could probably remove
>> the bias, but we have 28-30 billion of these things collected over the
>> last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change
>> directions at this point...
> 
> FWIW, you could just create a different hash opclass and use it for
> those queries...

Hmmm, that's intriguing; I'll look into it.  Thanks! :)

-- todd