Thread: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
tcook@blackducksoftware.com
Date:
The following bug has been logged on the website:

Bug reference:      14932
Logged by:          Todd Cook
Email address:      tcook@blackducksoftware.com
PostgreSQL version: 10.1
Operating system:   CentOS Linux release 7.4.1708 (Core)
Description:

It hangs on a table with 167834 rows, though it works fine with only 167833
rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
be killed to stop it.

Some sample stack traces:

#0  0x00007f66f1ee9860 in __memset_sse2 () from /lib64/libc.so.6
#1  0x000000000083e085 in memset (__len=51539607552, __ch=0, __dest=0x7f4cdf35c048) at /usr/include/bits/string3.h:84
#2  MemoryContextAllocExtended (context=<optimized out>, size=51539607552, flags=flags@entry=5) at mcxt.c:843
#3  0x00000000005ec804 in tuplehash_allocate (type=0x163ecc8, size=<optimized out>) at ../../../src/include/lib/simplehash.h:305
#4  tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:379
#5  0x00000000005ece35 in tuplehash_insert (tb=0x163ecc8, key=key@entry=0x0, found=found@entry=0x7ffdaff3eb77 "") at ../../../src/include/lib/simplehash.h:504
#6  0x00000000005ed3ea in LookupTupleHashEntry (hashtable=0x163ec38, slot=slot@entry=0x163e220, isnew=isnew@entry=0x7ffdaff3ebd7 "") at execGrouping.c:387
#7  0x00000000005fae62 in lookup_hash_entry (aggstate=0x163ce88) at nodeAgg.c:2075
#8  lookup_hash_entries (aggstate=aggstate@entry=0x163ce88) at nodeAgg.c:2106
#9  0x00000000005fc5da in agg_fill_hash_table (aggstate=0x163ce88) at nodeAgg.c:2536
#10 ExecAgg (pstate=0x163ce88) at nodeAgg.c:2140
#11 0x00000000005eec32 in ExecProcNode (node=0x163ce88) at ../../../src/include/executor/executor.h:250
#12 ExecutePlan (execute_once=<optimized out>, dest=0x163bbf8, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x163ce88, estate=0x163cc28) at execMain.c:1722
#13 standard_ExecutorRun (queryDesc=0x163c818, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#14 0x0000000000718e3b in PortalRunSelect (portal=portal@entry=0x158a568, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x163bbf8) at pquery.c:932
#15 0x000000000071a1ef in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x163bbf8, altdest=0x163bbf8, completionTag=0x7ffdaff3eed0 "") at pquery.c:773
#16 0x0000000000716163 in exec_simple_query (query_string=<optimized out>) at postgres.c:1099
#17 0x000000000071745c in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:4088
#18 0x000000000047ad1c in BackendRun (port=0x1592ab0) at postmaster.c:4357
#19 BackendStartup (port=0x1592ab0) at postmaster.c:4029
#20 ServerLoop () at postmaster.c:1753
#21 0x00000000006aea2f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x156bed0) at postmaster.c:1361
#22 0x000000000047bb4f in main (argc=3, argv=0x156bed0) at main.c:228

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
443             newentry = &newdata[curelem];
#0  0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
[frames #1 through #18 (tuplehash_insert ... main) identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
443             newentry = &newdata[curelem];
#0  0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
[frames #1 through #18 identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:450
450             curelem = SH_NEXT(tb, curelem, startelem);
#0  tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:450
[frames #1 through #18 identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
443             newentry = &newdata[curelem];
#0  0x00000000005ec8ac in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:443
[frames #1 through #18 identical to the trace above]

Continuing.
Program received signal SIGINT, Interrupt.
0x00000000005ec8b4 in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:445
445             if (newentry->status == SH_STATUS_EMPTY)
#0  0x00000000005ec8b4 in tuplehash_grow (tb=0x163ecc8, newsize=<optimized out>) at ../../../src/include/lib/simplehash.h:445
[frames #1 through #18 identical to the trace above]

select name, setting, unit, source, pending_restart
from pg_settings
where source <> 'default' and context <> 'internal'
order by lower(name) ;

             name             |      setting       | unit |        source        | pending_restart
------------------------------+--------------------+------+----------------------+-----------------
 application_name             | psql               |      | client               | f
 autovacuum_work_mem          | 131072             | kB   | configuration file   | f
 checkpoint_completion_target | 0.8                |      | configuration file   | f
 checkpoint_timeout           | 1800               | s    | configuration file   | f
 client_encoding              | SQL_ASCII          |      | client               | f
 cluster_name                 | PG 10              |      | configuration file   | f
 DateStyle                    | ISO, MDY           |      | configuration file   | f
 default_text_search_config   | pg_catalog.english |      | configuration file   | f
 dynamic_shared_memory_type   | posix              |      | configuration file   | f
 effective_cache_size         | 8388608            | 8kB  | configuration file   | f
 lc_messages                  | C                  |      | configuration file   | f
 lc_monetary                  | C                  |      | configuration file   | f
 lc_numeric                   | C                  |      | configuration file   | f
 lc_time                      | C                  |      | configuration file   | f
 listen_addresses             | *                  |      | configuration file   | f
 log_destination              | stderr             |      | configuration file   | f
 log_line_prefix              | %m [%p]            |      | configuration file   | f
 log_lock_waits               | on                 |      | configuration file   | f
 log_min_duration_statement   | 20000              | ms   | configuration file   | f
 log_rotation_age             | 1440               | min  | configuration file   | f
 log_rotation_size            | 0                  | kB   | configuration file   | f
 log_temp_files               | 1024               | kB   | configuration file   | f
 log_timezone                 | US/Eastern         |      | configuration file   | f
 log_truncate_on_rotation     | off                |      | configuration file   | f
 logging_collector            | on                 |      | configuration file   | f
 maintenance_work_mem         | 1048576            | kB   | configuration file   | f
 max_connections              | 100                |      | configuration file   | f
 max_stack_depth              | 2048               | kB   | environment variable | f
 max_wal_senders              | 0                  |      | configuration file   | f
 max_wal_size                 | 4096               | MB   | configuration file   | f
 port                         | 54310              |      | configuration file   | f
 shared_buffers               | 1048576            | 8kB  | configuration file   | f
 synchronous_commit           | off                |      | configuration file   | f
 TimeZone                     | US/Eastern         |      | configuration file   | f
 transaction_deferrable       | off                |      | override             | f
 transaction_isolation        | read committed     |      | override             | f
 transaction_read_only        | off                |      | override             | f
 wal_buffers                  | 2048               | 8kB  | override             | f
 wal_level                    | minimal            |      | configuration file   | f
 work_mem                     | 65536              | kB   | configuration file   | f

select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
Hi,

On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14932
> Logged by:          Todd Cook
> Email address:      tcook@blackducksoftware.com
> PostgreSQL version: 10.1
> Operating system:   CentOS Linux release 7.4.1708 (Core)
> Description:
>
> It hangs on a table with 167834 rows, though it works fine with only 167833
> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
> be killed to stop it.
>

Can you share the query and data, so that we can reproduce the issue?

Based on the stack traces this smells like a bug in the simplehash,
introduced in PostgreSQL 10.  Perhaps somewhere in tuplehash_grow(),
which gets triggered for 167834 rows (but not for 167833).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 14:17, Tomas Vondra wrote:
> Hi,
>
> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      14932
>> Logged by:          Todd Cook
>> Email address:      tcook@blackducksoftware.com
>> PostgreSQL version: 10.1
>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>> Description:
>>
>> It hangs on a table with 167834 rows, though it works fine with only 167833
>> rows.  When it hangs, CTRL-C does not interrupt it, and the backend has to
>> be killed to stop it.
>>
>
> Can you share the query and data, so that we can reproduce the issue?
>
> Based on the stack traces this smells like a bug in the simplehash,
> introduced in PostgreSQL 10.  Perhaps somewhere in tuplehash_grow(),
> which gets triggered for 167834 rows (but not for 167833).

I've attached a reproducer using real data.  (FWIW, I wasn't able to
reproduce with fake data made with generate_series().)

Also, I forgot to mention that this problem is not present in 9.6 or 9.5.

-- todd
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi,

On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
> COPY reproducer (val) FROM stdin;
> 2976219712004784288
> -6429122065899879392
> -7471109962990387136
> -7471109962990387136
> -2895470491222113184
> -4083509061952565472
> 1019481548263425664
> 4639248884787347648
> -6999443831165647744
> -4199917803455020480
> -4110530183001439680

How are these values generated?  They awfully look like hash values
(~same length, full numerical range)...

I'm kinda neck deep in something else, but I'll try to have a look
afterwards.

Greetings,

Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Thomas Munro
Date:
On Tue, Nov 28, 2017 at 11:10 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>> COPY reproducer (val) FROM stdin;
>> 2976219712004784288
>> -6429122065899879392
>> -7471109962990387136
>> -7471109962990387136
>> -2895470491222113184
>> -4083509061952565472
>> 1019481548263425664
>> 4639248884787347648
>> -6999443831165647744
>> -4199917803455020480
>> -4110530183001439680
>
> How are these values generated?  They awfully look like hash values
> (~same length, full numerical range)...

When SH_INSERT tries to insert that final extra value, insertdist keeps
exceeding SH_GROW_MAX_DIB (25) no matter how many times we double the
size (at least until my computer gives up, somewhere around 11 doublings
and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB to 26 then it
succeeds, but I guess some other attack could be crafted for that.  What
is the theory behind this parameter?

--
Thomas Munro
http://www.enterprisedb.com
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> When SH_INSERT tries to insert that final extra value, insertdist
> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
> double the size (at least until my computer gives up, somewhere around
> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
> to 26 then it succeeds, but I guess some other attack could be crafted
> for that.  What is the theory behind this parameter?

You beat me to it --- after looking at simplehash.h I'd guessed that
either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
an infinite loop, but I'd not gotten to determining which one yet.

I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
of them are obviously loop-proof.

Note that the sample data has a lot of collisions:

regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
  hashint8   | count
-------------+-------
   441526644 |  2337
 -1117776826 |  1221
 -1202007016 |   935
 -2068831050 |   620
  1156644653 |   538
   553783815 |   510
   259780770 |   444
   371047036 |   394
   915722575 |   359
 ... etc etc ...

It's evidently more complicated than just that the code fails with
more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
by a lot.  There needs to be a safety valve that prevents letting
the hash fill factor approach zero, which is what's happening in
this test case.

(I wonder whether these loops oughtn't contain CHECK_FOR_INTERRUPTS, btw.)

			regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Thomas Munro
Date:
On Tue, Nov 28, 2017 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You beat me to it --- after looking at simplehash.h I'd guessed that
> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
> an infinite loop, but I'd not gotten to determining which one yet.
>
> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
> of them are obviously loop-proof.
>
> Note that the sample data has a lot of collisions:
>
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>   hashint8   | count
> -------------+-------
>    441526644 |  2337
>  -1117776826 |  1221
>  -1202007016 |   935
>  -2068831050 |   620
>   1156644653 |   538
>    553783815 |   510
>    259780770 |   444
>    371047036 |   394
>    915722575 |   359
> ... etc etc ...
>
> It's evidently more complicated than just that the code fails with
> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
> by a lot.  There needs to be a safety valve that prevents letting
> the hash fill factor approach zero, which is what's happening in
> this test case.

Yeah.  If you count distinct hashint8(val) of *distinct* values, you get:

postgres=# select hashint8(val), count(*) from (select distinct val from reproducer) ss group by 1 order by 2 desc limit 1;
  hashint8  | count
------------+-------
 1288181237 |    26
(1 row)

It doesn't matter how many bits of that you use, you'll get at least 26
collisions, so our loop can't terminate just by increasing the mask size.
My guess is that you'd either need a defence based on something like a
secondary hash function, or at least a dynamic SH_GROW_MAX_DIB that
wouldn't allow the fill factor to plummet as you say.  A dataset could be
found that would exceed any static value of SH_GROW_MAX_DIB by brute
force.

In this case, considering the definition of hashint8, there are more
straightforward ways to find distinct bigint values that hash to the same
value... just swap some bits between the high and low words, or something
like that.
-- Thomas Munro http://www.enterprisedb.com
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 11/28/2017 03:20 AM, Thomas Munro wrote:
> On Tue, Nov 28, 2017 at 11:10 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>>> COPY reproducer (val) FROM stdin;
>>> 2976219712004784288
>>> -6429122065899879392
>>> -7471109962990387136
>>> -7471109962990387136
>>> -2895470491222113184
>>> -4083509061952565472
>>> 1019481548263425664
>>> 4639248884787347648
>>> -6999443831165647744
>>> -4199917803455020480
>>> -4110530183001439680
>>
>> How are these values generated?  They awfully look like hash values
>> (~same length, full numerical range)...
>
> When SH_INSERT tries to insert that final extra value, insertdist
> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
> double the size (at least until my computer gives up, somewhere around
> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
> to 26 then it succeeds, but I guess some other attack could be crafted
> for that.  What is the theory behind this parameter?

Yeah, I came to the same hypothesis yesterday, but I see I failed to
keep pgsql-bugs on CC :-(

FWIW I believe the last doubling (from 2147483648 to 2*2147483648) is
doomed to fail due to the sizemask=0 bug.  I mean, if oldsize=2147483648,
then newsize=2*2147483648, which triggers this:

    if (tb->size == SH_MAX_SIZE)
        tb->sizemask = 0;

which would explain why the last grow did not complete even after 40
minutes (while the 1073741824->2147483648 one took 15 seconds).  Because
with sizemask=0, SH_NEXT/SH_PREV/... can only ever return 0.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 17:10, Andres Freund wrote:
> Hi,
>
> On 2017-11-27 15:59:37 -0500, Todd A. Cook wrote:
>> COPY reproducer (val) FROM stdin;
>> 2976219712004784288
>> -6429122065899879392
>> -7471109962990387136
>> -7471109962990387136
>> -2895470491222113184
>> -4083509061952565472
>> 1019481548263425664
>> 4639248884787347648
>> -6999443831165647744
>> -4199917803455020480
>> -4110530183001439680
>
> How are these values generated?  They awfully look like hash values
> (~same length, full numerical range)...

They are biased hashes.  FWIW, the reproducer data is from one column
of a three-column primary key.

-- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 23:03, Tom Lane wrote:
>
> Note that the sample data has a lot of collisions:
>
> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>   hashint8   | count
> -------------+-------
>    441526644 |  2337
>  -1117776826 |  1221
>  -1202007016 |   935
>  -2068831050 |   620
>   1156644653 |   538
>    553783815 |   510
>    259780770 |   444
>    371047036 |   394
>    915722575 |   359
> ... etc etc ...

In case it matters, the complete data set will have some outlier values
with 10k to 100k collisions in this column.

-- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 11/28/2017 03:55 PM, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
>>
>> Note that the sample data has a lot of collisions:
>>
>> regression=# select hashint8(val), count(*) from reproducer group by 1
>> order by 2 desc;
>>   hashint8   | count
>> -------------+-------
>>    441526644 |  2337
>>  -1117776826 |  1221
>>  -1202007016 |   935
>>  -2068831050 |   620
>>   1156644653 |   538
>>    553783815 |   510
>>    259780770 |   444
>>    371047036 |   394
>>    915722575 |   359
>> ... etc etc ...
>
> In case it matters, the complete data set will have some outlier values
> with 10k to 100k collisions in this column.
>

In the original values?  Not a big deal, I guess.  It's the hashint8
collisions that are causing the infinite loop, i.e. different values
with the same hashint8 result.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote: > On 11/27/17 23:03, Tom Lane wrote: > > > > Note that the sample data has a lot of collisions: > > > > regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; > > hashint8 | count > > -------------+------- > > 441526644 | 2337 > > -1117776826 | 1221 > > -1202007016 | 935 > > -2068831050 | 620 > > 1156644653 | 538 > > 553783815 | 510 > > 259780770 | 444 > > 371047036 | 394 > > 915722575 | 359 > > ... etc etc ... > > In case it matters, the complete data set will have some outlier values with 10k to 100k > collisions in this column. To make sure we're on the same page, this is data intentionally created to have a lot of hash collisions, is that right? Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/28/17 13:14, Andres Freund wrote: > On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote: >> On 11/27/17 23:03, Tom Lane wrote: >>> >>> Note that the sample data has a lot of collisions: >>> >>> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; >>> hashint8 | count >>> -------------+------- >>> 441526644 | 2337 >>> -1117776826 | 1221 >>> -1202007016 | 935 >>> -2068831050 | 620 >>> 1156644653 | 538 >>> 553783815 | 510 >>> 259780770 | 444 >>> 371047036 | 394 >>> 915722575 | 359 >>> ... etc etc ... >> >> In case it matters, the complete data set will have some outlier values with 10k to 100k >> collisions in this column. > > To make sure we're on the same page, this is data intentionally created > to have a lot of hash collisions, is that right? More or less. It might be more accurate to say that it's created in such a way that we expect to get lots of collisions. -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 23:03, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> When SH_INSERT tries to insert that final extra value, insertdist >> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >> double the size (at least until my computer gives up, somewhere around >> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >> to 26 then it succeeds, but I guess some other attack could be crafted >> for that. What is the theory behind this parameter? > > You beat me to it --- after looking at simplehash.h I'd guessed that > either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing > an infinite loop, but I'd not gotten to determining which one yet. > > I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither > of them are obviously loop-proof. > > Note that the sample data has a lot of collisions: > > regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; > hashint8 | count > -------------+------- > 441526644 | 2337 > -1117776826 | 1221 > -1202007016 | 935 > -2068831050 | 620 > 1156644653 | 538 > 553783815 | 510 > 259780770 | 444 > 371047036 | 394 > 915722575 | 359 > ... etc etc ... > > It's evidently more complicated than just that the code fails with > more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not > by a lot. There needs to be a safety valve that prevents letting > the hash fill factor approach zero, which is what's happening in > this test case. FWIW, I can also reproduce the infinite loop with 167834 unique values. It kinda looks like the problematic collisions arise from masking the computed hash values; e.g.: SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) { return hash & tb->sizemask; } (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.) -- todd
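Todd's point about masking can be sketched outside the C code. A minimal Python model of SH_INITIAL_BUCKET follows (simplehash.h itself is C; the three 32-bit hash values below are hypothetical, chosen to share their low bits):

```python
# Python model of simplehash.h's SH_INITIAL_BUCKET: the bucket index is
# the hash masked down to the (power-of-two) table size, so only the low
# bits of the hash ever matter.
def initial_bucket(h, table_size):
    # sizemask = table_size - 1, mirroring "hash & tb->sizemask"
    return h & (table_size - 1)

# Hypothetical 32-bit hashes differing only in their high bits:
hashes = [0x805FFFFE, 0x84DFFFFE, 0x9C3FFFFE]

# With a 2^16-slot table all three land in the same bucket, and doubling
# to 2^20 slots still does not separate them; only once the mask reaches
# a differing bit (2^24 here) do they spread out.
for size in (1 << 16, 1 << 20):
    assert len({initial_bucket(h, size) for h in hashes}) == 1
assert len({initial_bucket(h, 1 << 24) for h in hashes}) == 3
```

This is why hashes whose high bits vary but whose low bits coincide keep colliding across several doublings of the table.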
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/05/2017 10:23 PM, Todd A. Cook wrote: > On 11/27/17 23:03, Tom Lane wrote: >> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>> When SH_INSERT tries to insert that final extra value, insertdist >>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>> double the size (at least until my computer gives up, somewhere around >>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>> to 26 then it succeeds, but I guess some other attack could be crafted >>> for that. What is the theory behind this parameter? >> >> You beat me to it --- after looking at simplehash.h I'd guessed that >> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >> an infinite loop, but I'd not gotten to determining which one yet. >> >> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >> of them are obviously loop-proof. >> >> Note that the sample data has a lot of collisions: >> >> regression=# select hashint8(val), count(*) from reproducer group by 1 >> order by 2 desc; >> hashint8 | count >> -------------+------- >> 441526644 | 2337 >> -1117776826 | 1221 >> -1202007016 | 935 >> -2068831050 | 620 >> 1156644653 | 538 >> 553783815 | 510 >> 259780770 | 444 >> 371047036 | 394 >> 915722575 | 359 >> ... etc etc ... >> >> It's evidently more complicated than just that the code fails with >> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >> by a lot. There needs to be a safety valve that prevents letting >> the hash fill factor approach zero, which is what's happening in >> this test case. > > FWIW, I can also reproduce the infinite loop with 167834 unique values. > Unique values or unique *hash* values? Can you share the data, so that whoever fixes the bug can verify it also fixes your example? 
> It kinda looks like the problematic collisions arise from masking the > computed hash values; e.g.: > > SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) > { > return hash & tb->sizemask; > } > That's a pretty standard way to do modulo (when computing the index of the hash bucket), and certainly not the root cause. > (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.) > Based on the initial discussion, the problem here is twofold. Firstly, the code assumes that if it gets a certain number of bucket collisions (essentially, the initial bucket and a certain number of neighboring buckets already being full), making the table larger is guaranteed to reduce the number of collisions. Which is obviously untrue for duplicate hash values, but it may also happen for distinct hash values that form a chain (think a sequence of hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). This causes the insane growth to the largest possible hash table. Secondly, there's a bug that for the largest hash table we set sizemask=0, which means we never ever get a bucket index other than 0. This causes the infinite loop. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
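Both failure modes described above can be illustrated with a toy model (Python, standing in for the C in simplehash.h; the sizemask arithmetic follows the description in this message, not the actual code):

```python
# Toy model of the two problems: (1) a dense run of consecutive hash
# values stays dense no matter how large the table grows, and (2) a
# sizemask of 0 maps every hash to bucket 0 forever.
def bucket(h, sizemask):
    return h & sizemask

# (1) Hashes 0..999 map to buckets 0..999 at every table size that can
# hold them, so the probe distances never improve and growth never helps.
run = list(range(1000))
for size in (1 << 10, 1 << 12, 1 << 16):
    assert [bucket(h, size - 1) for h in run] == run

# (2) The reported bug: for the largest table, sizemask ends up 0, so
# the probe sequence can only ever revisit bucket 0 -- an infinite loop.
assert all(bucket(h, 0) == 0 for h in (1, 42, 2**31))
```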
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/05/17 17:01, Tomas Vondra wrote: > > > On 12/05/2017 10:23 PM, Todd A. Cook wrote: >> On 11/27/17 23:03, Tom Lane wrote: >>> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>>> When SH_INSERT tries to insert that final extra value, insertdist >>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>>> double the size (at least until my computer gives up, somewhere around >>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>>> to 26 then it succeeds, but I guess some other attack could be crafted >>>> for that. What is the theory behind this parameter? >>> >>> You beat me to it --- after looking at simplehash.h I'd guessed that >>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >>> an infinite loop, but I'd not gotten to determining which one yet. >>> >>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >>> of them are obviously loop-proof. >>> >>> Note that the sample data has a lot of collisions: >>> >>> regression=# select hashint8(val), count(*) from reproducer group by 1 >>> order by 2 desc; >>> hashint8 | count >>> -------------+------- >>> 441526644 | 2337 >>> -1117776826 | 1221 >>> -1202007016 | 935 >>> -2068831050 | 620 >>> 1156644653 | 538 >>> 553783815 | 510 >>> 259780770 | 444 >>> 371047036 | 394 >>> 915722575 | 359 >>> ... etc etc ... >>> >>> It's evidently more complicated than just that the code fails with >>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >>> by a lot. There needs to be a safety valve that prevents letting >>> the hash fill factor approach zero, which is what's happening in >>> this test case. >> >> FWIW, I can also reproduce the infinite loop with 167834 unique values. >> > > Unique values or unique *hash* values? Unique values. > Can you share the data, so that whoever fixes the bug can verify it also > fixes your example? Sure. It's attached. -- todd
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/05/2017 11:33 PM, Todd A. Cook wrote: > On 12/05/17 17:01, Tomas Vondra wrote: >> >> >> On 12/05/2017 10:23 PM, Todd A. Cook wrote: >>> On 11/27/17 23:03, Tom Lane wrote: >>>> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>>>> When SH_INSERT tries to insert that final extra value, insertdist >>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>>>> double the size (at least until my computer gives up, somewhere around >>>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>>>> to 26 then it succeeds, but I guess some other attack could be crafted >>>>> for that. What is the theory behind this parameter? >>>> >>>> You beat me to it --- after looking at simplehash.h I'd guessed that >>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >>>> an infinite loop, but I'd not gotten to determining which one yet. >>>> >>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >>>> of them are obviously loop-proof. >>>> >>>> Note that the sample data has a lot of collisions: >>>> >>>> regression=# select hashint8(val), count(*) from reproducer group by 1 >>>> order by 2 desc; >>>> hashint8 | count >>>> -------------+------- >>>> 441526644 | 2337 >>>> -1117776826 | 1221 >>>> -1202007016 | 935 >>>> -2068831050 | 620 >>>> 1156644653 | 538 >>>> 553783815 | 510 >>>> 259780770 | 444 >>>> 371047036 | 394 >>>> 915722575 | 359 >>>> ... etc etc ... >>>> >>>> It's evidently more complicated than just that the code fails with >>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >>>> by a lot. There needs to be a safety valve that prevents letting >>>> the hash fill factor approach zero, which is what's happening in >>>> this test case. >>> >>> FWIW, I can also reproduce the infinite loop with 167834 unique values. >>> >> >> Unique values or unique *hash* values? > > Unique values. > > >> Can you share the data, so that whoever fixes the bug can verify it also >> fixes your example? 
> > Sure. It's attached. > Seems the dataset has pretty much the same issue as the one reported before, that is select hashint8(val), count(distinct val), count(*) from temp_f_03 group by 1 order by 2 desc; hashint8 | count | count -------------+-------+------- -1971396144 | 45 | 45 2035896403 | 42 | 42 -1633843397 | 30 | 30 1425704662 | 29 | 29 -455482779 | 22 | 22 -300163565 | 17 | 17 -1803963420 | 17 | 17 -537082846 | 14 | 14 603707034 | 13 | 13 -176369887 | 12 | 12 1274957136 | 11 | 11 1465522632 | 11 | 11 -1589862230 | 10 | 10 -1145403239 | 10 | 10 i.e. there are many hash collisions (more than in the other data set). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/06/17 08:33, Tomas Vondra wrote: > >>> Can you share the data, so that whoever fixes the bug can verify it also >>> fixes your example? >> >> Sure. It's attached. >> > > Seems the dataset has pretty much the same issue as the one reported > before, that is > > select hashint8(val), count(distinct val), count(*) from temp_f_03 group > by 1 order by 2 desc; > > hashint8 | count | count > -------------+-------+------- > -1971396144 | 45 | 45 > 2035896403 | 42 | 42 > -1633843397 | 30 | 30 > 1425704662 | 29 | 29 > -455482779 | 22 | 22 > -300163565 | 17 | 17 > -1803963420 | 17 | 17 > -537082846 | 14 | 14 > 603707034 | 13 | 13 > -176369887 | 12 | 12 > 1274957136 | 11 | 11 > 1465522632 | 11 | 11 > -1589862230 | 10 | 10 > -1145403239 | 10 | 10 > > i.e. there are many hash collisions (more than in the other data set). If hashint8() is ultimately invoked by TupleHashTableHash() in execGrouping.c, it might be magnifying the difficulties here. The least significant bits, which are used as the bucket number in simplehash.h, are not very well distributed: select val, to_hex(val), to_hex(hashint8(val)) from temp_f_03 limit 15 ; val | to_hex | to_hex ----------------------+------------------+---------- 4444319256653758784 | 3dad64d121468140 | 805ffffe 554179993563924608 | 7b0d7c49a018880 | 84dffffb -3383965646518123872 | d109bd2c6b2982a0 | 9c3ffff7 -4706811054739454944 | beae0c48915f8420 | 191ffff6 618200668902031424 | 8944a3ba5d08040 | 2a3ffff0 5074043922812601024 | 466aa01079f982c0 | 7effffee -8783188184262212928 | 861bd8e1b9a482c0 | a6bfffea -4597800992953433792 | c031545b6b128140 | b1dfffea 8563040839807173408 | 76d608465dde8320 | 7d9fffe6 6092569112843158816 | 548d27540c888520 | 6f9fffe6 -7313351060369079936 | 9a81c1f558f98180 | 68ffffe5 -1786712428165627488 | e7345283536981a0 | 73ffffe5 -6153596242570280896 | aa9a08d20e6b8040 | ac3fffd8 88426174078092128 | 13a272306c58360 | b57fffd8 -5305589938458295680 | b65ec20faa4e8280 | ba9fffd3 -- todd
"Todd A. Cook" <tcook@blackducksoftware.com> writes: > If hashint8() is ultimately invoked by TupleHashTableHash() in execGroups.c, > it might be magnifying the difficulties here. The least significant bits, > which are used as the bucket number in simplehash.h, are not very well > distributed: > select val, to_hex(val), to_hex(hashint8(val)) from temp_f_03 limit 15 ; > val | to_hex | to_hex > ----------------------+------------------+---------- > 4444319256653758784 | 3dad64d121468140 | 805ffffe > 554179993563924608 | 7b0d7c49a018880 | 84dffffb > -3383965646518123872 | d109bd2c6b2982a0 | 9c3ffff7 > -4706811054739454944 | beae0c48915f8420 | 191ffff6 > 618200668902031424 | 8944a3ba5d08040 | 2a3ffff0 > 5074043922812601024 | 466aa01079f982c0 | 7effffee > -8783188184262212928 | 861bd8e1b9a482c0 | a6bfffea > -4597800992953433792 | c031545b6b128140 | b1dfffea > 8563040839807173408 | 76d608465dde8320 | 7d9fffe6 > 6092569112843158816 | 548d27540c888520 | 6f9fffe6 > -7313351060369079936 | 9a81c1f558f98180 | 68ffffe5 > -1786712428165627488 | e7345283536981a0 | 73ffffe5 > -6153596242570280896 | aa9a08d20e6b8040 | ac3fffd8 > 88426174078092128 | 13a272306c58360 | b57fffd8 > -5305589938458295680 | b65ec20faa4e8280 | ba9fffd3 Checking out hashint8() on random data shows no such obvious fault. You've apparently got a data set that exposes a weakness in hashint8, but it's not very clear what that is. In any case, the hashtable code needs to not fall over in the presence of a lot of collisions, regardless of the exact reason for there being a lot. regards, tom lane
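The kind of spot check Tom describes might look like the following sketch, with zlib.crc32 standing in for hashint8 (which is only callable from SQL/C): hash a batch of random 64-bit values and check how the low bits — the ones simplehash.h uses as the bucket number — are distributed.

```python
# Rough uniformity check on the low bits of a stand-in hash function.
import random
import zlib
from collections import Counter

random.seed(1)  # deterministic for the example
N, BUCKETS = 100_000, 256
counts = Counter(
    zlib.crc32(random.getrandbits(64).to_bytes(8, "little")) % BUCKETS
    for _ in range(N)
)

# Each bucket should sit near N / BUCKETS (about 390); a hash with skewed
# low bits, like the reported data suggests, would leave buckets nearly
# empty while piling others high.
assert len(counts) == BUCKETS
assert 200 < min(counts.values()) and max(counts.values()) < 600
```

On random input this passes comfortably, which matches Tom's observation that the weakness only shows up on the reporter's particular data.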
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 12:14:24 -0500, Tom Lane wrote: > Checking out hashint8() on random data shows no such obvious fault. > You've apparently got a data set that exposes a weakness in hashint8, > but it's not very clear what that is. It's intentionally designed to cause problems afaict: http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com > In any case, the hashtable code needs to not fall over in the > presence of a lot of collisions, regardless of the exact reason > for there being a lot. Yes, we need to be more resilient about it. Working on a patch. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/06/17 12:19, Andres Freund wrote: > On 2017-12-06 12:14:24 -0500, Tom Lane wrote: >> Checking out hashint8() on random data shows no such obvious fault. >> You've apparently got a data set that exposes a weakness in hashint8, >> but it's not very clear what that is. > > It's intentionally designed to cause problems afaict: > > http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com Definitely not intentionally. My wording there could have been better; perhaps "we expect to get lots of duplicates in that column and we then need to find the unique values". Eventually, a substantial number of the duplicates will be deleted, but that's further along in the process. -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/05/2017 11:01 PM, Tomas Vondra wrote: > > Based on the initial discussion, the problem here is twofold. > > Firstly, the code assumes that if it gets certain number of bucket > collisions (essentially, the initial bucket and certain number of > neighboring buckets already being full), making the table larger is > guaranteed to reduce the number of collisions. > > Which is obviously untrue for duplicate hash values, but it may also > happen for distinct hash values that form a chain (think a sequence of > hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). > I've managed to construct (by sheer brute force) an example that breaks simplehash in this way. It triggers the growth by having a sequence of distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and then trying to insert a value with duplicate hash (say, 4). Which will trigger the "emptydist" condition (as opposed to "insertdist", as in the other example). This means there are very few collisions (no hash value matching more than 2 distinct values), and all the hash values are "at the beginning of the range" so growing the hash table can't possibly break the hash sequences. The generated datasets are quite large (16MB each), so I've placed them here (along with the ugly C code to generate it): https://github.com/tvondra/simplehash-break Each CSV file contains ~1M strings with distinct hashes between 0 and 1048576 (essentially, covering about 97.5% of the range). There are "chains" of up to ~300 consecutive hash values. This works: CREATE TABLE hash_text (val text); COPY hash_text FROM '/tmp/data1.csv'; SET work_mem = '1GB'; SELECT DISTINCT val FROM hash_text; but this does not: CREATE TABLE hash_text (val text); COPY hash_text FROM '/tmp/data1.csv'; COPY hash_text FROM '/tmp/data2.csv'; SET work_mem = '1GB'; SELECT DISTINCT val FROM hash_text; In fact, only a small subset of the second data set should be needed.
FWIW I first tried to do the same thing with bigint values, but no matter what I did I've been unable to generate a dataset covering more than ~60% of the range (essentially hashint8 only ever produces 60% of values from [0,1000000], which likely increases collision rate). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
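The brute-force construction Tomas describes can be sketched like this (zlib.crc32 is a stand-in for the hash the real generator in the linked repository targets; the string format is arbitrary):

```python
# Brute-force sketch: keep hashing candidate strings until every slot in
# a small target range is covered, yielding a set of distinct keys whose
# hashes form one contiguous "chain" -- the shape that defeats the
# grow-on-collision heuristic.
import zlib

def find_chain(target_range, limit=2_000_000):
    found = {}
    i = 0
    while len(found) < target_range and i < limit:
        s = f"val-{i}".encode()
        found.setdefault(zlib.crc32(s) % target_range, s)
        i += 1
    return found

# Collect strings whose (stand-in) hashes form the consecutive run 0..63.
chain = find_chain(64)
assert sorted(chain) == list(range(64))
```

The real datasets do the same thing at a much larger scale, which is why they take ~16MB and "ugly C" to produce.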
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: > On 12/05/2017 11:01 PM, Tomas Vondra wrote: > > > > Based on the initial discussion, the problem here is twofold. > > > > Firstly, the code assumes that if it gets certain number of bucket > > collisions (essentially, the initial bucket and certain number of > > neighboring buckets already being full), making the table larger is > > guaranteed to reduce the number of collisions. > > > > Which is obviously untrue for duplicate hash values, but it may also > > happen for distinct hash values that form a chain (think a sequence of > > hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). > I've managed to construct (by sheer brute force) an example that breaks > simplehash in this way. It triggers the growth by having a sequence of > distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and > then trying to insert a value with duplicate hash (say, 4). I fail to be excited by this. That's possible for any sort of hashtable in some way. You might hit issues due to resizing or due to lookup performance, but you'll hit problems. That's a fairly fundamental issue of pure hashtables. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 08:35 PM, Andres Freund wrote: > On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: >> On 12/05/2017 11:01 PM, Tomas Vondra wrote: >>> >>> Based on the initial discussion, the problem here is twofold. >>> >>> Firstly, the code assumes that if it gets certain number of bucket >>> collisions (essentially, the initial bucket and certain number of >>> neighboring buckets already being full), making the table larger is >>> guaranteed to reduce the number of collisions. >>> >>> Which is obviously untrue for duplicate hash values, but it may also >>> happen for distinct hash values that form a chain (think a sequence of >>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). > >> I've managed to construct (by sheer brute force) an example that breaks >> simplehash in this way. It triggers the growth by having a sequence of >> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and >> then trying to insert a value with duplicate hash (say, 4). > > I fail to be excited by this. That's possible for any sort of hashtable > in some way. You might hit issues due to resizing or due to lookup > performance, but you'll hit problem. That's a fairly fundamental issue > of pure hashtables. > Eh? I think you misunderstood the issue this triggers. I'm perfectly fine with poor lookup performance - that's expected and reasonable. But it actually triggers the same behavior as the already presented example, i.e. the hash table grows indefinitely (so eventually gets OOM). I really doubt allocating 3GB of memory (which is when my laptop gives up and kills the query) to do distinct on a 50MB table is sane behavior. Even if the table is constructed in such an intentionally evil way. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > >On 12/06/2017 08:35 PM, Andres Freund wrote: >> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: >>> On 12/05/2017 11:01 PM, Tomas Vondra wrote: >>>> >>>> Based on the initial discussion, the problem here is twofold. >>>> >>>> Firstly, the code assumes that if it gets certain number of bucket >>>> collisions (essentially, the initial bucket and certain number of >>>> neighboring buckets already being full), making the table larger is >>>> guaranteed to reduce the number of collisions. >>>> >>>> Which is obviously untrue for duplicate hash values, but it may >also >>>> happen for distinct hash values that form a chain (think a sequence >of >>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). >> >>> I've managed to construct (by sheer brute force) an example that >breaks >>> simplehash in this way. It triggers the growth by having a sequence >of >>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, >and >>> then trying to insert a value with duplicate hash (say, 4). >> >> I fail to be excited by this. That's possible for any sort of >hashtable >> in some way. You might hit issues due to resizing or due to lookup >> performance, but you'll hit problem. That's a fairly fundamental >issue >> of pure hashtables. >> > >Eh? I think you misunderstood the issue this triggers. I'm perfectly >fine with poor lookup performance - that's expected and reasonable. But >it actually triggers the same behavior as the already presented >example, >i.e. the hash table grows indefinitely (so eventually gets OOM). > >I really doubt allocating 3GB of memory (which is when my laptop gives >up and kills the query) to do distinct on 50MB table is a sane >behavior. >Even if the table is constructed in so intentionally evil way. No, I got that. I just don't think time and memory are that different. Andres -- Sent from my Android device with K-9 Mail. 
Please excuse my brevity.
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 09:00 PM, Andres Freund wrote: > > > On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> >> On 12/06/2017 08:35 PM, Andres Freund wrote: >>> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote: >>>> On 12/05/2017 11:01 PM, Tomas Vondra wrote: >>>>> >>>>> Based on the initial discussion, the problem here is twofold. >>>>> >>>>> Firstly, the code assumes that if it gets certain number of bucket >>>>> collisions (essentially, the initial bucket and certain number of >>>>> neighboring buckets already being full), making the table larger is >>>>> guaranteed to reduce the number of collisions. >>>>> >>>>> Which is obviously untrue for duplicate hash values, but it may >> also >>>>> happen for distinct hash values that form a chain (think a sequence >> of >>>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). >>> >>>> I've managed to construct (by sheer brute force) an example that >> breaks >>>> simplehash in this way. It triggers the growth by having a sequence >> of >>>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, >> and >>>> then trying to insert a value with duplicate hash (say, 4). >>> >>> I fail to be excited by this. That's possible for any sort of >> hashtable >>> in some way. You might hit issues due to resizing or due to lookup >>> performance, but you'll hit problem. That's a fairly fundamental >> issue >>> of pure hashtables. >>> >> >> Eh? I think you misunderstood the issue this triggers. I'm perfectly >> fine with poor lookup performance - that's expected and reasonable. But >> it actually triggers the same behavior as the already presented >> example, >> i.e. the hash table grows indefinitely (so eventually gets OOM). >> >> I really doubt allocating 3GB of memory (which is when my laptop gives >> up and kills the query) to do distinct on 50MB table is a sane >> behavior. >> Even if the table is constructed in so intentionally evil way. > > No, I got that. 
I just don't think time and memory are that different. > It's one thing when the hash table takes longer to lookup something or when it consumes a bit more memory. Say, ~2x more than needed, give or take. I'm perfectly fine with that, particularly when it's a worst-case evil data set like this one. But it's a very different situation when it consumes tens of GBs of memory (essentially building the largest possible hash table, despite the load factor dropping to 0), because the code grows the hash table assuming it will magically resolve the collisions. And even if the machine has enough RAM, it's guaranteed to fail once we try growing the table beyond that limit. I don't quite understand why that should be OK? It doesn't seem very sane to me. FWIW I've constructed the data sets for two reasons - to convince myself that my understanding of the simplehash code is correct, and to provide a data set triggering the other growth condition in simplehash code. My understanding is that if we stop growing the table after the load factor drops below some threshold (as TL proposed earlier in this thread), it should address both of these cases. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: > It's one thing when the hash table takes longer to lookup something or longer aka "forever". > when it consumes a bit more memory. Say, ~2x more than needed, give or > take. I'm perfectly fine with that, particularly when it's a worst-case > evil data set like this one. I think the way to prevent that kind of attack is to add randomization. > FWIW I've constructed the data sets for two reasons - to convince myself > that my understanding of the simplehash code is correct, and to provide > a data set triggering the other growth condition in simplehash code. My > understanding is that if we stop growing the table after the load factor > drops below some threshold (as TL proposed earlier in this thread), it > should address both of these cases. Yea, I'm not averse to adding a few stopgaps that break in a less annoying manner. All I'm saying is that I don't think we need to be super concerned about this specific way of breaking things. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Stephen Frost
Date:
Andres, all, * Andres Freund (andres@anarazel.de) wrote: > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: > > FWIW I've constructed the data sets for two reasons - to convince myself > > that my understanding of the simplehash code is correct, and to provide > > a data set triggering the other growth condition in simplehash code. My > > understanding is that if we stop growing the table after the load factor > > drops below some threshold (as TL proposed earlier in this thread), it > > should address both of these cases. > > Yea, I'm not adverse to adding a few stopgaps that break in a less > annoying manner. WAll I'm saying is that I don't think we need to be > super concerned about this specific way of breaking things. I think I'm agreeing with Tomas here when I say that we shouldn't have a system that breaks in the "tries to allocate infinite memory" way when the load factor drops far below some reasonable threshold and that we *should* be concerned that we have this problem in the existing implementation. It's bad, and we should care about that and fix it. Running forever in a loop isn't great either, but at least you only expend at most one CPU to it and can go kill it, assuming we have an appropriate CHECK_FOR_INTERRUPTS() placement. Running the box out of memory can lead to other things *failing* (even entirely reasonable things!), not just being slower. That's an important difference. Thanks! Stephen
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 09:46 PM, Andres Freund wrote: > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: >> It's one thing when the hash table takes longer to lookup something or > > longer aka "forever". > Not necessarily. The datasets I shared are somewhat extreme in the sense that there are many contiguous sequences of hash values, but it only takes one such sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So the hash table may still be perfectly fine for most keys, and only slightly slower for the keys in the sequence. > >> when it consumes a bit more memory. Say, ~2x more than needed, give or >> take. I'm perfectly fine with that, particularly when it's a worst-case >> evil data set like this one. > > I think the way to prevent that kind of attack is to add randomization. > By randomization you mean universal hashing [1], or something else? [1] https://en.wikipedia.org/wiki/Universal_hashing In any case, randomization should make it much harder (or pretty much impossible) to construct datasets that are guaranteed to cause failures of this kind. No problem with that (although if someone hits the issue, it will be more difficult to reproduce). It would still be possible to hit it by chance, but that should be very unlikely (OTOH the current code assumes the same thing about collisions and sequences, and here we are.) > >> FWIW I've constructed the data sets for two reasons - to convince myself >> that my understanding of the simplehash code is correct, and to provide >> a data set triggering the other growth condition in simplehash code. My >> understanding is that if we stop growing the table after the load factor >> drops below some threshold (as TL proposed earlier in this thread), it >> should address both of these cases. > > Yea, I'm not adverse to adding a few stopgaps that break in a less > annoying manner. WAll I'm saying is that I don't think we need to be > super concerned about this specific way of breaking things. > OK, understood. 
cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote: > On 12/06/2017 09:46 PM, Andres Freund wrote: > > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: > >> It's one thing when the hash table takes longer to lookup something or > > > > longer aka "forever". > > > > Not necessarily. > The datasets I shared are somewhat extreme in the sense that there are > many contiguous sequences of hash values, but it only takes one such > sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So > the hash table may still be perfectly fine for most keys, and only > slightly slower for the keys in the sequence. Meh, we're talking about adversarial attacks here. > >> when it consumes a bit more memory. Say, ~2x more than needed, give or > >> take. I'm perfectly fine with that, particularly when it's a worst-case > >> evil data set like this one. > > > > I think the way to prevent that kind of attack is to add randomization. > > > > By randomization you mean universal hashing [1], or something else? No, adding in a random seed to the hash. It'd be a lot better if we had a way to provide internal state to the hashfunction, but even just using hash_combine()ing a random number into a hash would be a lot better than what we're doing now. Greetings, Andres Freund
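[Editor's note: the seed-mixing idea Andres describes can be sketched as below. The combiner mirrors the Boost-style hash_combine() PostgreSQL carries in hashutils.h; the seeded_hash() wrapper and its name are illustrative assumptions, not the actual patch. The point is that a per-hashtable random seed remaps every bucket, so a dataset crafted against the unseeded hash no longer lines up into long collision runs.]

```c
#include <assert.h>
#include <stdint.h>

/* Boost-style combiner, same shape as PostgreSQL's hash_combine(). */
static inline uint32_t hash_combine(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/*
 * Hypothetical wrapper: mix a per-hashtable random seed into every raw
 * hash value before it is used to pick a bucket.  Deterministic for a
 * given seed, but different seeds produce different bucket mappings.
 */
static uint32_t seeded_hash(uint32_t seed, uint32_t raw_hash)
{
    return hash_combine(seed, raw_hash);
}
```

This is weaker than feeding internal state into the hash function itself (as Andres notes), but costs only a few instructions per lookup.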
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 10:12 PM, Andres Freund wrote:
> On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
>> On 12/06/2017 09:46 PM, Andres Freund wrote:
>>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>>>> It's one thing when the hash table takes longer to lookup something or
>>>
>>> longer aka "forever".
>>
>> Not necessarily.
>>
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
>
> Meh, we're talking about adversarial attacks here.
>
>>>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>>>> take. I'm perfectly fine with that, particularly when it's a worst-case
>>>> evil data set like this one.
>>>
>>> I think the way to prevent that kind of attack is to add randomization.
>>
>> By randomization you mean universal hashing [1], or something else?
>
> No, adding in a random seed to the hash. It'd be a lot better if we
> had a way to provide internal state to the hashfunction, but even
> just using hash_combine()ing a random number into a hash would be a
> lot better than what we're doing now.

The universal hashing (of "our" hash value) would have the benefit that we could change it when growing the hash table. I.e. do this

    h(x) = ((A * hash_any(x) + B) mod P) mod M

and generate new A, B, P whenever we grow the hashtable. That would further reduce the probability of either of the failures described in this thread (I mean, hitting it again after growing the table). I don't know if it's worth the additional CPU cycles, of course.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
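[Editor's note: the ((A * h + B) mod P) mod M scheme above can be sketched as follows. The Mersenne prime P = 2^61 - 1, the use of rand() as the randomness source, and the struct/function names are all illustrative assumptions, not the actual patch; the essential property is that regenerating A and B on every table growth reshuffles any accidental chain of hash values.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UH_P ((uint64_t) 0x1FFFFFFFFFFFFFFF)    /* 2^61 - 1, a Mersenne prime */

typedef struct { uint64_t a; uint64_t b; } uh_params;

/* Regenerate the (a, b) pair; in the sketched scheme this would happen
 * on every hashtable growth.  rand() is a weak source, fine for a sketch. */
static uh_params uh_new_params(void)
{
    uh_params p;
    p.a = ((uint64_t) rand() % (UH_P - 1)) + 1;  /* a in [1, P-1] */
    p.b = (uint64_t) rand() % UH_P;              /* b in [0, P-1] */
    return p;
}

/* Map "our" 32-bit hash to a bucket in [0, nbuckets). */
static uint64_t uh_bucket(uh_params p, uint32_t h, uint64_t nbuckets)
{
    /* a * h can exceed 64 bits (a < 2^61, h < 2^32), so widen to 128 bits;
     * __int128 is a GCC/Clang extension. */
    unsigned __int128 t = (unsigned __int128) p.a * h + p.b;

    return (uint64_t) (t % UH_P) % nbuckets;
}
```

With a = 1, b = 0 this degenerates to plain modulo, which makes the mapping easy to check; any other (a, b) pair permutes the hash values before the final reduction.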
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/06/17 16:12, Andres Freund wrote: > >> The datasets I shared are somewhat extreme in the sense that there are >> many contiguous sequences of hash values, but it only takes one such >> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So >> the hash table may still be perfectly fine for most keys, and only >> slightly slower for the keys in the sequence. > > Meh, we're talking about adversarial attacks here. Hmmmmm... I found this problem when I dropped 10.1 into a test environment to see what would happen. There was no deliberate attempt to break anything. -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote: > On 12/06/17 16:12, Andres Freund wrote: > > > > > The datasets I shared are somewhat extreme in the sense that there are > > > many contiguous sequences of hash values, but it only takes one such > > > sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So > > > the hash table may still be perfectly fine for most keys, and only > > > slightly slower for the keys in the sequence. > > > > Meh, we're talking about adversarial attacks here. > > Hmmmmm... > > I found this problem when I dropped 10.1 into a test environment to see > what would happen. There was no deliberate attempt to break anything. Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>> I found this problem when I dropped 10.1 into a test environment to see
>> what would happen. There was no deliberate attempt to break anything.

> Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com

I'm confused by Tomas' claim that

>> (essentially hashint8 only ever produces 60% of
>> values from [0,1000000], which likely increases collision rate).

This is directly contradicted by the simple experiments I've done, eg

regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
 count
--------
 999879
(1 row)

regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
 count
--------
 644157
(1 row)

regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
  count
---------
 1048514
(1 row)

It's certainly not perfect, but I'm not observing any major failure to span the output space.

regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 11:55 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote:
>>> I found this problem when I dropped 10.1 into a test environment to see
>>> what would happen. There was no deliberate attempt to break anything.
>
>> Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com
>
> I'm confused by Tomas' claim that
>
>>> (essentially hashint8 only ever produces 60% of
>>> values from [0,1000000], which likely increases collision rate).
>
> This is directly contradicted by the simple experiments I've done, eg
>
> regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v;
>  count
> --------
>  999879
> (1 row)
>
> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v;
>  count
> --------
>  644157
> (1 row)
>
> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v;
>   count
> ---------
>  1048514
> (1 row)
>
> It's certainly not perfect, but I'm not observing any major failure to
> span the output space.

That is not the claim I was making, though. When generating data sets I've been considering only values where hashint8(v) is between 0 and 1000000 *without* the masking. Otherwise growing the hash table would break the "chain" of values in the hash table, which would resolve the issue with SH_GROW_MAX_MOVE.

So you'd have to do something like this:

    select count(distinct hashint8(v))
    from generate_series(0,1000000::int8) v
    where hashint8(v) between 0 and 1024*1024;

but to get some numbers you'd have to also increase the value passed to generate_series (because it'll filter most of the values).

Which is why I generated the data by an ugly C program, similar to the attached one. It keeps a small bitmap for generated hashes in the [0,1M] range, and prints the number of bits set after every 1e9 values processed.
What I do see is this:

    i=1000000000 nset=217666
    i=2000000000 nset=389526
    i=3000000000 nset=525135
    i=4000000000 nset=632305
    i=5000000000 nset=659574
    i=6000000000 nset=659574
    i=7000000000 nset=659574
    i=8000000000 nset=659574
    i=9000000000 nset=659574
    ...

So somewhere between 4e9 and 5e9 we generate 659574 hashes in the range, and then never increase it further.

I don't know if this an issue in practice, but it seems that for large hash tables (on bigint), growing the hash table may not be that great improvement because we're not really doubling the number of slots we can hit directly. We'll probably find an empty slot nearby, though.

It was just an observation - I only noticed that because I tried to construct the adversarial data set on bigint, and couldn't make it work no matter what.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
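[Editor's note: the bitmap bookkeeping in the C program described above can be sketched like this. The real program fed every uint64 key through hashint8's logic; here the hash source is left out and only the "count each hash value in [0,1M) once" machinery is shown, with names invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RANGE 1000000           /* track hash values in [0, 1M) */

typedef struct
{
    uint8_t  *bits;             /* one bit per possible value in the range */
    uint64_t  nset;             /* number of distinct values seen so far */
} bitmap;

static bitmap bitmap_create(void)
{
    bitmap bm;

    bm.bits = calloc((RANGE + 7) / 8, 1);   /* zeroed: nothing seen yet */
    bm.nset = 0;
    return bm;
}

/* Record one hash value; count it only the first time it appears. */
static void bitmap_add(bitmap *bm, uint32_t h)
{
    if (h >= RANGE)
        return;                             /* outside the tracked range */
    if (!(bm->bits[h / 8] & (1u << (h % 8))))
    {
        bm->bits[h / 8] |= 1u << (h % 8);
        bm->nset++;                         /* first occurrence */
    }
}
```

The nset counter printed every 1e9 iterations is exactly this first-occurrence count, which is why it plateaus once the hash function has produced every value it will ever produce in the range.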
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/06/2017 06:19 PM, Andres Freund wrote:
> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>> Checking out hashint8() on random data shows no such obvious fault.
>> You've apparently got a data set that exposes a weakness in hashint8,
>> but it's not very clear what that is.
>
> It's intentionally designed to cause problems afaict:
>
> http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com
>
>> In any case, the hashtable code needs to not fall over in the
>> presence of a lot of collisions, regardless of the exact reason
>> for there being a lot.
>
> Yes, we need to be more resilient about it. Working on a patch.

I've been playing with this a bit today (sorry for trespassing on your turf a bit, but curiosity killed the cat), particularly with the three basic ideas mentioned in this thread. Let me share some numbers.

1) randomized hashing, i.e. using the extended hash functions and a random seed

2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p) formula, where h is "our" hash, and (a,b) are random and p is a prime. This can't possibly fix the collisions reported by Todd, because the result will remain the same. But it should fix "my" chains of values by spreading them a bit (and the a,b,p values are generated on each grow, so in the unlikely case where we get a chain, it should disappear after a growth)

[1] https://en.wikipedia.org/wiki/Universal_hashing

3) disabling growth of the hash table when the fillfactor would drop too much (e.g. below 0.2)

Very rough patches for these three approaches are attached.

Now, some very rough numbers comparing the impact of those approaches (and their combinations). The following two tables present timings for a couple of datasets - five cases without any collisions (allowing us to quantify impact on regular datasets), the two datasets generated by me (one without collisions, one with collisions), and the two issues reported by Todd.
RH - randomized hashing (seed + extended)
UH - universal hashing
ST - stop growing

             master     RH     UH     ST      RH    UH    ST
    -------------------------------------------------------
    Ok/1        542    540    552    537      0%    2%   -1%
    Ok/2        232    221    242    237     -5%    4%    2%
    Ok/3        161    167    180    161      4%   12%    0%
    Ok/4        276    270    285    274     -2%    3%   -1%
    Ok/5        174    179    193    174      3%   11%    0%
    Tomas/1     543    555    551    533      2%    1%   -2%
    Tomas/2     OOM   2322   3251      -
    Todd/1      OOM    OOM     63     63
    Todd/2      OOM    OOM    OOM     72

This is mostly as expected, although I'm surprised that the universal hashing seems to fix Todd's first reproducer. The "stop growing" approach technically fixes "my" dataset, but it takes so much time I haven't timed it.

In terms of slowdown (compared to master), I'd say RH and ST are mostly neutral. Universal hashing has significant impact in some cases, but I suppose that might be fixed by using a slightly different scheme (instead of using simple modulo arithmetic).

Now, let's look at combinations of the approaches (broken into two tables, so that it doesn't get mangled by mail clients):

             master  RH+UH  RH+ST  UH+ST  RH+UH+ST
    Ok/1        542    553    545    549       558
    Ok/2        232    241    234    241       250
    Ok/3        161    180    170    180       187
    Ok/4        276    274    270    285       290
    Ok/5        174    187    182    193       198
    Tomas/1     543    571    561    554       570
    Tomas/2       -   2380   2340   3130      2380
    Todd/1        -     61     65     63        64
    Todd/2        -      -     90     91        93

             RH+UH  RH+ST  UH+ST  RH+UH+ST
    Ok/1        2%     1%     1%        3%
    Ok/2        4%     1%     4%        8%
    Ok/3       12%     6%    12%       16%
    Ok/4       -1%    -2%     3%        5%
    Ok/5        7%     5%    11%       14%
    Tomas/1     5%     3%     2%        5%

It seems only the "stop growth" has to be part of the solution, as it's included in any combination fixing all cases shared in this thread. I'd say (randomized hashing + stop growing) seems like the best option.

FWIW I do agree the data sets shared in this thread are pretty extreme and it doesn't make much sense to slow the regular cases. I'll be perfectly happy if we stop the OOM, making those cases fast is a bonus.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
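[Editor's note: the "stop growing" (ST) variant can be sketched as a guard on the collision-triggered growth path. The 0.2 fillfactor floor matches the threshold mentioned upthread; the function name, field names, and exact placement are made up for illustration — the real (rough) patch is attached to this message.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_FILLFACTOR 0.2      /* don't grow below this load factor */

/*
 * Hypothetical check used where a long collision run (SH_GROW_MAX_MOVE in
 * simplehash terms) would otherwise force a grow: doubling is allowed only
 * while the doubled table would stay reasonably full.  An adversarial chain
 * of hash values then cannot balloon the table (and memory) without bound;
 * lookups in the chain get a bit slower instead of the backend OOMing.
 */
static bool grow_allowed(uint64_t members, uint64_t cursize)
{
    uint64_t newsize = cursize * 2;

    return (double) members / (double) newsize >= MIN_FILLFACTOR;
}
```

For example, a table with 1024 buckets and 600 entries may still double (load factor ~0.29 afterwards), while one with only 300 entries may not (~0.15).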
Attachment
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 12/07/2017 12:45 AM, Tomas Vondra wrote: > > > On 12/06/2017 11:55 PM, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2017-12-06 16:38:18 -0500, Todd A. Cook wrote: >>>> I found this problem when I dropped 10.1 into a test environment to see >>>> what would happen. There was no deliberate attempt to break anything. >> >>> Read Thomas' message at: http://archives.postgresql.org/message-id/263b03b1-3e1c-49ca-165a-8ac6751419c4%402ndquadrant.com >> >> I'm confused by Tomas' claim that >> >>>> (essentially hashint8 only ever produces 60% of >>>> values from [0,1000000], which likely increases collision rate). >> >> This is directly contradicted by the simple experiments I've done, eg >> >> regression=# select count(distinct hashint8(v)) from generate_series(0,1000000::int8) v; >> count >> -------- >> 999879 >> (1 row) >> >> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,1000000::int8) v; >> count >> -------- >> 644157 >> (1 row) >> >> regression=# select count(distinct hashint8(v) & (1024*1024-1)) from generate_series(0,10000000::int8) v; >> count >> --------- >> 1048514 >> (1 row) >> >> It's certainly not perfect, but I'm not observing any major failure to >> span the output space. >> > > That is not the claim I was making, though. When generating data sets > I've been considering only values where hashint8(v) is between 0 and > 1000000 *without* the masking. Otherwise growing the hash table would > break the "chain" of values in the hash table, which would resolve the > issue with SH_GROW_MAX_MOVE. > > So you'd have to do something like this: > > select count(distinct hashint8(v)) > from generate_series(0,1000000::int8) v > where hashint8(v) between 0 and 1024*1024; > > but to get some numbers you'd have to also increase the value passed to > generate_series (because it'll filter most of the values). > > Which is why I generated the data by an ugly C program, similar to the > attached one. 
It keeps a small bitmap for generated hashes in the [0,1M] > range, and prints the number of bits set after every 1e9 values processed. > > What I do see is this: > > i=1000000000 nset=217666 > i=2000000000 nset=389526 > i=3000000000 nset=525135 > i=4000000000 nset=632305 > i=5000000000 nset=659574 > i=6000000000 nset=659574 > i=7000000000 nset=659574 > i=8000000000 nset=659574 > i=9000000000 nset=659574 > ... > > So somewhere between 4e9 and 5e9 we generate 659574 hashes in the range, > and then never increase it further. > > I don't know if this an issue in practice, but it seems that for large > hash tables (on bigint), growing the hash table may not be that great > improvement because we're not really doubling the number of slots we can > hit directly. We'll probably find an empty slot nearby, though. > > It was just an observation - I only noticed that because I tried to > construct the adversarial data set on bigint, and couldn't make it work > no matter what. > I've done a simple experiment today - computing a hash for every uint32 value, and checking how many distinct hash values that produces. 
For the 4.2 billion values the results look like this:

     second   key             ndistinct         ndistinct/nkeys
      3.380   i=100000000     nset=98829159     0.99
      6.240   i=200000000     nset=195351181    0.98
      9.106   i=300000000     nset=289623103    0.97
     11.932   i=400000000     nset=381695003    0.95
     14.814   i=500000000     nset=471621355    0.94
     17.706   i=600000000     nset=559452278    0.93
     20.577   i=700000000     nset=645242762    0.92
     23.496   i=800000000     nset=729036217    0.91
     26.460   i=900000000     nset=810879821    0.90
     29.380   i=1000000000    nset=890818778    0.89
     32.331   i=1100000000    nset=968898478    0.88
     35.282   i=1200000000    nset=1045164189   0.87
     38.262   i=1300000000    nset=1119654810   0.86
     41.251   i=1400000000    nset=1192424766   0.85
     44.258   i=1500000000    nset=1263482593   0.84
     47.268   i=1600000000    nset=1332897449   0.83
     50.305   i=1700000000    nset=1400717923   0.82
     53.356   i=1800000000    nset=1466962823   0.81
     56.425   i=1900000000    nset=1531660191   0.81
     59.489   i=2000000000    nset=1594856205   0.80
     62.593   i=2100000000    nset=1656588855   0.79
     65.706   i=2200000000    nset=1716895530   0.78
     68.829   i=2300000000    nset=1775809288   0.77
     71.966   i=2400000000    nset=1833353377   0.76
     75.123   i=2500000000    nset=1889558599   0.76
     78.271   i=2600000000    nset=1944463039   0.75
     81.418   i=2700000000    nset=1998096496   0.74
     84.574   i=2800000000    nset=2050490745   0.73
     87.711   i=2900000000    nset=2101666393   0.72
     90.852   i=3000000000    nset=2151669155   0.72
     93.986   i=3100000000    nset=2200517580   0.71
     97.084   i=3200000000    nset=2248232772   0.70
    100.172   i=3300000000    nset=2294849331   0.70
    103.232   i=3400000000    nset=2340389131   0.69
    106.273   i=3500000000    nset=2384875319   0.68
    109.272   i=3600000000    nset=2428339639   0.67
    112.260   i=3700000000    nset=2470798655   0.67
    115.199   i=3800000000    nset=2512271730   0.66
    118.125   i=3900000000    nset=2552788321   0.65
    121.029   i=4000000000    nset=2592379529   0.65
    123.895   i=4100000000    nset=2631056297   0.64
    126.726   i=4200000000    nset=2668843329   0.64
    129.397   loops 4294967295 set 2703915179  (0.63)

That means we only ever generate about 64% of the possible hash keys.
That doesn't seem particularly healthy I guess, but for small hash tables (with fewer than 2^32 buckets) that may not be an issue, as some of the values will wrap because of the modulo. For comparison, murmurhash3 and xxhash (both fairly popular hash functions) end with 1:1 ratio of values and hashes, that is not having any collisions at all. Of course, both are somewhat slower than the simple hash functions we use for int/bigint values. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 12/10/17 18:14, Tomas Vondra wrote: > > I've done a simple experiment today - computing a hash for every uint32 > value, and checking how many distinct hash values that produces. For the > 4.2 billion values the results look like this: > > second key ndistinct ndistinct/nkeys > 3.380 i=100000000 nset=98829159 0.99 > 6.240 i=200000000 nset=195351181 0.98 > 9.106 i=300000000 nset=289623103 0.97 > 11.932 i=400000000 nset=381695003 0.95 > 14.814 i=500000000 nset=471621355 0.94 > 17.706 i=600000000 nset=559452278 0.93 > 20.577 i=700000000 nset=645242762 0.92 > 23.496 i=800000000 nset=729036217 0.91 > 26.460 i=900000000 nset=810879821 0.90 > 29.380 i=1000000000 nset=890818778 0.89 > 32.331 i=1100000000 nset=968898478 0.88 > 35.282 i=1200000000 nset=1045164189 0.87 > 38.262 i=1300000000 nset=1119654810 0.86 > 41.251 i=1400000000 nset=1192424766 0.85 > 44.258 i=1500000000 nset=1263482593 0.84 > 47.268 i=1600000000 nset=1332897449 0.83 > 50.305 i=1700000000 nset=1400717923 0.82 > 53.356 i=1800000000 nset=1466962823 0.81 > 56.425 i=1900000000 nset=1531660191 0.81 > 59.489 i=2000000000 nset=1594856205 0.80 > 62.593 i=2100000000 nset=1656588855 0.79 > 65.706 i=2200000000 nset=1716895530 0.78 > 68.829 i=2300000000 nset=1775809288 0.77 > 71.966 i=2400000000 nset=1833353377 0.76 > 75.123 i=2500000000 nset=1889558599 0.76 > 78.271 i=2600000000 nset=1944463039 0.75 > 81.418 i=2700000000 nset=1998096496 0.74 > 84.574 i=2800000000 nset=2050490745 0.73 > 87.711 i=2900000000 nset=2101666393 0.72 > 90.852 i=3000000000 nset=2151669155 0.72 > 93.986 i=3100000000 nset=2200517580 0.71 > 97.084 i=3200000000 nset=2248232772 0.70 > 100.172 i=3300000000 nset=2294849331 0.70 > 103.232 i=3400000000 nset=2340389131 0.69 > 106.273 i=3500000000 nset=2384875319 0.68 > 109.272 i=3600000000 nset=2428339639 0.67 > 112.260 i=3700000000 nset=2470798655 0.67 > 115.199 i=3800000000 nset=2512271730 0.66 > 118.125 i=3900000000 nset=2552788321 0.65 > 121.029 i=4000000000 nset=2592379529 0.65 > 123.895 
i=4100000000 nset=2631056297 0.64
> 126.726 i=4200000000 nset=2668843329 0.64
> 129.397 loops 4294967295 set 2703915179 (0.63)
>
> That means we only ever generate about 64% of the possible hash keys.
> That doesn't seem particularly healthy I guess, but for small hash
> tables (with fewer than 2^32 buckets) that may not be an issue, as some
> of the values will wrap because of the modulo.

FWIW, changing hashint8() to

    int64 val = PG_GETARG_INT64(0);

    if (val <= UINT32_MAX && val >= 0)
        return hash_uint32((uint32) val);
    else
        return hash_any((unsigned char *) &val, sizeof(val));

fixes the problem for me on all of my data sets.

My conjecture is that the existing implementation

    uint32 lohalf = (uint32) val;
    uint32 hihalf = (uint32) (val >> 32);

    lohalf ^= (val >= 0) ? hihalf : ~hihalf;

biases lohalf when hashing collections of large-magnitude negative numbers (like my data sets) because the individual bits of lohalf do not have equal probability of being toggled by the exclusive-or. For example, given a value of -9223261146507558080 (0x8000770b48a68340), 15 of the 16 most significant bits in lohalf will toggle, but only about half of the 16 least significant bits will toggle.

-- todd
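[Editor's note: Todd's worked example can be checked mechanically. This sketch reproduces the lohalf/hihalf folding step from the existing hashint8() (the part before hash_uint32 is applied) and counts how many of lohalf's upper 16 bits the xor toggles for the quoted hex value; the helper names are invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>

/* The fold step from the current hashint8(), before hashing the result. */
static uint32_t fold_int8(int64_t val)
{
    uint32_t lohalf = (uint32_t) val;
    uint32_t hihalf = (uint32_t) ((uint64_t) val >> 32);

    /* Negative values xor with the complement so that small negative ints
     * hash like their int4 counterparts. */
    lohalf ^= (val >= 0) ? hihalf : ~hihalf;
    return lohalf;
}

/* Count set bits in the upper 16 bits of a 32-bit mask; each set bit is a
 * lohalf bit that the xor toggles. */
static int high16_popcount(uint32_t x)
{
    int n = 0;

    for (x >>= 16; x != 0; x >>= 1)
        n += x & 1;
    return n;
}
```

For val = 0x8000770b48a68340 the xor mask is ~hihalf = 0x7fff88f4: 15 of its top 16 bits are set (so those lohalf bits always flip), while the low half is much sparser, which is the asymmetry Todd describes.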
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 15:59, Todd A. Cook wrote: > On 11/27/17 14:17, Tomas Vondra wrote: >> Hi, >> >> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 14932 >>> Logged by: Todd Cook >>> Email address: tcook@blackducksoftware.com >>> PostgreSQL version: 10.1 >>> Operating system: CentOS Linux release 7.4.1708 (Core) >>> Description: >>> >>> It hangs on a table with 167834 rows, though it works fine with only 167833 >>> rows. When it hangs, CTRL-C does not interrupt it, and the backend has to >>> be killed to stop it. >>> >> >> Can you share the query and data, so that we can reproduce the issue? >> >> Based on the stack traces this smells like a bug in the simplehash, >> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(), >> which gets triggered for 167834 rows (but not for 167833). > > I've attached a reproducer using real data. (FWIW, I wasn't able to reproduce with > fake data made with generate_series().) > > Also, I forgot to mention that this problem is not present in 9.6 or 9.5. Is this issue likely to be addressed in 10.2? FWIW, we try to avoid running with local patches, but we are unable to move past 9.6 without a fix... -- todd
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 12/06/2017 06:19 PM, Andres Freund wrote: >> On 2017-12-06 12:14:24 -0500, Tom Lane wrote: >>> In any case, the hashtable code needs to not fall over in the >>> presence of a lot of collisions, regardless of the exact reason >>> for there being a lot. >> Yes, we need to be more resilient about it. Working on a patch. This seems to have fallen through a crack over Christmas holidays. Let's see if we can't push it to a conclusion. > [ Tomas presents some draft patches and performance analysis ] > It seems only the "stop growth" has to be part of the solution, as it's > included in any combination fixing all cases shared in this thread. I'd > say (randomized hashing + stop growing) seems like the best option. > FWIW I do agree the data sets shared in this thread are pretty extreme > and it doesn't make much sense to slow the regular cases. I'll be > perfectly happy if we stop the OOM, making those cases fast is a bonus. Personally I'd say let's just adopt the "stop growing" patch and call it good. In particular, I'm nervous about the idea of backpatching your RH patch because of the ABI hazard from changing struct TupleHashTableData. But perhaps Andres has some different idea in mind ... regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-09 10:36:48 -0500, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > > On 12/06/2017 06:19 PM, Andres Freund wrote: > >> On 2017-12-06 12:14:24 -0500, Tom Lane wrote: > >>> In any case, the hashtable code needs to not fall over in the > >>> presence of a lot of collisions, regardless of the exact reason > >>> for there being a lot. > > >> Yes, we need to be more resilient about it. Working on a patch. > > This seems to have fallen through a crack over Christmas holidays. > Let's see if we can't push it to a conclusion. Yea, I'll try to get to it over the weekend. Sorry for the delay. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 11/27/17 14:17, Tomas Vondra wrote:
> Hi,
>
> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 14932
>> Logged by: Todd Cook
>> Email address: tcook@blackducksoftware.com
>> PostgreSQL version: 10.1
>> Operating system: CentOS Linux release 7.4.1708 (Core)
>> Description:
>>
>> It hangs on a table with 167834 rows, though it works fine with only 167833
>> rows. When it hangs, CTRL-C does not interrupt it, and the backend has to
>> be killed to stop it.
>
> Can you share the query and data, so that we can reproduce the issue?
>
> Based on the stack traces this smells like a bug in the simplehash,
> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
> which gets triggered for 167834 rows (but not for 167833).

FWIW, changing the guts of hashint8() to

+    if (val >= INT32_MIN && val <= INT32_MAX)
+        return hash_uint32((uint32) val);
+    else
+        return hash_any((unsigned char *) &val, sizeof(val));

allows us to process a full-sized data set of around 900 million rows. However, memory usage seemed to be rather excessive (we can only run 7 of these jobs in parallel on a 128GB system before the OOM killer kicked in, rather than the usual 24); if there's any interest, I can try to measure exactly how excessive.

-- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/25/2018 11:31 PM, Todd A. Cook wrote: > On 11/27/17 14:17, Tomas Vondra wrote: >> Hi, >> >> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 14932 >>> Logged by: Todd Cook >>> Email address: tcook@blackducksoftware.com >>> PostgreSQL version: 10.1 >>> Operating system: CentOS Linux release 7.4.1708 (Core) >>> Description: >>> >>> It hangs on a table with 167834 rows, though it works fine with only >>> 167833 >>> rows. When it hangs, CTRL-C does not interrupt it, and the backend >>> has to >>> be killed to stop it. >>> >> >> Can you share the query and data, so that we can reproduce the issue? >> >> Based on the stack traces this smells like a bug in the simplehash, >> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(), >> which gets triggered for 167834 rows (but not for 167833). > > FWIW, changing the guts of hashint8() to > > + if (val >= INT32_MIN && val <= INT32_MAX) > + return hash_uint32((uint32) val); > + else > + return hash_any((unsigned char *) &val, sizeof(val)); > > allows us to process a full-sized data set of around 900 million rows. > However, > memory usage seemed to be rather excessive (we can only run 7 of these > jobs in parallel > on a 128GB system before the OOM killer kicked in, rather than the usual > 24); if there's > any interest, I can try to measure exactly how excessive. > I suspect you're right the hash is biased to lohalf bits, as you wrote in the 19/12 message. In fact, I think it's a direct consequence of the requirement that hashint8() needs to produce the same hash for logically equivalent int2 and int4 values. Out of curiosity, could you try replacing the hash_any call in hashint8 with a hash function like murmur3, and see if it improves the behavior? That obviously breaks the hashint8 for cross-type hash joins, but it would be interesting bit of information I think. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
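[Editor's note: for reference, the murmur3 finalizer Tomas alludes to (the "fmix64" step of MurmurHash3, public domain, by Austin Appleby) looks like this. Every step — xor-shift and multiplication by an odd constant — is invertible, so the whole function is a bijection on 64-bit values, which is why hashing all inputs produces no collisions at all.]

```c
#include <assert.h>
#include <stdint.h>

/* MurmurHash3 fmix64 finalizer: a 1:1 mapping on 64-bit values. */
static uint64_t fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;     /* odd constant: invertible mod 2^64 */
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}
```

Note that zero is a fixed point of this construction (every step maps 0 to 0), and that bijectivity alone says nothing about cross-type hashing, which is the compatibility constraint that makes replacing hashint8's hash_any call hard.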
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > I suspect you're right the hash is biased to lohalf bits, as you wrote > in the 19/12 message. I don't see any bias in what it's doing, which is basically xoring the two halves and hashing the result. It's possible though that Todd's data set contains values in which corresponding bits of the high and low halves are correlated somehow, in which case the xor would produce a lot of cancellation and a relatively small number of distinct outputs. If we weren't bound by backwards compatibility, we could consider changing to logic more like "if the value is within the int4 range, apply int4hash, otherwise hash all 8 bytes normally". But I don't see how we can change that now that hash indexes are first-class citizens. In any case, we still need a fix for the behavior that the hash table size is blown out by lots of collisions, because that can happen no matter what the hash function is. Andres seems to have dropped the ball on doing something about that. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/27/2018 12:22 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I suspect you're right the hash is biased to lohalf bits, as you wrote
>> in the 19/12 message.
>
> I don't see any bias in what it's doing, which is basically xoring the
> two halves and hashing the result. It's possible though that Todd's
> data set contains values in which corresponding bits of the high and
> low halves are correlated somehow, in which case the xor would produce
> a lot of cancellation and a relatively small number of distinct outputs.

Hmm, that makes more sense than what I wrote. Probably time to get some sleep or drink more coffee, I guess.

BTW what do you think about the fact that we only really generate ~63% of the possible hash values (see my message from 11/12)? That seems a bit unfortunate, although not unexpected for a simple hash function.

> If we weren't bound by backwards compatibility, we could consider
> changing to logic more like "if the value is within the int4 range,
> apply int4hash, otherwise hash all 8 bytes normally". But I don't see
> how we can change that now that hash indexes are first-class
> citizens.

Yeah, I've been thinking about that too. But I think it's an issue only for pg_upgraded clusters, which may have hash indexes (and also hash-partitioned tables). So couldn't we use new hash functions in fresh clusters and use the backwards-compatible ones in pg_upgraded ones?

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi, On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: > 1) randomized hashing, i.e. using the extended hash functions and a > random seed Given the lack of extended hash functions for everything this doesn't seem quite viable. But I think we can make it work quite easily nonetheless - we can just do the combining with a random seed outside of the hashtable. We kinda already do so (with a really crappy combiner) in TupleHashTableHash(), except that we'd actually have to use a random seed. This seems like something we should do, just out of defensiveness. It'll not be as good as persisting the hashfunction's state across calls, but well, ... Let me write a patch for that. > 2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p) > formula, where h is "our" hash, and (a,b) are random and p is a prime. > This can't possibly fix the collisions reported by Todd, because the > result will remain the same. But it should fix "my" chains of values by > spreading them a bit (and the a,b,p values are generated on each grow, > so in the unlikely case where we get a chain, it should disappear after > a growth) This seems on the too expensive side of things for my taste to do unconditionally. > 3) disabling growth of the hash table when the fillfactor would drop too > much (e.g. below 0.2) Yea, I think we should just apply something like this. > FWIW I do agree the data sets shared in this thread are pretty extreme > and it doesn't make much sense to slow the regular cases. I'll be > perfectly happy if we stop the OOM, making those cases fast is a bonus. Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, and do something better in 11. And then later possibly backpatch if we've grown some confidence? Greetings, Andres Freund
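[Editor's note] Option (3) is straightforward to sketch. The idea (names and the 0.2 threshold here are illustrative, not the exact simplehash.h patch) is to refuse doubling when the resulting table would be mostly empty, since a grow triggered by one long collision chain just doubles memory without shortening the chain much:

```python
MIN_FILLFACTOR = 0.2  # illustrative floor, not necessarily the patched value

def grow_allowed(nmembers: int, cursize: int) -> bool:
    """Permit doubling only if the doubled table would still be at
    least MIN_FILLFACTOR full.  A long collision chain in a nearly
    empty table can otherwise request grow after grow, blowing out
    memory (the symptom in this bug's stack trace)."""
    return nmembers / (cursize * 2) >= MIN_FILLFACTOR

# A reasonably full table may still double ...
print(grow_allowed(500, 1024))   # True  (500/2048 ~ 0.24)
# ... but a near-empty table hitting long chains may not.
print(grow_allowed(100, 1024))   # False (100/2048 ~ 0.05)
```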
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-26 18:22:26 -0500, Tom Lane wrote: > In any case, we still need a fix for the behavior that the hash table size > is blown out by lots of collisions, because that can happen no matter what > the hash function is. Andres seems to have dropped the ball on doing > something about that. Didn't have spare brain cycles :(, and the next backbranch release wasn't yet close, reducing immediate urgency a bit. As written nearby I think we should make execGrouping.c users of hashtables use a more random IV, and apply something similar to the growth limit patch from Tomas. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: >> FWIW I do agree the data sets shared in this thread are pretty extreme >> and it doesn't make much sense to slow the regular cases. I'll be >> perfectly happy if we stop the OOM, making those cases fast is a bonus. > Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, > and so something better in 11. And then later possibly backpatch if > we've grown some confidence? +1. I'd like to see some response to this included in 10.2, and time grows short for that. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-26 18:48:35 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: > >> FWIW I do agree the data sets shared in this thread are pretty extreme > >> and it doesn't make much sense to slow the regular cases. I'll be > >> perfectly happy if we stop the OOM, making those cases fast is a bonus. > > > Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, > > and so something better in 11. And then later possibly backpatch if > > we've grown some confidence? > > +1. I'd like to see some response to this included in 10.2, and time > grows short for that. Here are two patches that I think we want for 10.2, and the start of one that I think we want for master. 0002 is needed because otherwise the lack of extra growth leads to noticeably worse performance when filling an underestimated coordinator hash table from the workers - turns out our hash combine (and most hash combines) let a lot of clustering survive. By adding a final hashing round the bit perturbation is near perfect. The commit messages need to be polished a bit, but other than that I think these are reasonable fixes. Plan to push by Monday evening at the latest. The third patch is a version of the random IV discussed in this thread. I do think we want to add usage of the extended hash functions, as prototyped by Tomas, as that actually helps to fix issues with actual hash conflicts. But we additionally need a fallback path for types without extended hashtables, and the random IV is a good idea nonetheless. There's no ABI difference in my patch, so I think this is actually something we could backpatch. But I don't think it's urgent, so I'm not planning to do that for 10.2. The one thing that could confuse people is that it can lead to output order changes from run to run - I think that's actually good, nobody should rely on hashagg etc output being stable, but it might be a bit much in a stable release?
In my tests this solves the worst performance issues in Todd's case, Tomas's, Thomas's and still does ok performance-wise with a TPC-H Q18 (which showcases the underestimated worker hashtable into leader hashtable issue). Greetings, Andres Freund
Attachment
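[Editor's note] The "final hashing round" Andres describes can be illustrated with a murmur3-style 32-bit finalizer (the constants below are the standard murmur3 fmix32 constants; treat the surrounding code as a sketch, not the committed patch). It is a bijection on 32-bit values with strong avalanche, so clustering that survives a weak hash combine gets dispersed:

```python
def fmix32(h: int) -> int:
    """murmur3-style 32-bit finalizer: an invertible mixing step with
    strong avalanche, so clustered inputs come out well spread."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

# Hash values sharing their low byte (a worst case: they all map to
# the same bucket in a 256-bucket table) get spread across buckets
# once the final round is applied.
crowded = [i << 8 for i in range(256)]          # low byte always 0
before = {h & 0xFF for h in crowded}            # one bucket used
after = {fmix32(h) & 0xFF for h in crowded}     # many buckets used
print(len(before), len(after))
```

Because the finalizer is a bijection, it never introduces new collisions; it only redistributes where existing hash values land.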
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/29/2018 02:02 AM, Andres Freund wrote: > On 2018-01-26 18:48:35 -0500, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: >>>> FWIW I do agree the data sets shared in this thread are pretty extreme >>>> and it doesn't make much sense to slow the regular cases. I'll be >>>> perfectly happy if we stop the OOM, making those cases fast is a bonus. >> >>> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, >>> and so something better in 11. And then later possibly backpatch if >>> we've grown some confidence? >> >> +1. I'd like to see some response to this included in 10.2, and time >> grows short for that. > > Here are two patches that I think we want for 10.2, and the start of one > that I think we want for master. 0002 is needed because otherwise the > lack of extra growth leads to noticeably worse performance when filling > an underestimated a coordinator hash table from the workers - turns out > our hash combine (and most hash combines) let a lot of clustering > survive. By adding a final hashing round the bit perturbation is near > perfect. The commit messages need to be polished a bit, but other than > that I think these are reasonable fixes. Plan to push by Monday evening > at the latest. > > The third patch is a version of the random IV discussed in this > thread. I do think we want to add usage of the extended hash functions, > as prototyped by Tomas, as that actually helps to fix issues with actual > hash conflicts. But we additionally need a fallback path for types > without extended hashtables, and the random IV is a good idea > nonetheless. There's no ABI difference in my patch, so I think this is > actually something we could backpatch. But I don't think it's urgent, so > I'm not planning to do that for 10.2. 
The one thing that could confuse > people is that it can lead to output order changes from run to run - I > think that's actually good, nobody should rely on hashagg etc output > being stable, but it might be a bit much in a stable release? > > > In my tests this solves the worst performance issues in Todd's case, > Tomas's, Thomas's and still does ok performancwith with a TPC-H Q18 > (which showcases the underestimated worker hashtable into leader > hashtable issue). > Sounds reasonable. Do you think repeating the performance testing is necessary? Not sure when you plan to push the changes, I may not have time for the tests until after FOSDEM. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi, On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote: > Do you think repeating the performance testing is necessary? Not sure > when do you plan to push the changes, I may not have time for the tests > until after FOSDEM. Hm, it surely wouldn't hurt, but your previous result does give a lot of confidence - there's not a lot of fundamental difference, leaving aside one more mitigation that should never reduce hashtable "quality". So I think going ahead with pushing, and then benchmarking if possible, is reasonable? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote: >> Do you think repeating the performance testing is necessary? Not sure >> when do you plan to push the changes, I may not have time for the tests >> until after FOSDEM. > Hm, it surely wouldn't hurt, but your previous result does give a lot of > confidence - there's not a lot of fundamental difference leaving one > more mitigation that should never reduce hashtable "quality" aside. So > I think going ahead with pushing and then benchmarking if possible, is > reasonable? Since the release wrap is this coming Monday, there's not really time to wait and do something after FOSDEM ... regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On January 29, 2018 10:11:38 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@anarazel.de> writes: >> On 2018-01-29 14:41:49 +0100, Tomas Vondra wrote: >>> Do you think repeating the performance testing is necessary? Not sure >>> when do you plan to push the changes, I may not have time for the tests >>> until after FOSDEM. >> Hm, it surely wouldn't hurt, but your previous result does give a lot of >> confidence - there's not a lot of fundamental difference leaving one >> more mitigation that should never reduce hashtable "quality" aside. So >> I think going ahead with pushing and then benchmarking if possible, is >> reasonable? >Since the release wrap is this coming Monday, there's not really time to >wait and do something after FOSDEM ... I'm not sure what you're arguing for? I have measured the performance impacts, and am happy. That's why I do not want to wait till then. Doing additional benchmarks after commit can't hurt tho... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/28/18 20:02, Andres Freund wrote: > On 2018-01-26 18:48:35 -0500, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote: >>>> FWIW I do agree the data sets shared in this thread are pretty extreme >>>> and it doesn't make much sense to slow the regular cases. I'll be >>>> perfectly happy if we stop the OOM, making those cases fast is a bonus. >> >>> Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10, >>> and so something better in 11. And then later possibly backpatch if >>> we've grown some confidence? >> >> +1. I'd like to see some response to this included in 10.2, and time >> grows short for that. > > Here are two patches that I think we want for 10.2, and the start of one > that I think we want for master. 0002 is needed because otherwise the > lack of extra growth leads to noticeably worse performance when filling > an underestimated a coordinator hash table from the workers - turns out > our hash combine (and most hash combines) let a lot of clustering > survive. By adding a final hashing round the bit perturbation is near > perfect. The commit messages need to be polished a bit, but other than > that I think these are reasonable fixes. Plan to push by Monday evening > at the latest. I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1 and got the following build error: execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory #include "utils/hashutils.h" My build sequence was git pull git status (showing clean tree) make distclean ./configure ... make -- todd
Andres Freund <andres@anarazel.de> writes: > Here are two patches that I think we want for 10.2, and the start of one > that I think we want for master. 0002 is needed because otherwise the > lack of extra growth leads to noticeably worse performance when filling > an underestimated a coordinator hash table from the workers - turns out > our hash combine (and most hash combines) let a lot of clustering > survive. By adding a final hashing round the bit perturbation is near > perfect. The commit messages need to be polished a bit, but other than > that I think these are reasonable fixes. Plan to push by Monday evening > at the latest. The first 2 seem OK to me by eyeball, though I've not done performance testing. > The third patch is a version of the random IV discussed in this > thread. I do think we want to add usage of the extended hash functions, > as prototyped by Tomas, as that actually helps to fix issues with actual > hash conflicts. But we additionally need a fallback path for types > without extended hashtables, and the random IV is a good idea > nonetheless. There's no ABI difference in my patch, so I think this is > actually something we could backpatch. But I don't think it's urgent, so > I'm not planning to do that for 10.2. The one thing that could confuse > people is that it can lead to output order changes from run to run - I > think that's actually good, nobody should rely on hashagg etc output > being stable, but it might be a bit much in a stable release? I disagree: people should reasonably expect the same query and same data and same plan to give consistent results. When we stuck in the "synchronous seqscans" feature, which broke that property, we were very quickly forced by user complaints to provide a way to shut it off. I'm also concerned that we'd have to lobotomize a bunch of regression test cases to ensure stable results there. IOW, I think randomizing hashkeys is unacceptable for HEAD, let alone back-patching. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On January 29, 2018 10:11:38 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Since the release wrap is this coming Monday, there's not really time >> to wait and do something after FOSDEM ... > I'm not sure what you're arguing for? I have measured the performance impacts, and am happy. That's why I do not want towait till then. Doing additional benchmarks after commit can't hurt tho... I'm arguing for not waiting. If Tomas does some more benchmarking and finds something else we can improve, that's fine, but the stop-growing patch is already an important fix compared to 10.1. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 13:25:35 -0500, Tom Lane wrote: > The first 2 seem OK to me by eyeball, though I've not done performance > testing. Thanks for the look! > > The third patch is a version of the random IV discussed in this > > thread. I do think we want to add usage of the extended hash functions, > > as prototyped by Tomas, as that actually helps to fix issues with actual > > hash conflicts. But we additionally need a fallback path for types > > without extended hashtables, and the random IV is a good idea > > nonetheless. There's no ABI difference in my patch, so I think this is > > actually something we could backpatch. But I don't think it's urgent, so > > I'm not planning to do that for 10.2. The one thing that could confuse > > people is that it can lead to output order changes from run to run - I > > think that's actually good, nobody should rely on hashagg etc output > > being stable, but it might be a bit much in a stable release? > > I disagree: people should reasonably expect the same query and same > data and same plan to give consistent results. When we stuck in the > "synchronous seqscans" feature, which broke that property, we were very > quickly forced by user complaints to provide a way to shut it off. > I'm also concerned that we'd have to lobotomize a bunch of regression > test cases to ensure stable results there. I don't think the cases really compare. For one there's no "natural order" like insertion order for seqscans (in simplistic cases). For another hashaggs/joins/whatnot are *already* architecture dependent, so there's ORDER BYs for pretty much everything already - I needed to add a single ORDER BY to make things stable over ~50 regression runs. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 13:20:25 -0500, Todd A. Cook wrote: > I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1 > and got the following build error: > > execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory > #include "utils/hashutils.h" Yea, there's some minor differences from master to the 10 branch, needing to backport one commit beforehand. Attached are three patches applying to 10. Greetings, Andres Freund
Attachment
Andres Freund <andres@anarazel.de> writes: > On 2018-01-29 13:25:35 -0500, Tom Lane wrote: >> I disagree: people should reasonably expect the same query and same >> data and same plan to give consistent results. When we stuck in the >> "synchronous seqscans" feature, which broke that property, we were very >> quickly forced by user complaints to provide a way to shut it off. >> I'm also concerned that we'd have to lobotomize a bunch of regression >> test cases to ensure stable results there. > I don't think the cases really compare. For one there's no "natural > order" like insertion order for seqscans (in simplistic cases). For > another hashaggs/joins/whatnot are *already* architecture dependent, so > there's ORDER BYs for pretty much everything already - I needed to add a > single ORDER BY to make things stable over ~50 regression runs. As developers who see test results from a variety of architectures, we're aware that hash ops don't give stable results across machines. Most end users aren't going to know or care about that: they're running on one architecture and up to now they've gotten stable results from hash ops. I think we *will* get pushback if we change that. Also, the fact that you needed one more ORDER BY in testing on a single machine is already proof of my worry about the regression tests, and it'll only get worse when testing across a variety of machines. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/29/18 13:31, Andres Freund wrote: > On 2018-01-29 13:20:25 -0500, Todd A. Cook wrote: >> I applied 0001 and 0002 to REL_10_STABLE at 1b2a3860d3ea81825e9bbad2c7dbf66db87445c1 >> and got the following build error: >> >> execGrouping.c:26:29: fatal error: utils/hashutils.h: No such file or directory >> #include "utils/hashutils.h" > > Yea, there's some minor differences from master to the 10 branch, > needing to backport one commit beforehand. Attached are three patches > applying to 10. Thanks. With those patches applied, I'm able to process a 175 million row data set without any problems. I'll try the 900 million row set next. -- todd
One other point here is that it's not really clear to me what a randomly varying IV is supposed to accomplish. Surely we're not intending that it prevents somebody from crafting a data set that causes bad hash performance. If a user with DB access wants to cause a performance problem, there are and always will be plenty of other avenues to making that happen. If the idea is that for a data set that otherwise would have bad hash performance, choosing a different IV would (almost always) fix it, that sounds good but you're ignoring the inverse case: for a data set that works fine, there would be some choices of IV that create a problem where there was none before. I see no reason to think that the probability of the former kind of situation is higher than the latter. So I'm on board with using the extended hash functions when available, but I'm not convinced that a varying IV buys us anything but trouble. regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 13:56:37 -0500, Todd A. Cook wrote: > With those patches applied, I'm able to process a 175 million row data set > without any problems. Cool! > I'll try the 900 million row set next. Thanks! - Andres
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
Hi, On 2018-01-29 13:45:10 -0500, Tom Lane wrote: > Also, the fact that you needed one more ORDER BY in testing on a > single machine is already proof of my worry about the regression > tests, and it'll only get worse when testing across a variety > of machines. That's an entirely unconvincing argument. A lot of changes to hashtable code will result in ordering changes, and has done so in the past. Sure there occasionally was need for a followup ORDER BY addition, but that's not that bad. The fact that only a single order by was needed when entirely changing the hash function output (by randomizing it) doesn't at all seem to suggest that it's a huge problem. On 2018-01-29 14:11:26 -0500, Tom Lane wrote: > One other point here is that it's not really clear to me what a randomly > varying IV is supposed to accomplish. Surely we're not intending that > it prevents somebody from crafting a data set that causes bad hash > performance. I do think we should make that harder. > If a user with DB access wants to cause a performance > problem, there are and always will be plenty of other avenues to making > that happen. Sure, but that's different from somebody triggering, e.g. by website input, such behaviour. > If the idea is that for a data set that otherwise would have > bad hash performance, choosing a different IV would (almost always) fix > it, that sounds good but you're ignoring the inverse case: for a data set > that works fine, there would be some choices of IV that create a problem > where there was none before. I see no reason to think that the > probability of the former kind of situation is higher than the latter. It's easily intentionally triggerable vs. not. > So I'm on board with using the extended hash functions when available, > but I'm not convinced that a varying IV buys us anything but trouble. I think we really need both. A constant IV of e.g. 
0 for a single-column hashtable doesn't really fix much, even though it does improve hash quality a bit for multi-column indexes. A random IV does address issues due to a lot of tuples falling into the same bucket, but does not address actual collisions. Both together are pretty convincing tho. Greetings, Andres Freund
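[Editor's note] Andres's distinction between what the IV fixes and what it cannot fix can be demonstrated directly. The combine step below is hash_combine-style mixing with the familiar boost constant (illustrative; the in-tree helper lives in hashutils.h): a per-hashtable random IV re-scrambles where each hash value lands, but two keys whose datatype hashes are identical still collide under every possible IV:

```python
def combine(iv: int, h: int) -> int:
    """hash_combine-style mixing of a per-hashtable IV with a datatype
    hash (boost-style constant; a sketch, not the exact PG code)."""
    iv &= 0xFFFFFFFF
    mixed = (h + 0x9E3779B9 + (iv << 6) + (iv >> 2)) & 0xFFFFFFFF
    return (iv ^ mixed) & 0xFFFFFFFF

# A true hash collision (identical datatype hashes) survives every IV:
h1 = h2 = 0xDEADBEEF
for iv in (0, 1, 0x12345678):
    assert combine(iv, h1) == combine(iv, h2)

# Distinct-but-clustered hashes, by contrast, get laid out differently
# under different IVs, defeating precomputed bucket clustering.
print(combine(1, 100) != combine(2, 100))  # True for these values
```

Hence the need for both pieces: extended (seeded) hash functions to attack real collisions, plus the IV to break up bucket clustering.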
On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One other point here is that it's not really clear to me what a randomly > varying IV is supposed to accomplish. Surely we're not intending that > it prevents somebody from crafting a data set that causes bad hash > performance. I actually think that is a real live issue that we will be forced to deal with one day. And I think that day is coming soon. It's not hard to imagine a user of a web site intentionally naming their objects such that they all hash to the same value. On most systems the worst case is probably a query that takes a few seconds or even tens of seconds, but if you get lucky you could run a server out of memory. I'm actually thinking we should replace all the hashes with https://en.wikipedia.org/wiki/SipHash with a randomly generated key. -- greg
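[Editor's note] Greg's suggestion amounts to a keyed hash: the function takes a secret key chosen once (e.g. at server start), so an attacker who cannot learn the key cannot precompute colliding inputs. A sketch using Python's keyed BLAKE2 as a stand-in for SipHash (purely illustrative; SipHash itself uses a 16-byte key and is much cheaper per call):

```python
import os
from hashlib import blake2b

def keyed_hash(key: bytes, data: bytes) -> int:
    """Keyed hash truncated to 32 bits.  With SipHash the key would be
    a 16-byte secret generated once at server startup."""
    return int.from_bytes(blake2b(data, key=key, digest_size=4).digest(), "big")

key = os.urandom(16)                      # per-server secret
a = keyed_hash(key, b"payload")
# With the same key the hash is stable; under a different key an
# attacker's precomputed collision set is (almost certainly) useless.
other = keyed_hash(os.urandom(16), b"payload")
print(a == keyed_hash(key, b"payload"))   # True
```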
Greg Stark <stark@mit.edu> writes: > On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One other point here is that it's not really clear to me what a randomly >> varying IV is supposed to accomplish. Surely we're not intending that >> it prevents somebody from crafting a data set that causes bad hash >> performance. > I actually think that is a real live issue that we will be forced to > deal with one day. And I think that day is coming soon. > It's not hard to imagine a user of a web site intentionally naming > their objects such that they all hash to the same value. Probably most > systems the worst case is a query that takes a few seconds or even > tens of seconds but if you get lucky you could run a server out of > memory. By their very nature, hash algorithms have weak spots. Pretending that they do not, or that you can 100% remove them, is a fool's errand. You could always "set enable_hashjoin = off", and deal with mergejoin's weak spots instead; but that just requires a different data set to expose its performance shortcomings ... regards, tom lane
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-29 17:54:42 -0500, Tom Lane wrote: > Greg Stark <stark@mit.edu> writes: > > On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> One other point here is that it's not really clear to me what a randomly > >> varying IV is supposed to accomplish. Surely we're not intending that > >> it prevents somebody from crafting a data set that causes bad hash > >> performance. > > > I actually think that is a real live issue that we will be forced to > > deal with one day. And I think that day is coming soon. > > > It's not hard to imagine a user of a web site intentionally naming > > their objects such that they all hash to the same value. Probably most > > systems the worst case is a query that takes a few seconds or even > > tens of seconds but if you get lucky you could run a server out of > > memory. > > By their very nature, hash algorithms have weak spots. Pretending that > they do not, or that you can 100% remove them, is a fool's errand. Having edge case weaknesses and having remotely triggerable weaknesses aren't the same. And the cost for preventing such misuse is an immeasurable runtime difference and returning queries that don't specify an order and don't have a "natural" order with a changing order. That's not particularly high. Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Tomas Vondra
Date:
On 01/29/2018 08:11 PM, Tom Lane wrote: > One other point here is that it's not really clear to me what a > randomly varying IV is supposed to accomplish. Surely we're not > intending that it prevents somebody from crafting a data set that > causes bad hash performance. If a user with DB access wants to cause > a performance problem, there are and always will be plenty of other > avenues to making that happen. While I'm not sure the random IV is something we want/need, I don't think "There are other ways to harm the database," is a convincing argument. I agree it's fairly easy to set work_mem to an insane value, or craft a query that eats all memory (many sorts using work_mem each, hashagg used for data with many groups, ...). Basically, if you don't know how to crash the database, you're not a senior DBA/engineer. But not all attacks are equal. All those examples I named require direct database connection, ability to construct SQL queries, etc. The case discussed in this thread does not require that - it's enough to control the data input, say by uploading a CSV file with crafted data to some web application (as Greg mentions). Of course, you need to make some assumptions while crafting the data (that it's using Postgres, and that it'll use hash aggregate on the data), but that's about it. > If the idea is that for a data set that otherwise would have bad hash > performance, choosing a different IV would (almost always) fix it, > that sounds good but you're ignoring the inverse case: for a data > set that works fine, there would be some choices of IV that create a > problem where there was none before. I see no reason to think that > the probability of the former kind of situation is higher than the > latter. I don't think this thread is really about randomly generated data sets, but about something that is generated in a way that increases the number of collisions - either intentionally or unintentionally. 
That's certainly the case of my data set, which was designed to trigger one of the issues in the code. I agree we need to check if the randomization might cause more harm than good by causing random failures. But my feeling is the probability will be very low. > So I'm on board with using the extended hash functions when > available, but I'm not convinced that a varying IV buys us anything > but trouble. I don't have a clear opinion on the random IV yet. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/29/18 14:26, Andres Freund wrote: > On 2018-01-29 13:56:37 -0500, Todd A. Cook wrote: >> With those patches applied, I'm able to process a 175 million row data set >> without any problems. > > Cool! > >> I'll try the 900 million row set next. We're also able to process those 900 million rows without problems. :) It took about 28 minutes (average of 3 runs). Out of curiosity, I then modified hashint8() as previously described. With that change, run time dropped to 11 minutes (also average of 3 runs). FWIW, the data values in these sets are sort-of random (where I can't explain the "sort-of" in a public forum), but strongly biased towards negative infinity. Starting again from scratch, we could probably remove the bias, but we have 28-30 billion of these things collected over the last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change directions at this point... -- todd
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
Andres Freund
Date:
On 2018-01-30 13:57:44 -0500, Todd A. Cook wrote: > Out of curiosity, I then modified hashint8() as previously described. > With that change, run time dropped to 11 minutes (also average of 3 > runs). > > FWIW, the data values in these sets are sort-of random (where I can't > explain the "sort-of" in a public forum), but strongly biased towards > negative infinity. Starting again from scratch, we could probably remove > the bias, but we have 28-30 billion of these things collected over the > last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change > directions at this point... FWIW, you could just create a different hash opclass and use it for those queries... Greetings, Andres Freund
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From
"Todd A. Cook"
Date:
On 01/30/18 14:34, Andres Freund wrote: > On 2018-01-30 13:57:44 -0500, Todd A. Cook wrote: >> Out of curiosity, I then modified hashint8() as previously described. >> With that change, run time dropped to 11 minutes (also average of 3 >> runs). >> >> FWIW, the data values in these sets are sort-of random (where I can't >> explain the "sort-of" in a public forum), but strongly biased towards >> negative infinity. Starting again from scratch, we could probably remove >> the bias, but we have 28-30 billion of these things collected over the >> last 14 years (starting in PostgreSQL 7.4) and it's kinda tough to change >> directions at this point... > > FWIW, you could just create a different hash opclass and use it for > those queries... Hmmm, that's intriguing; I'll look into it. Thanks! :) -- todd