Thread: LDAP authentication fails with concurrent create extensions

LDAP authentication fails with concurrent create extensions

From
Greg k
Date:
We are using postgresql 10.3 on Centos 7.2 with LDAP authentication (samba4 with AD domain controller). We've recently moved to LDAP authentication and are now encountering a problem where some concurrent connections that create extensions in different databases at the same time are failing with a "Can't contact LDAP server" error. The postgres error log contains:

========================================================================================
Apr 10 16:55:27.988 kvmdevdb12 postgres[10901]: [9-1] db=db_test_022,user=fullaccess LOG:  00000: LDAP login failed for user "cn=dev_fullaccess,
OU=StandardUsers,OU=Postgres,OU=ZZ,DC=company,DC=com,DC=XX" on server "server1.ZZ.net server2.ZZ.net": Can't contact LDAP server
Apr 10 16:55:27.988 kvmdevdb12 postgres[10901]: [9-2] db=db_test_022,user=fullaccess LOCATION:  CheckLDAPAuth, auth.c:2593
Apr 10 16:55:27.988 kvmdevdb12 postgres[10901]: [10-1] db=db_test_022,user=fullaccess FATAL:  28000: LDAP authentication failed for user "fullaccess"
Apr 10 16:55:27.989 kvmdevdb12 postgres[10901]: [10-2] db=db_test_022,user=fullaccess DETAIL:  Connection matched pg_hba.conf line 12: "host       all
           +systemlogin 0.0.0.0/0               ldap ldapserver="server1.ZZ.net server2.ZZ.net" ldapprefix="cn=dev_" ldapsuffix=",OU=StandardUsers,OU=Postgres,OU=ZZ,DC=company,DC=com,DC=XX""
Apr 10 16:55:27.989 kvmdevdb12 postgres[10901]: [10-3] db=db_test_022,user=fullaccess LOCATION:  auth_failed, auth.c:328
========================================================================================

The postgres version is "PostgreSQL 10.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit" although we get similar failures with Postgres 9.5.9.

The problem is reproducible with this bash script and assumes the user password is in .pgpass:

========================================================================================
#!/bin/bash

set -u

declare -r NUM_DATABASES=50
declare -r NUM_CREATE_EXTENSIONS=1

declare -r DBHOST=kvmdevdb12
declare -r PORT=5437
declare -r USER=fullaccess

declare -r EXTENSION_NAME="btree_gin"
#declare -r EXTENSION_NAME="pgcrypto"
#declare -r EXTENSION_NAME="pg_buffercache"
#declare -r EXTENSION_NAME="tablefunc"

declare -r LOG=/tmp/drop_create.log
declare -r SIGNAL_FILE=/tmp/start_drop_create.flg
declare -r ERROR_SIGNAL_FILE=/tmp/connect_error


function get_now()
{
    date +'%Y-%m-%d %H:%M:%S.%N'
}


function create_extension()
{
    local -r DBNAME=$1

    psql -qt -p $PORT -h $DBHOST -d $DBNAME -U $USER -c "DROP EXTENSION IF EXISTS $EXTENSION_NAME; CREATE EXTENSION $EXTENSION_NAME SCHEMA extensions"
    if [[ $? != 0 ]] ; then
        echo "$(get_now): *** Error: could not create extension \"$EXTENSION_NAME\" in database \"$DBNAME\" for user \"$USER\"." | tee -a $LOG
        touch $ERROR_SIGNAL_FILE
        exit 1
    fi

}


function drop_create_db()
{
    local -r DBNAME=$1

    psql -qt -p $PORT -h $DBHOST -d postgres -U $USER -c "DROP DATABASE IF EXISTS $DBNAME"
    psql -qt -p $PORT -h $DBHOST -d postgres -U $USER -c "CREATE DATABASE $DBNAME"
        if [[ $? != 0 ]] ; then
        echo "$(get_now): *** Error: could not create database \"$DBNAME\"." | tee -a $LOG
        touch $ERROR_SIGNAL_FILE
        exit 1
    fi
    psql -qt -p $PORT -h $DBHOST -d $DBNAME -U $USER -c "CREATE SCHEMA extensions"
    if [[ $? != 0 ]] ; then
        echo "$(get_now): *** Error: could not create schema for \"$DBNAME\"." | tee -a $LOG
        touch $ERROR_SIGNAL_FILE
        exit 1
    fi

    while [[ ! -f $SIGNAL_FILE ]]; do
        :
    done;

    for i in $(seq 1 $NUM_CREATE_EXTENSIONS); do
        create_extension $DBNAME
    done
}

echo "$(get_now): Postgres version: $(psql -qt -p $PORT -h $DBHOST -d postgres -U $USER -c 'SELECT VERSION()')"
echo

echo "$(get_now): Setting up $NUM_DATABASES background jobs..."

rm -f $SIGNAL_FILE
rm -f $ERROR_SIGNAL_FILE

declare DBNAME=""
declare NUM=1
while [[ 1 ]]; do
    echo "$(get_now): Creating \"$DBHOST:$PORT/$DBNAME\"..."  >> $LOG
    DBNAME=$(printf "db_test_%03d" $NUM)
    drop_create_db $DBNAME &
    if [[ $NUM -ge $NUM_DATABASES ]]; then
        break
    fi
    NUM=$(($NUM + 1))
    echo >> $LOG
done

echo "$(get_now): Signal the $NUM_DATABASES background jobs to start creating extensions..."
touch $SIGNAL_FILE

echo "$(get_now): Waiting for all the create extensions to finish..."
echo
wait
echo

if [[ -f $ERROR_SIGNAL_FILE ]]; then
    echo "$(get_now): THERE WERE ERRORS."
    echo
fi
echo "$(get_now): *** Finished." | tee -a $LOG

========================================================================================


I did some simple debugging into 'openldap' and could get this debug information about the failure:

========================================================================================
ldap_init: trying /var/lib/pgsql/ldap/etc/openldap/ldap.conf
ldap_init: HOME env is /var/lib/pgsql
ldap_init: trying /var/lib/pgsql/ldaprc
ldap_init: trying /var/lib/pgsql/.ldaprc
ldap_init: trying ldaprc
ldap_init: LDAPCONF env is NULL
ldap_init: LDAPRC env is NULL
ldap_create
ldap_simple_bind_s
ldap_sasl_bind_s
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP server1.om.net:389
ldap_new_socket: 6
ldap_prepare_socket: 6
ldap_connect_to_host: Trying 192.168.111.111:389
ldap_pvt_connect: fd: 6 tm: -1 async: 0
attempting to connect:
connect success
ldap_open_defconn: successful
ldap_send_server_request
ldap_result ld 0x1d76380 msgid 1
wait4msg ld 0x1d76380 msgid 1 (infinite timeout)
wait4msg continue ld 0x1d76380 msgid 1 all 1
** ld 0x1d76380 Connections:
* host: server1.om.net  port: 389  (default)
  refcnt: 2  status: Connected
  last used: Tue Apr 10 15:01:01 2018

** ld 0x1d76380 Outstanding Requests:
 * msgid 1,  origid 1, status InProgress
   outstanding referrals 0, parent count 0
  ld 0x1d76380 request count 1 (abandoned 0)
** ld 0x1d76380 Response Queue:
   Empty
  ld 0x1d76380 response count 0
ldap_chkResponseList ld 0x1d76380 msgid 1 all 1
ldap_chkResponseList returns ld 0x1d76380 NULL
ldap_int_select
ldap_int_select returned -1: errno 4                              <====================== ERROR HERE
ldap_unbind
ldap_free_request (origid 1, msgid 1)
ldap_free_connection 1 1
ldap_send_unbind
ldap_free_connection: actually freed
ldap_err2string
========================================================================================

errno 4 is EINTR and the function actually failing is 'poll' inside the os-ip.c function 'ldap_int_select'.

Re: LDAP authentication fails with concurrent create extensions

From
Stephen Frost
Date:
Greetings,

* Greg k (gregg.kay@gmail.com) wrote:
> We are using postgresql 10.3 on Centos 7.2 with LDAP authentication (samba4
> with AD domain controller). We've recently moved to LDAP authentication and
> are now encountering a problem where some concurrent connections that
> create extensions in different databases at the same time are failing with
> a "Can't contact LDAP server" error. The postgres error log contains:

You really shouldn't be using LDAP in an AD environment for
authentication- configure and use Kerberos instead, which is much more
secure than having cleartext passwords seen by the PG server and then
proxied to the LDAP server.

That said, there does appear to be an issue here, thanks for creating a
test case.

Stephen

Attachment

Re: LDAP authentication fails with concurrent create extensions

From
Greg k
Date:
Hi Stephen,

Do I need to raise this as an issue or bug?

Thanks,
Greg

On Tue, 10 Apr 2018 at 21:31 Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Greg k (gregg.kay@gmail.com) wrote:
> We are using postgresql 10.3 on Centos 7.2 with LDAP authentication (samba4
> with AD domain controller). We've recently moved to LDAP authentication and
> are now encountering a problem where some concurrent connections that
> create extensions in different databases at the same time are failing with
> a "Can't contact LDAP server" error. The postgres error log contains:

You really shouldn't be using LDAP in an AD environment for
authentication- configure and use Kerberos instead, which is much more
secure than having cleartext passwords seen by the PG server and then
proxied to the LDAP server.

That said, there does appear to be an issue here, thanks for creating a
test case.

Stephen

Re: LDAP authentication fails with concurrent create extensions

From
Thomas Munro
Date:
On Fri, Apr 20, 2018 at 1:43 PM, Greg k <gregg.kay@gmail.com> wrote:
> Do I need to raise this as an issue or bug?

It seems like we might need a patch on that adds ldap_set_option(ldap,
LDAP_OPT_RESTART, LDAP_OPT_ON) somewhere in auth.c to tell it to
restart on EINTR.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: LDAP authentication fails with concurrent create extensions

From
Thomas Munro
Date:
On Fri, Apr 20, 2018 at 2:09 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Apr 20, 2018 at 1:43 PM, Greg k <gregg.kay@gmail.com> wrote:
>> Do I need to raise this as an issue or bug?
>
> It seems like we might need a patch on that adds ldap_set_option(ldap,
> LDAP_OPT_RESTART, LDAP_OPT_ON) somewhere in auth.c to tell it to
> restart on EINTR.

I'm a bit confused about that; how are you supposed to do that early
enough to cover all its syscalls if you don't have the handle until
after you've connected?  Perhaps CheckLDAPAuth() (or higher up) should
be blocking signals instead, or at least SIGUSR1 (I guess that's what
you're receiving -- can you tell?)

-- 
Thomas Munro
http://www.enterprisedb.com