Re: Receiving many more rows than expected - Mailing list pgsql-general

From Vincent de Phily
Subject Re: Receiving many more rows than expected
Date
Msg-id 1755004.P8tzSYzpmk@moltowork
Whole thread Raw
In response to Re: Receiving many more rows than expected  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Receiving many more rows than expected  (David G Johnston <david.g.johnston@gmail.com>)
Re: Receiving many more rows than expected  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
On Friday 09 May 2014 06:52:33 Adrian Klaver wrote:
> On 05/09/2014 05:36 AM, Vincent de Phily wrote:
> > On Friday 09 May 2014 07:01:32 Tom Lane wrote:
> >> Vincent de Phily <vincent.dephily@mobile-devices.fr> writes:
> >>> In case it changes anything, this is the uncut (but still anonimized)
> >>>
> >>> function:
> >>>      query = """UPDATE foo SET processing = 't' WHERE id IN
> >>>
> >>>             (SELECT id FROM foo WHERE processing = 'f' ORDER BY id ASC
> >>>             LIMIT %d
> >>>
> >>>              FOR UPDATE)
> >>>
> >>>             RETURNING *""" % (conf_getint('DEFAULT', 'push_count',
> >>>             5000),)
> >>
> >> Well, of course this view of things exposes a relevant failure mode
> >> you hadn't mentioned: maybe sometimes the conf_getint() call returns
> >> something other than 5000?
> >
> > True. But I've commented already that I'd be very surprised (and wouldn't
> > know how to begin) if that value was faulty (even though it would explain
> > things nicely), because
> > * It is parsed once at program start (using python's ConfigParser library)
>
> What is parsed?

The ini-style config files which contain the 'push_count' setting in the
'DEFAULT' section :

def conf_getint(section, option, default='required'):
    return conf_get(section, option, default, lambda(c):int(c))
def conf_get(section, option, default='required', treat=lambda(c):c):
    try:
        return treat(CONFIG.get(section, option))
    except Exception, e:
        if default == 'required':
            raise Exception("Can't get required value for %s %s: %s" %
(section, option, e))
        else:
            LOG.warning("Using default value %s for %s %s: %s", default,
section, option, e)
            return default

if __name__ == '__main__':
    # [...]
    configfiles = sys.argv[1:]
    CONFIG = ConfigParser.SafeConfigParser()
    CONFIG.read(configfiles)


> > * It has the correct value of 5000 in most cases (as demonstrated by the
> >    frequency of number of rows returned)
>
> Yes, but those are not the ones of interest.

In know, but it shows that I'm not always getting a wild value to begin with.
I'm getting the expected/configured/default value of 5000, not some overflowed
integer nonsense.


> > * There is no sign that I exited the loop (and therefore got the
> > opportunity>
> >    to change the value of the query) before I start receiving overlong
> >    results.
> Not sure I follow, You are passing a function as a parameter, it
> would be re-run each time the query was run in:
>
> curs.execute(query)

No, curs.execute(query) does not re-run conf_getint(). The "query" variable is
a standard python string that already got formated. And since that formating
is done outside the loop, I need to exit the loop in order to re-format the
query string with a possibly messed-up value.

But the only way to exit-and-reenter that loop is to get an exception, which
would show up in the logs. And since I don't see an exception in the logs at
those times, I must have stayed in that loop, with the same value of "query" I
have had since the begining.


> FYI, the psycopg2 docs recommend you not use the parameter passing
> method above as it is susceptible to SQL injection:
>
> http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries

I know, but
* The fact that I'm using %d, that I coerced my parameter to be an int, and
  that this parameter is not user-controled make things safe.
* I do not want to reformat and refetch the value each time I run execute(),
  as my query is the same for the whole lifetime of the program, so it would
  be a waste of CPU.
* I could use psycopg's mogrify() instead of python's %, but there's simply no
  need in this case. I promess that I do when it's needed :)



Thanks to all for taking an interest so far, this bug is... weird.


--
Vincent de Phily


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Receiving many more rows than expected
Next
From: Vincent de Phily
Date:
Subject: Re: Receiving many more rows than expected