Thread: Connection function

Connection function

From
Bill House
Date:
Hello all,

I am very new to postgresql and python, but I have written a function to
retrieve some admin data from the server.

I think this is more of a python question rather than a psycopg2
question, but the context is psycopg2.

I was wondering if this is the right place to post it and ask for a
critique?

Thanks,

Bill

Re: Connection function

From
Adrian Klaver
Date:
On 03/24/2012 05:03 PM, Bill House wrote:
> Hello all,
>
> I am very new to postgresql and python, but I have written a function to
> retrieve some admin data from the server.
>
> I think this is more of a python question rather than a psycopg2
> question, but the context is psycopg2.
>
> I was wondering if this is the right place to post it and ask for a
> critique?

I would say yes, though I am not sure what the question is:)?

>
> Thanks,
>
> Bill
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Connection function

From
Bill House
Date:
On 03/24/2012 08:25 PM, Adrian Klaver wrote:
> On 03/24/2012 05:03 PM, Bill House wrote:
>> Hello all,
>>
>> I am very new to postgresql and python, but I have written a function to
>> retrieve some admin data from the server.
>>
>> I think this is more of a python question rather than a psycopg2
>> question, but the context is psycopg2.
>>
>> I was wondering if this is the right place to post it and ask for a
>> critique?
>
> I would say yes, though I am not sure what the question is:)?
>
>>
>> Thanks,
>>
>> Bill
>>
>
>
Thanks,

Sometimes it is helpful to know how many records one is working with.

From my reading in the psycopg2 documentation (2.4.4) and
experimentation, I think that this information is only available if one
uses an unnamed cursor (in which case all the data will have been
fetched to the client) or with an named cursor (but only after all the
data has been fetched).

If there is a large amount of data, this may be problematic.

Am I wrong about this?  If so, please let me know.

Anyway, the way I approached this issue was to count the records I was
interested in and return the value.

The code below works.

I am interested in a critique of it from more experienced persons.  Are
there more elegant or efficient ways to do what I have done?

In particular, I am interested in:
    1) the python string substitution technique I used
    2) the choice of the postgresql source of data I was seeking (the
record count)
    3) if there was an existing function I overlooked which does the
same thing (in psycopg2 or postgresql)

Regarding # 2 above, I had been relying on the table:
pg_stat_user_tables and the n_live_tup value.  This item has shown the
correct count for many days.  Today, after updating my machines and
rebooting, the count is 0 and I am not really sure why.  Records are
still there; and even though I have selected data from the database, the
count is still 0.

So I took the more direct but time consuming approach of just counting
the records in the table.

This will be better anyway because many times one will not be selecting
all the records in a table, so this function can be expanded to include
a parameter for the selection criteria so that that identical criteria
may be used again in the work portion of any operation.

Thanks in advance for your input.

Bill

code (indent 2):
-------------------------------------------------------
#!/usr/bin/python
# z_psycopg2_row_cnt7f.py
# A program to illustrate the methods of creating or supplying runtime
variables
# to functions using psycopg2 functions

import sys
import os

import psycopg2
import psycopg2.extras
import psycopg2.extensions

#for string generator
import string
import random

con = None

def gen_random_str(str_len):
  """generates a random string str_len long -
  randomly chooses and assembles a number of characters from the digits
and the upper and lower case alphabet """
  rand_str = ''.join(random.choice(string.ascii_uppercase +
string.digits + string.ascii_lowercase) for x in range(str_len))
  return rand_str

def rec_cnt_func(conn, table_name):
  """Count the records selected from the table, right now all of them"""
  #Generate a random string to use as a curson name (short right now for
debugging purposes)
  #The commands following will use this string
  rand_curs_name = gen_random_str(3)
  #Substitute data into a command string and execute
  #create the cursor
  exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" %
(rand_curs_name,)
  #count the records in the table
  exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name,
table_name)
  #fetch the data
  exec "row = %s.fetchone()" % (rand_curs_name,)
  #All done now, throw the cursor away
  exec "%s.close()" % (rand_curs_name,)
  #return the data to the calling program
  return row[0]


def main():
  #supply your connection string here
  current_dsn = "host=xx"

  try:
    con = psycopg2.connect(current_dsn)

    #supply your table name here
    table_name_in = "xxxx"

    #Call the function which will retrieve the data from postgresql
    rec_cnt_val = rec_cnt_func(con, table_name_in)

    #print the formatted data in an informative way
    print "Records in table: {}: {:> 20,}".format(table_name_in,
rec_cnt_val)

  except psycopg2.DatabaseError, e:
    print "psycopg2 reports error: {}".format(e)
    sys.exit(1)

  finally:

    if con:
      con.close()
    return

if __name__ == '__main__':
  main()
-----------------------------------------------------
end code

Re: Connection function

From
Adrian Klaver
Date:
On 03/25/2012 08:14 AM, Bill House wrote:
> On 03/24/2012 08:25 PM, Adrian Klaver wrote:
>> On 03/24/2012 05:03 PM, Bill House wrote:
>>> Hello all,
>>>
>>> I am very new to postgresql and python, but I have written a function to
>>> retrieve some admin data from the server.
>>>
>>> I think this is more of a python question rather than a psycopg2
>>> question, but the context is psycopg2.
>>>
>>> I was wondering if this is the right place to post it and ask for a
>>> critique?
>>
>> I would say yes, though I am not sure what the question is:)?
>>
>>>
>>> Thanks,
>>>
>>> Bill
>>>
>>
>>
> Thanks,
>
> Sometimes it is helpful to know how many records one is working with.
>
>  From my reading in the psycopg2 documentation (2.4.4) and
> experimentation, I think that this information is only available if one
> uses an unnamed cursor (in which case all the data will have been
> fetched to the client) or with an named cursor (but only after all the
> data has been fetched).
>
> If there is a large amount of data, this may be problematic.
>
> Am I wrong about this?  If so, please let me know.

See comments in line.

>
> Anyway, the way I approached this issue was to count the records I was
> interested in and return the value.
>
> The code below works.
>
> I am interested in a critique of it from more experienced persons.  Are
> there more elegant or efficient ways to do what I have done?
>
> In particular, I am interested in:
>      1) the python string substitution technique I used

That works, though it also possible to use dictionaries for the Psycopg
strings see here:

http://initd.org/psycopg/docs/usage.html#query-parameters

I find the dict form easier to use for many parameters because it is
self documenting.


The rest of the string substitution seems to be tied to using exec which
I do not think is necessary, see below for more.


>      2) the choice of the postgresql source of data I was seeking (the
> record count)
>      3) if there was an existing function I overlooked which does the
> same thing (in psycopg2 or postgresql)
>
> Regarding # 2 above, I had been relying on the table:
> pg_stat_user_tables and the n_live_tup value.  This item has shown the
> correct count for many days.  Today, after updating my machines and
> rebooting, the count is 0 and I am not really sure why.  Records are
> still there; and even though I have selected data from the database, the
> count is still 0.

The above are system tables/views used by the statistics mechanism of
Postgres. For a starting point on this look at:

http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html

Short version running Analyze will repopulate the tables.

>
> So I took the more direct but time consuming approach of just counting
> the records in the table.

That is the only way to get a 'true' count due to the MVCC nature of
Postgres. There has been a recent optimization of that, but I cannot
remember what version of Postgres has it.

>
> This will be better anyway because many times one will not be selecting
> all the records in a table, so this function can be expanded to include
> a parameter for the selection criteria so that that identical criteria
> may be used again in the work portion of any operation.
>
> Thanks in advance for your input.
>
> Bill
>
> code (indent 2):
> -------------------------------------------------------
> #!/usr/bin/python
> # z_psycopg2_row_cnt7f.py
> # A program to illustrate the methods of creating or supplying runtime
> variables
> # to functions using psycopg2 functions
>
> import sys
> import os
>
> import psycopg2
> import psycopg2.extras
> import psycopg2.extensions
>
> #for string generator
> import string
> import random
>
> con = None
>
> def gen_random_str(str_len):
>    """generates a random string str_len long -
>    randomly chooses and assembles a number of characters from the digits
> and the upper and lower case alphabet """
>    rand_str = ''.join(random.choice(string.ascii_uppercase +
> string.digits + string.ascii_lowercase) for x in range(str_len))
>    return rand_str
>
> def rec_cnt_func(conn, table_name):
>    """Count the records selected from the table, right now all of them"""
>    #Generate a random string to use as a curson name (short right now for
> debugging purposes)
>    #The commands following will use this string
>    rand_curs_name = gen_random_str(3)
>    #Substitute data into a command string and execute
>    #create the cursor
>    exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" %
> (rand_curs_name,)
>    #count the records in the table
>    exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name,
> table_name)

Not sure why you are using exec? Why not?:

cur =
conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name)

Same for below.

>    #fetch the data
>    exec "row = %s.fetchone()" % (rand_curs_name,)
>    #All done now, throw the cursor away
>    exec "%s.close()" % (rand_curs_name,)
>    #return the data to the calling program
>    return row[0]
>
>
> def main():
>    #supply your connection string here
>    current_dsn = "host=xx"
>
>    try:
>      con = psycopg2.connect(current_dsn)
>
>      #supply your table name here
>      table_name_in = "xxxx"
>
>      #Call the function which will retrieve the data from postgresql
>      rec_cnt_val = rec_cnt_func(con, table_name_in)
>
>      #print the formatted data in an informative way
>      print "Records in table: {}: {:>  20,}".format(table_name_in,
> rec_cnt_val)
>
>    except psycopg2.DatabaseError, e:
>      print "psycopg2 reports error: {}".format(e)
>      sys.exit(1)
>
>    finally:
>
>      if con:
>        con.close()
>      return
>
> if __name__ == '__main__':
>    main()
> -----------------------------------------------------
> end code
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Connection function

From
Bill House
Date:
On 03/25/2012 02:12 PM, Adrian Klaver wrote:
> On 03/25/2012 08:14 AM, Bill House wrote:
>> On 03/24/2012 08:25 PM, Adrian Klaver wrote:
>>> On 03/24/2012 05:03 PM, Bill House wrote:
>>>> Hello all,
>>>>
>>>> I am very new to postgresql and python, but I have written a
>>>> function to
>>>> retrieve some admin data from the server.
>>>>
>>>> I think this is more of a python question rather than a psycopg2
>>>> question, but the context is psycopg2.
>>>>
>>>> I was wondering if this is the right place to post it and ask for a
>>>> critique?
>>>
>>> I would say yes, though I am not sure what the question is:)?
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bill
>>>>
>>>
>>>
>> Thanks,
>>
>> Sometimes it is helpful to know how many records one is working with.
>>
>>  From my reading in the psycopg2 documentation (2.4.4) and
>> experimentation, I think that this information is only available if one
>> uses an unnamed cursor (in which case all the data will have been
>> fetched to the client) or with an named cursor (but only after all the
>> data has been fetched).
>>
>> If there is a large amount of data, this may be problematic.
>>
>> Am I wrong about this?  If so, please let me know.
>
> See comments in line.
>
>>
>> Anyway, the way I approached this issue was to count the records I was
>> interested in and return the value.
>>
>> The code below works.
>>
>> I am interested in a critique of it from more experienced persons.  Are
>> there more elegant or efficient ways to do what I have done?
>>
>> In particular, I am interested in:
>>      1) the python string substitution technique I used
>
> That works, though it also possible to use dictionaries for the
> Psycopg strings see here:
>
> http://initd.org/psycopg/docs/usage.html#query-parameters
>
> I find the dict form easier to use for many parameters because it is
> self documenting.
>
>
> The rest of the string substitution seems to be tied to using exec
> which I do not think is necessary, see below for more.
>
>
>>      2) the choice of the postgresql source of data I was seeking (the
>> record count)
>>      3) if there was an existing function I overlooked which does the
>> same thing (in psycopg2 or postgresql)
>>
>> Regarding # 2 above, I had been relying on the table:
>> pg_stat_user_tables and the n_live_tup value.  This item has shown the
>> correct count for many days.  Today, after updating my machines and
>> rebooting, the count is 0 and I am not really sure why.  Records are
>> still there; and even though I have selected data from the database, the
>> count is still 0.
>
> The above are system tables/views used by the statistics mechanism of
> Postgres. For a starting point on this look at:
>
> http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html
>
> Short version running Analyze will repopulate the tables.
>
>>
>> So I took the more direct but time consuming approach of just counting
>> the records in the table.
>
> That is the only way to get a 'true' count due to the MVCC nature of
> Postgres. There has been a recent optimization of that, but I cannot
> remember what version of Postgres has it.
>
>>
>> This will be better anyway because many times one will not be selecting
>> all the records in a table, so this function can be expanded to include
>> a parameter for the selection criteria so that that identical criteria
>> may be used again in the work portion of any operation.
>>
>> Thanks in advance for your input.
>>
>> Bill
>>
>> code (indent 2):
>> -------------------------------------------------------
>> #!/usr/bin/python
>> # z_psycopg2_row_cnt7f.py
>> # A program to illustrate the methods of creating or supplying runtime
>> variables
>> # to functions using psycopg2 functions
>>
>> import sys
>> import os
>>
>> import psycopg2
>> import psycopg2.extras
>> import psycopg2.extensions
>>
>> #for string generator
>> import string
>> import random
>>
>> con = None
>>
>> def gen_random_str(str_len):
>>    """generates a random string str_len long -
>>    randomly chooses and assembles a number of characters from the digits
>> and the upper and lower case alphabet """
>>    rand_str = ''.join(random.choice(string.ascii_uppercase +
>> string.digits + string.ascii_lowercase) for x in range(str_len))
>>    return rand_str
>>
>> def rec_cnt_func(conn, table_name):
>>    """Count the records selected from the table, right now all of
>> them"""
>>    #Generate a random string to use as a curson name (short right now
>> for
>> debugging purposes)
>>    #The commands following will use this string
>>    rand_curs_name = gen_random_str(3)
>>    #Substitute data into a command string and execute
>>    #create the cursor
>>    exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" %
>> (rand_curs_name,)
>>    #count the records in the table
>>    exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name,
>> table_name)
>
> Not sure why you are using exec? Why not?:
>
> cur =
> conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name)
>
> Same for below.
>
>>    #fetch the data
>>    exec "row = %s.fetchone()" % (rand_curs_name,)
>>    #All done now, throw the cursor away
>>    exec "%s.close()" % (rand_curs_name,)
>>    #return the data to the calling program
>>    return row[0]
>>
>>
>> def main():
>>    #supply your connection string here
>>    current_dsn = "host=xx"
>>
>>    try:
>>      con = psycopg2.connect(current_dsn)
>>
>>      #supply your table name here
>>      table_name_in = "xxxx"
>>
>>      #Call the function which will retrieve the data from postgresql
>>      rec_cnt_val = rec_cnt_func(con, table_name_in)
>>
>>      #print the formatted data in an informative way
>>      print "Records in table: {}: {:>  20,}".format(table_name_in,
>> rec_cnt_val)
>>
>>    except psycopg2.DatabaseError, e:
>>      print "psycopg2 reports error: {}".format(e)
>>      sys.exit(1)
>>
>>    finally:
>>
>>      if con:
>>        con.close()
>>      return
>>
>> if __name__ == '__main__':
>>    main()
>> -----------------------------------------------------
>> end code
>>
>
>
Thanks, I will do the reading you have suggested.

Regarding your question about why the use of exec: I read where psycopg2
has quite a bit of capability for parameter substitution, but that
functionality is limited to just psycopg2.

I grew to rely on runtime construction, substitution and execution of
commands in another environment and I was trying to mimic that behavior
in a more general way in python.  This way works, I was just wondering
if there was a better way.

Regarding the statistics, I have since learned that the command:

    select reltuples from pg_class where relname = 'your_file_name';

Will give a record count.  I don't know how well it will keep up in a
dynamic environment but it's a start.

Thanks,

Bill




Re: Connection function

From
Adrian Klaver
Date:
On 03/25/2012 06:00 PM, Bill House wrote:
>
>>
> Thanks, I will do the reading you have suggested.
>
> Regarding your question about why the use of exec: I read where psycopg2
> has quite a bit of capability for parameter substitution, but that
> functionality is limited to just psycopg2.

The benefit of using the psycopg2 parameter substitution is it takes
care of escaping strings and avoiding SQL injection. See here for
humorous take on that:

http://xkcd.com/327/

>
> I grew to rely on runtime construction, substitution and execution of
> commands in another environment and I was trying to mimic that behavior
> in a more general way in python.  This way works, I was just wondering
> if there was a better way.

I have been told over the years that using exec is not a good idea. For
more detail:


http://books.google.com/books?id=JnR9hQA3SncC&pg=PA328&lpg=PA328&dq=python++exec+bad&source=bl&ots=Jb4TLr-03u&sig=_Jz5cTdPZq1PeU1vajQgaDIgtgo&hl=en&sa=X&ei=GslvT-q1FuzYiAKI85zDBQ&ved=0CGMQ6AEwCA


Python has a variety of ways to work with strings. I found the following
a good introduction to Python:

http://shop.oreilly.com/product/9780596158071.do

>
> Regarding the statistics, I have since learned that the command:
>
>      select reltuples from pg_class where relname = 'your_file_name';
>
> Will give a record count.  I don't know how well it will keep up in a
> dynamic environment but it's a start.

If you have not already found it, look at:

http://wiki.postgresql.org/wiki/Slow_Counting

Not sure what your use case for the count is but remember when using
named cursors:
"
Named cursors are also iterable like regular cursors. Note however that
before Psycopg 2.4 iteration was performed fetching one record at time
from the backend, resulting in a large overhead. The attribute itersize
now controls how many records are fetched at time during the iteration:
the default value of 2000 allows to fetch about 100KB per roundtrip
assuming records of 10-20 columns of mixed number and strings; you may
decrease this value if you are dealing with huge records."

>
> Thanks,
>
> Bill
>
>
>
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Connection function

From
Joe Abbate
Date:
On 03/25/2012 09:00 PM, Bill House wrote:
> Regarding the statistics, I have since learned that the command:
>
>     select reltuples from pg_class where relname = 'your_file_name';
>
> Will give a record count.  I don't know how well it will keep up in a
> dynamic environment but it's a start.

reltuples "is only an estimate used by the planner."  See
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html

So if you want a more or less accurate count of rows at that given point
in time, then you should not depend on it.

As for the optimization of SELECT count(*) which Adrian had mentioned,
it's a 9.2 feature (as a result of index-only scans).  See
https://wiki.postgresql.org/wiki/Todo under Cache Usage.

Joe

Re: Connection function

From
Federico Di Gregorio
Date:
On 25/03/12 17:14, Bill House wrote:
[snip]
> Sometimes it is helpful to know how many records one is working with.
>
>  From my reading in the psycopg2 documentation (2.4.4) and
> experimentation, I think that this information is only available if one
> uses an unnamed cursor (in which case all the data will have been
> fetched to the client) or with an named cursor (but only after all the
> data has been fetched).
>
> If there is a large amount of data, this may be problematic.

Not at all. You just need to MOVE to the end of the result set and count
how many rows you skipped. At the moment psycopg doesn't use MOVE return
value (that tells you what you want to know) but can just .execute() the
MOVE and access the result on the cursor. Also, we could probably extend
scroll() to return the number of scrolled rows in the future.

> In particular, I am interested in:
>      1) the python string substitution technique I used

Use bound parameters:

http://packages.python.org/psycopg2/usage.html#query-parameters

federico

Re: Connection function

From
Bill House
Date:
Thanks for all the help from everyone; I have a lot of studying to do.

Regards

Bill

On 03/26/2012 02:52 AM, Federico Di Gregorio wrote:
> On 25/03/12 17:14, Bill House wrote:
> [snip]
>> Sometimes it is helpful to know how many records one is working with.
>>
>>  From my reading in the psycopg2 documentation (2.4.4) and
>> experimentation, I think that this information is only available if one
>> uses an unnamed cursor (in which case all the data will have been
>> fetched to the client) or with an named cursor (but only after all the
>> data has been fetched).
>>
>> If there is a large amount of data, this may be problematic.
>
> Not at all. You just need to MOVE to the end of the result set and
> count how many rows you skipped. At the moment psycopg doesn't use
> MOVE return value (that tells you what you want to know) but can just
> .execute() the MOVE and access the result on the cursor. Also, we
> could probably extend scroll() to return the number of scrolled rows
> in the future.
>
>> In particular, I am interested in:
>>      1) the python string substitution technique I used
>
> Use bound parameters:
>
> http://packages.python.org/psycopg2/usage.html#query-parameters
>
> federico
>