Thread: Problem with index using regular expression
Hi, with psycopg 2-2.4.1 I have some issue with incorrect use of indexes using regular expression.
I my case I have create on index on the surname of users.
CREATE INDEX income_surname ON income(substring(income.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$'))
then used
SELECT Distinct On (nodes.id) nodes.ID from nodes inner join income on substring(nodes.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$')
with psql the index works fine, but with psycopg I don't think it's used at all. The index substring has to be identical with the substring in the query. Are there some encoding issues with psycopg?
--
Håvard Wahl Kongsgård
http://havard.security-review.net/
--
Håvard Wahl Kongsgård
http://havard.security-review.net/
On 06/06/11 08:32, Håvard Wahl Kongsgård wrote: > Hi, with psycopg 2-2.4.1 I have some issue with incorrect use of indexes > using regular expression. > I my case I have create on index on the surname of users. > CREATE INDEX income_surname ON income(substring(income.name > <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$')) > > then used > SELECT Distinct On (nodes.id <http://nodes.id>) nodes.ID from nodes > inner join income on substring(nodes.name <http://nodes.name> from > E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name > <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') > > with psql the index works fine, but with psycopg I don't think it's used > at all. The index substring has to be identical with the substring in > the query. Are there some encoding issues with psycopg? psycopg does encode your strings to the backend encoding but this shouldn't be a problem. You can: 1) Send us the code so we can analyze it. 2) Enable PostgreSQL query logging and check the query sent by psycopg. 3) I feel you're doing something wrong with the '\' in the regular expression. Can you should us the Python of the execute() call? federico -- Federico Di Gregorio federico.digregorio@dndg.it Studio Associato Di Nunzio e Di Gregorio http://dndg.it There's no certainty - only opportunity. -- V
conn = psycopg2.connect(database="tax_analytical", user="postgres", password="xxxx", host='xxxx')
cur = conn.cursor('nodes_database')
SQL = ("SELECT Distinct On (nodes.id) nodes.ID from nodes inner join income on substring(nodes.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') limit 1000")
cur.execute(SQL)
if __name__ == '__main__':
pool = multiprocessing.Pool(processes=5)
pool.map(network_file.main, cur, 1) # Ensure the chunk size is 1
pool.close()
pool.join()
cur.close()
conn.close()
-Håvard Wahl Kongsgård
On Mon, Jun 6, 2011 at 8:58 AM, Federico Di Gregorio <federico.digregorio@dndg.it> wrote:
On 06/06/11 08:32, Håvard Wahl Kongsgård wrote:> <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$'))
> Hi, with psycopg 2-2.4.1 I have some issue with incorrect use of indexes
> using regular expression.
> I my case I have create on index on the surname of users.
> CREATE INDEX income_surname ON income(substring(income.name
>
> then used
> SELECT Distinct On (nodes.id <http://nodes.id>) nodes.ID from nodes
> inner join income on substring(nodes.name <http://nodes.name> from> E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name> <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$')>psycopg does encode your strings to the backend encoding but this
> with psql the index works fine, but with psycopg I don't think it's used
> at all. The index substring has to be identical with the substring in
> the query. Are there some encoding issues with psycopg?
shouldn't be a problem. You can:
1) Send us the code so we can analyze it.
2) Enable PostgreSQL query logging and check the query sent by psycopg.
3) I feel you're doing something wrong with the '\' in the regular
expression. Can you should us the Python of the execute() call?
federico
--
Federico Di Gregorio federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio http://dndg.it
There's no certainty - only opportunity. -- V
--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg
On 06/06/11 09:03, Håvard Wahl Kongsgård wrote: > substring(nodes.name <http://nodes.name> from > E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name > <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') limit 1000") Try doubling every '\' because that's a Python escape character too and you want to send to the database the "\\X" sequence, not "\X"; i.e., ... from E'\\\\w.*\\\\s(\\\\w.*)$|\\\\w+\\\\s(\\\\w.*)$') ... Hope this helps, federico -- Federico Di Gregorio federico.digregorio@dndg.it Studio Associato Di Nunzio e Di Gregorio http://dndg.it Qu'est ce que la folie? Juste un sentiment de liberté si fort qu'on en oublie ce qui nous rattache au monde... -- J. de Loctra
On 06/06/11 17:07, Federico Di Gregorio wrote: > On 06/06/11 09:03, Håvard Wahl Kongsgård wrote: >> substring(nodes.name <http://nodes.name> from >> E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name >> <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') limit 1000") > > Try doubling every '\' because that's a Python escape character too and > you want to send to the database the "\\X" sequence, not "\X"; i.e., > > ... from E'\\\\w.*\\\\s(\\\\w.*)$|\\\\w+\\\\s(\\\\w.*)$') ... > It's easier to just put an r at the front of the string. ie r"SELECT Distinct On (nodes.id) nodes.ID from nodes inner join income on substring(nodes.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') limit 1000"
Thanks, that solved it
On Mon, Jun 6, 2011 at 9:10 AM, Alejandro Dubrovsky <alex.dubrovsky@hitwise.com> wrote:
On 06/06/11 17:07, Federico Di Gregorio wrote:It's easier to just put an r at the front of the string.
> On 06/06/11 09:03, Håvard Wahl Kongsgård wrote:
>> substring(nodes.name <http://nodes.name> from
>> E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') = substring(income.name
>> <http://income.name> from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') limit 1000")
>
> Try doubling every '\' because that's a Python escape character too and
> you want to send to the database the "\\X" sequence, not "\X"; i.e.,
>
> ... from E'\\\\w.*\\\\s(\\\\w.*)$|\\\\w+\\\\s(\\\\w.*)$') ...
>
ier"SELECT Distinct On (nodes.id) nodes.ID from nodes inner join income onsubstring(nodes.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') =
substring(income.name from E'\\w.*\\s(\\w.*)$|\\w+\\s(\\w.*)$') limit 1000"
--Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg