Sphinx indexing problem - Mailing list pgsql-novice

From Mladen Gogala
Subject Sphinx indexing problem
Msg-id 4BF9ADF2.7030200@vmsinfo.com
Responses Re: Sphinx indexing problem  (Joshua Tolley <eggyknap@gmail.com>)
List pgsql-novice
I am trying to create a Sphinx index on a fairly large Postgres table. My
problem is that the Postgres client API tries to pull the entire result
set into memory:


[root@medo etc]# ../bin/indexer --all
  Sphinx 0.9.9-release (r2117)
  Copyright (c) 2001-2009, Andrew Aksyonoff

  using config file '/usr/local/etc/sphinx.conf'...
  indexing index 'test1'...
  ERROR: index 'test1': sql_query: out of memory for query result
  (DSN=pgsql://news:***@medo:5432/news).
  total 0 docs, 0 bytes
  total 712.593 sec, 0 bytes/sec, 0.00 docs/sec
  total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
  total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg


Corresponding log entries on the Postgres side are:

STATEMENT:  SELECT segment_id,air_date,start_time,end_time,source_type,
        market_name,station_name,program_name,content_text
        FROM news_segments
LOG:  unexpected EOF on client connection
LOG:  unexpected EOF on client connection
LOG:  unexpected EOF on client connection


The Postgres message isn't exactly helpful but, given the circumstances,
it couldn't be: the problem is on the client side. The table I am using
is pretty large, with 14.3 million rows:

news=> select count(*) from news_segments;
   count
----------
 14366286
(1 row)

Time: 233759.639 ms


Is there anything I can do to prevent the API from attempting to put the
entire query result in memory? I can partition the table, create
separate indexes, and merge them, but that is a lot of unnecessary
maintenance. I also suspect that other queries with large result sets
will start to fail.
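
For comparison, a server-side cursor would avoid the client-side
buffering entirely, since rows come back in batches. A minimal sketch
(the cursor name and batch size here are my own illustration, not
anything the Sphinx config exposes):

    BEGIN;
    DECLARE seg_cur NO SCROLL CURSOR FOR
        SELECT segment_id, air_date, start_time, end_time, source_type,
               market_name, station_name, program_name, content_text
        FROM news_segments;
    -- repeat until FETCH returns zero rows; only one batch of
    -- 10000 rows is held in client memory at a time
    FETCH FORWARD 10000 FROM seg_cur;
    CLOSE seg_cur;
    COMMIT;

As far as I can tell, though, the indexer issues sql_query directly, so
a cursor only helps when you control the fetch loop yourself.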

I temporarily solved my problem by using the "range query" option
offered by Sphinx:


sql_query_range = \
    SELECT min(segment_id), max(segment_id) FROM news_segments
sql_range_step  = 10000

sql_query = \
    SELECT segment_id, air_date, start_time, end_time, source_type, \
        market_name, station_name, program_name, segment_text \
    FROM news_segments \
    WHERE segment_id >= $start and segment_id < $end
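
To spell out what the indexer does with this: it runs sql_query_range
once to get the id bounds, then issues sql_query repeatedly with $start
and $end substituted, advancing by sql_range_step each time. Roughly
(the concrete bounds are illustrative; Sphinx computes them from the
min/max it fetched):

    SELECT segment_id, ... FROM news_segments
        WHERE segment_id >= 1 and segment_id < 10001;
    SELECT segment_id, ... FROM news_segments
        WHERE segment_id >= 10001 and segment_id < 20001;
    -- and so on, up to max(segment_id)

With 14.3 million rows, that is on the order of 1,400 separate queries
if the ids are dense.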


Segment_id is a numeric field, and the query will be executed many
times, which is less than optimal. It does make the thing work, though.
Would it be possible to set a maximum memory size for query result
caching from the API itself? How can I increase the maximum memory size
used by the client API?

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
