Thread: Newbie
I'm Having a hard time with this statement: Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = False) And (JobName = 'Boulder Valley') AND (BuilderName = 'Elliott Homes') AND (Jobs.JobID = ProjectedDates.JobID) Can anyone see why? the record and field names certainly do exist...any help appreciated!
Did you use speech marks around the column names when creating the tables? If so that makes them case sensitive and you'll always need them (unless there all lower case) - Stuart > -----Original Message----- > From: William Winter [SMTP:wilscott@earthlink.net] > Sent: Friday, October 12, 2001 4:29 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Newbie > > I'm Having a hard time with this statement: > > Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = > False) And (JobName = 'Boulder Valley') AND (BuilderName = 'Elliott > Homes') > AND (Jobs.JobID = ProjectedDates.JobID) > > Can anyone see why? the record and field names certainly do exist...any > help > appreciated! > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
"William Winter" <wilscott@earthlink.net> writes: > I'm Having a hard time with this statement: > > Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = > False) And (JobName = 'Boulder Valley') AND (BuilderName = 'Elliott Homes') > AND (Jobs.JobID = ProjectedDates.JobID) > > Can anyone see why? the record and field names certainly do exist...any help > appreciated! It would help if you post the error message you're getting. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Are you getting an error message, or just not getting the results you expected? -Nick -------------------------------------------------------------------------- Nick Fankhauser nickf@ontko.com Phone 1.765.935.4283 Fax 1.765.962.9788 Ray Ontko & Co. Software Consulting Services http://www.ontko.com/ > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of William Winter > Sent: Thursday, October 11, 2001 10:29 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Newbie > > > I'm Having a hard time with this statement: > > Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = > False) And (JobName = 'Boulder Valley') AND (BuilderName = > 'Elliott Homes') > AND (Jobs.JobID = ProjectedDates.JobID) > > Can anyone see why? the record and field names certainly do > exist...any help > appreciated! > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Hi. I've been playing with Full Text Indexing for a few days now, and there is something in the behaviour of the queries that I don't fully understand. I have set up a little test database that contains sample job adverts (it's the sample data I had lying around froma different project) Selecting on 1 field returns results blindingly fast, as one would expect when indices are used. However, selecting on 2 fields takes forever. I have done SET ENABLE_SEQSCAN=OFF. Here's output of explain: postgres=> explain select jobs.title from jobs, jobs_description_fti, jobs_title_fti where (jobs_description_fti.string = 'linux' or jobs_title_fti.string = 'linux') and (jobs_description_fti.id = jobs.oid and jobs_title_fti.id = jobs.oid); NOTICE: QUERY PLAN: Nested Loop (cost=200000018.60..200000027.18 rows=1 width=48) -> Merge Join (cost=200000018.60..200000024.31 rows=1 width=32) -> Sort (cost=100000004.09..100000004.09 rows=75 width=16) -> Seq Scan on jobs_description_fti (cost=100000000.00..100000001.75 rows=75 width=16) -> Sort (cost=100000014.51..100000014.51 rows=251 width=16) -> Seq Scan on jobs_title_fti (cost=100000000.00..100000004.51 rows=251 width=16) -> Index Scan using jobs_description_oid_index on jobs (cost=0.00..2.01 rows=1 width=16) EXPLAIN This means, if I am understanding things correctly, that jobs_description_fti is scanned with a sequential scan. That would explain the slowness. Hwever, doing a: explain select jobs.title from jobs, jobs_description_fti where (jobs_description_fti.string = 'linux') and (jobs_description_fti.id = jobs.oid); NOTICE: QUERY PLAN: Nested Loop (cost=0.00..4.04 rows=1 width=20) -> Index Scan using jobs_description_fti_index on jobs_description_fti (cost=0.00..2.01 rows=1 width=4) -> Index Scan using jobs_description_oid_index on jobs (cost=0.00..2.01 rows=1 width=16) yields lightning fast results, as one would expect. Why does selecting from two fields on an "or" basis cause both scans to be sequential? Even when sequential scans are "disabled"? Because of the breakdown of descriptions into thousands of lookup rows in the FTI tables, this is actually slower than doing an index-less "ILIKE" search on both of the fields because of the huge number of records in the lookup tables... Can anyone suggest a way to do a two field "or" match using the FTI and indices? Cheers. Gordan
Gordan Bobic <gordan@bobich.net> writes: > [ why is this slow? ] > explain select jobs.title from jobs, jobs_description_fti, > jobs_title_fti where (jobs_description_fti.string = 'linux' or > jobs_title_fti.string = 'linux') and (jobs_description_fti.id = jobs.oid and > jobs_title_fti.id = jobs.oid); Because the query is wrong. The way you wrote the WHERE, given a match in jobs_description_fti and jobs, *any* jobs_title_fti row with a matching ID will result in an output row. Similarly, given a match in jobs_title_fti and jobs, *any* jobs_description_fti row with a matching ID will produce output. So the system generates what's essentially a doubly nested loop over the insufficiently constrained tables. A correct and practical form of the query would be something like select jobs.title from jobs, jobs_description_fti where jobs_description_fti.string = 'linux' and jobs_description_fti.id = jobs.oid union select jobs.title from jobs, jobs_title_fti where jobs_title_fti.string = 'linux' and jobs_title_fti.id = jobs.oid; One of the not-so-pleasant aspects of SQL is that erroneous queries frequently look like performance problems, because no one waits around for the enormous result set that the query actually generates ... they try to debug the performance problem instead of looking to see if the query requests what they want ... regards, tom lane
[Broken SQL instead of performance issue fixed] It would appear that when I define the index on the FTI table (string and oid) to be unique (which makes sense, since there is little point in having duplicate rows in this case), a lot of inserts fail where they shouldn't. I am guessing that if the insert into the look-up FTI table fails, the insert into the master table fails as well. I can understand that this might be useful for matches where the number of occurences is important, but in this particular application, that is not the case. Before I go and look into modifying the fti function code for my specific purpose, it would be nice to have a confirmation of this behaviour - otherwise it may take me a while to find what I'm looking for. ;-) Another question - there are (as often happens) multiple ways of doing what I want in SQL, but I am not sure what is the fastest and most efficient way of doing it (in theory at least). I want to do a multi-criterion search on the same field (the FTI indexed one), and have separate AND, NOT and OR search terms. AND = "terms that must occur in the text" OR = "terms of which at least one has to occur in the text" NOT = "terms which must not occur in the text" Initially, before FTI, I used a big ILIKE query which worked reasonably well. I should point out that my test bed machine for this is a Pentium 100 MHz with 128 MB of RAM and an IDE disk. My database is expected to be around 50K-100K records, and about 100-200 MB on disk in PostgreSQL files (that's what the disk consumption of the vacuumed database was before FTI). Using the same example data set as before, yhe query was something like: SELECT * FROM Jobs WHERE ( Description ILIKE '%AndTerm1%' AND Description ILIKE '%AndTerm2%' AND Description ILIKE '%AndTerm3%' ... ) AND ( Description ILIKE '%OrTerm1%' OR Description ILIKE '%OrTerm2%' OR Description ILIKE '%OrTerm3%' ... ) AND ( Description NOT ILIKE '%OrTerm1%' AND Description NOT ILIKE '%OrTerm2%' AND Description NOT ILIKE '%OrTerm3%' ... ) This usually returned the required data within 30 seconds or so, after, obviously, doing as sequential search through the database due to the non-anchored ILIKE match. After implementing FTI, the insertion speed has gone through the floor (as expected), but the select speed doesn't seem to be that much greater, even when using the index (string, oid) on the FTI look-up table. On simple queries that only require one or two terms there is a big speed improvement, but on queries with three or more terms, the improvement is not that great. The queries typically return 10 - 200 rows (give or take a bit, depending on the specific query). The queries I am using at the moment to replace the above ILIKE solution are in the form SELECT Jobs.* FROM Jobs, Jobs_Description_FTI WHERE Jobs_Description_FTI.string = $And/Or/NotTerms[$i] AND Jobs_Description_FTI.id = Jobs.oid The AND queries are INTERSECTed together, OR queries and UNIONed together, both are UNIONed, and then the NOT queries are EXCEPTed. In some cases, this has yielded a signifficant improvement in performance, as Tom suggested (thanks for that, it was much appreciated). Sometimes, however, things go the other way. To cut the long story short, I seem to have tracked the problem down to a certain situation. If there is, say, 10K records in the master table, there is about 4M records in the lookup table. This in itself isn't an issue. Queries that return small numbers of records, e.g. SELECT count(*) FROM Jobs_Description_FTI WHERE string = 'linux' (returns ~300 rows) happen more or less instantaneously. However, a very similar query such as: SELECT count(*) FROM Jobs_Description_FTI WHERE string = 'nt' (returns ~30K rows) takes around two-three minutes. I tried doing a SELECT count(*) FROM Jobs WHERE Description ILIKE '%nt%' (returns 11K records out of 12K) and that only takes about 10 seconds or so. SELECT count(*) FROM Jobs WHERE Description ILIKE '% nt %' returns ~800 records out of 12K, which is much more reasonable. Ideally, that should be SELECT count(*) FROM Jobs WHERE Description ~* '.*[!a-z]nt[!a-z].*' or something like that, which yields a similar number of records to the previous query, but is slower. I am fully aware that this is fairly normal under the circumstances, but I need a way of defeating this performance issue. The only way of doing that that I can see at the moment is to: 1) Modify the FTI function to split the text field only at non-alphanumeric characters, and only return whole words, rather than substrings of words. 2) Allow the insert into master table to succeed, even if some of the inserts driven by the trigger fail, and define a unique string-oid index, which would prevent duplicates, thus yielding a smaller lookup table. One of the other things I'm considering is pruning the lookup table duplicates periodically to shrink the table to a more reasonable size. If anyone can suggest other courses of action, I am most interested to hear them. Is there anything in the pipeline for addressing FTI in the next version of PostgreSQL? At the moment, the best average case scenario for my application seems to be just doing an ILIKE or a ~* search, because although it takes a while, it takes a comparatively similar amount of time for most queries, unlike the FTI search which can go away for minutes at a time. Is this another case of my being thick and producing broken SQL? Can anybody think of a different way of doing this that would yield a performance increase? I don't want to believe that doing a ~* unindexed sequential search is the best solution here... Thanks. Gordan
Hi. After looking again at the other email I've sent earlier, I realized that it goes on for far too long, so I'll try to summarize my question more briefly. 1) How can FTI be made to NOT break up words into sub-strings? Most of those are likely to be useless in my particular application. In fact, this feature is why my FTI queries run slower on certain strings than sequential ILIKE scans. 1.1) How do I make FTI only separate the text string into words on non-alphanumeric characters (i.e. [!a-z|!A-Z|!0-9] - I think...). Is that what it already does? 2) How can I make inserts work when enforcing unique records in the FTI table? There is hardly any need for duplicates in the FTI table... At the moment, the search for a particular string can take signifficantly longer with FTI than using the plain ILIKE. FTI also returns 30K matches in the FTI table for only 10K of records in the master table, which is clearly not very useful. Doing a DISTINCT on the FTI table with that string will yield 10K records, indicating that there are about 3 times more records in the FTI table which are just duplicates taking up space. I am NOT prepared to capitulate and use something MS SQL instead. How can I get this to work with PostgreSQL? Who maintains the FTI contrib? Kind regards. Gordan
Gordan Bobic <gordan@bobich.net> writes: > 1) Modify the FTI function to split the text field only at non-alphanumeric > characters, and only return whole words, rather than substrings of words. Sure, if that works better for you. The reason FTI is in contrib is that it's not really ready for prime time (if it were, it'd be in the mainline code). Feel free to hack it to make it match your application better. > Is there anything in the pipeline for addressing FTI in the next > version of PostgreSQL? There is a completely new contrib module 'tsearch' that will probably supersede FTI over time. I haven't looked at it closely myself, though. regards, tom lane
On Thursday 18 Oct 2001 19:59, Tom Lane wrote: > > Is there anything in the pipeline for addressing FTI in the next > > version of PostgreSQL? > > There is a completely new contrib module 'tsearch' that will probably > supersede FTI over time. I haven't looked at it closely myself, though. I've just read the README for it, and it looks like it is a bit closer to what I need. It's amazing how just before each beta release of PostgreSQL, I'd find a feature that I REALLY need, which just happens to be implemented in the next version. I wish other software was like that. ;^) Thanks for the help. Gordan
Without providing the error message returned, we can only guess. One likely issue is "Plotted = False", because all values must be quoted except numbers. Better to use Plotted = 'f'. Keary Suska Esoteritech, Inc. "Leveraging Open Source for a better Internet" > From: "William Winter" <wilscott@earthlink.net> > Organization: EarthLink Inc. -- http://www.EarthLink.net > Reply-To: "William Winter" <wilscott@earthlink.net> > Date: Fri, 12 Oct 2001 03:29:04 GMT > To: pgsql-general@postgresql.org > Subject: [GENERAL] Newbie > > I'm Having a hard time with this statement: > > Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = > False) And (JobName = 'Boulder Valley') AND (BuilderName = 'Elliott Homes') > AND (Jobs.JobID = ProjectedDates.JobID) > > Can anyone see why? the record and field names certainly do exist...any help > appreciated! > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
On Fri, 12 Oct 2001, William Winter wrote: > I'm Having a hard time with this statement: > > Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = > False) And (JobName = 'Boulder Valley') AND (BuilderName = 'Elliott Homes') > AND (Jobs.JobID = ProjectedDates.JobID) > > Can anyone see why? the record and field names certainly do exist...any help > appreciated! First off, what's the error you're getting? If you're getting a table/field name not found, try double quoting names. If you double quoted (or used an interface that does so for you) the names in mixed case on the create, you'll need to double quote the names in queries from that point forward.
On Fri, 12 Oct 2001 03:29:04 GMT, William Winter <wilscott@earthlink.net> wrote: > I'm Having a hard time with this statement: > > Select Jobs.*, ProjectedDates.* FROM Jobs, ProjectedDates WHERE (Plotted = > False) And (JobName = 'Boulder Valley') AND (BuilderName = 'Elliott Homes') > AND (Jobs.JobID = ProjectedDates.JobID) > > Can anyone see why? the record and field names certainly do exist...any help > appreciated! > > What error message are you getting? Also might be helpful: version schema sample data Idea for new newsgroup: comp.databases.postgresql.help.psychic