From py at landanger.fr Mon Jan 1 12:11:55 2007
From: py at landanger.fr (Guest)
Date: Mon, 1 Jan 2007 18:11:55 +0100
Subject: [Ferret-talk] search on multiple table
In-Reply-To: <20061231160029.GH2583@cordoba.webit.de>
References: <208cb31afeac225d17f00cfd13f0bad0@ruby-forum.com> <20061231160029.GH2583@cordoba.webit.de>
Message-ID:

Thks Jens,

It works fine now

py

--
Posted via http://www.ruby-forum.com/.

From henry74 at gmail.com Mon Jan 1 16:47:31 2007
From: henry74 at gmail.com (Hh Hh)
Date: Mon, 1 Jan 2007 22:47:31 +0100
Subject: [Ferret-talk] Possible Bug when Creating Indexes
Message-ID: <2d62ed3ede9ff2a9fb694cdfe2c45d1e@ruby-forum.com>

I'm running:

ferret (0.10.9)
ruby 1.8.5 (2006-08-25) [i386-mswin32] on Windows XP(SP2)

When I create an index as follows:

field_infos = FieldInfos.new(:store => :yes, :term_vector => :no, :index => :yes)
field_infos.add_field(:id, :index => :untokenized)
field_infos.add_field(:subject)
field_infos.add_field(:author)
field_infos.add_field(:tags, :store => :no)
index = field_infos.create_index(THREAD_INDEX_DIR)

then try to add to the index as follows:

index << {:id => 1, :subject => "test subject", :author => "test author", :tags => "tags, like, this"}

I get the following error:

build_ferret_index.rb:39:in `<<': wrong argument type Hash (expected Data) (TypeError)

****************

When I create the index as follows:

field_infos = FieldInfos.new(:store => :yes, :term_vector => :no, :index => :yes)
field_infos.add_field(:id, :index => :untokenized)
field_infos.add_field(:subject)
field_infos.add_field(:author)
field_infos.add_field(:tags, :store => :no)
index = Index::Index.new(:path => THREAD_INDEX_DIR, :field_infos => field_infos, :analyzer => Analyzer::WhiteSpaceAnalyzer.new)

and run:

index << {:id => 1, :subject => "test subject", :author => "test author", :tags => "tags, like, this"}

Everything seems to work fine...

Thoughts?

--
Posted via http://www.ruby-forum.com/.
From scottfr at gmail.com Tue Jan 2 14:49:26 2007
From: scottfr at gmail.com (Scott Fortmann-roe)
Date: Tue, 2 Jan 2007 20:49:26 +0100
Subject: [Ferret-talk] Inconsistant Search Results
Message-ID:

Hi,

I have this maddening strange bug using acts_as_ferret. If I search for
a given phrase (let's say "xyz") I get one set of results (let's call
them set A). I search for xyz again and I get set A again. I search for
xyz a third time and I get a different set (set B).

I can keep executing the search query and my result sets continue to
cycle a-a-b-a-a-b-a-a-b.

It's really strange. 'a' is the correct set of results, 'b' is related
but contains some incorrect results too. I am using the fuzzy search
algorithm by appending '~' to each word in the query executed with
acts_as_ferret's find_by_contents. I am the only user of the server so
nothing is being added to the index while I search. I am using rails
1.2RC1 the newest ferret and acts_as_ferret.

Any ideas?

Thanks,
Scott

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Wed Jan 3 05:37:00 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 3 Jan 2007 11:37:00 +0100
Subject: [Ferret-talk] Inconsistant Search Results
In-Reply-To:
References:
Message-ID: <20070103103700.GA3835@cordoba.webit.de>

On Tue, Jan 02, 2007 at 08:49:26PM +0100, Scott Fortmann-roe wrote:
> Hi,
>
> I have this maddening strange bug using acts_as_ferret. If I search for
> a given phrase (let's say "xyz") I get one set of results (let's call
> them set A). I search for xyz again and I get set A again. I search for
> xyz a third time and I get a different set (set B).
>
> I can keep executing the search query and my result sets continue to
> cycle a-a-b-a-a-b-a-a-b.
>
> It's really strange. 'a' is the correct set of results, 'b' is related
> but contains some incorrect results too. I am using the fuzzy search
> algorithm by appending '~' to each word in the query executed with
> acts_as_ferret's find_by_contents.
> I am the only user of the server so
> nothing is being added to the index while I search. I am using rails
> 1.2RC1 the newest ferret and acts_as_ferret.
>
> Any ideas?

could you try to reproduce this behaviour with a plain ruby script only
using Ferret directly? I think it's a problem related to Ferret itself.

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From osh.sean at gmail.com Wed Jan 3 13:49:42 2007
From: osh.sean at gmail.com (Sean Osh)
Date: Wed, 3 Jan 2007 19:49:42 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com>
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com>
Message-ID:

OK. I'm trying to start this fresh, but I guess I was never sorting the
results via AAF to begin with. Does anyone know how to sort the ferret
results?

Here are some things I have tried (console output)

>> members = Member.find_by_contents("bank", {:sort => "company_name", :limit => 50}, {})
... [members array displayed]...
>> members.each {|member| puts member.company_name}
SouthTrust Bank
Colonial Bank
SunTrust Bank
Citrus & Chemical Bank
Bank of America
Citizens Bank & Trust
Platinum Bank
CenterState Bank of Florida
Riverside National Bank
AmSouth Bank
Community National Bank
Washington Mutual Bank
Providence Bank
Olde Cypress Community Bank

>> sort_fields = []
=> []
>> sort_fields = Ferret::Search::SortField.new('company_name')
=> company_name:
>> members = Member.find_by_contents("bank", {:sort => sort_fields, :limit => 50}, {})
>> members.each {|member| puts member.company_name}
SouthTrust Bank
Colonial Bank
SunTrust Bank
Citrus & Chemical Bank
Bank of America
Citizens Bank & Trust
Platinum Bank
CenterState Bank of Florida
Riverside National Bank
AmSouth Bank
Community National Bank
Washington Mutual Bank
Providence Bank
Olde Cypress Community Bank

>> members = Member.find_by_contents("bank", {:limit => 50}, {:order => "company_name"})
>> members.each {|member| puts member.company_name}
AmSouth Bank
Bank of America
CenterState Bank of Florida
Citizens Bank & Trust
Citrus & Chemical Bank
Colonial Bank
Community National Bank
Olde Cypress Community Bank
Platinum Bank
Providence Bank
Riverside National Bank
SouthTrust Bank
SunTrust Bank
Washington Mutual Bank

(Good, except for the fact that I only want 10 at a time)...

>> members = Member.find_by_contents("bank", {:limit => 10}, {:order => "company_name"})
>> members.each {|member| puts member.company_name}
AmSouth Bank
Bank of America
CenterState Bank of Florida
Citizens Bank & Trust
Citrus & Chemical Bank
Colonial Bank
Platinum Bank
Riverside National Bank
SouthTrust Bank
SunTrust Bank

(I have also tried the line below this with :sort => sort_fields AND
:sort => "company_name" with no luck!)
>> members = Member.find_by_contents("bank", {:limit => 10, :offset => 10}, {:order => "company_name"})
>> members.each {|member| puts member.company_name}
Community National Bank
Olde Cypress Community Bank
Providence Bank
Washington Mutual Bank

Now I know that's the same problem I described above, but I will settle
for any help I can get !!!

I've tried Member.ferret_index.search("bank", :sort => ....
also tried Member.ferret_index.search_each("bank", :sort => ...

Is it even possible to do what I need? I can't pass the
:order => "company_name ASC" as a find_option because by the time it
goes to AR to get the records, it has already returned the IDs for the
10 records, so I'm only ordering those 10 records and not the entire set
of 14.

PLEASE HELP!!!

Thanks!

--
Posted via http://www.ruby-forum.com/.

From bk at benjaminkrause.com Wed Jan 3 14:19:56 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Wed, 03 Jan 2007 20:19:56 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To:
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com>
Message-ID: <459C01DC.8040300@benjaminkrause.com>

Hey..

I've got the same problem, sorting isn't working the way you would
think.. I guess that's
a bug in ferret, nothing AAF can do about.. there's a ticket and I guess
David is aware
of that.. let's see when a new version of ferret will be released.

Ben

From osh.sean at gmail.com Wed Jan 3 15:27:08 2007
From: osh.sean at gmail.com (Sean Osh)
Date: Wed, 3 Jan 2007 21:27:08 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To: <459C01DC.8040300@benjaminkrause.com>
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com>
Message-ID: <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com>

Thanks Ben. Wish I knew it was a bug, otherwise I wouldn't have wasted
everyone's time reading over all that console output!

Benjamin Krause wrote:
> Hey..
>
> I've got the same problem, sorting isn't working the way you would
> think.. I guess that's
> a bug in ferret, nothing AAF can do about.. there's a ticket and I guess
> David is aware
> of that.. let's see when a new version of ferret will be released.
>
> Ben

--
Posted via http://www.ruby-forum.com/.

From scottfr at gmail.com Wed Jan 3 18:25:17 2007
From: scottfr at gmail.com (Scott Fortmann-roe)
Date: Thu, 4 Jan 2007 00:25:17 +0100
Subject: [Ferret-talk] Inconsistant Search Results
In-Reply-To: <20070103103700.GA3835@cordoba.webit.de>
References: <20070103103700.GA3835@cordoba.webit.de>
Message-ID: <9fcc18180b40d76e6acb1492d5bbf55d@ruby-forum.com>

>
> could you try to reproduce this behaviour with a plain ruby script only
> using Ferret directly? I think it's a problem related to Ferret itself.
>
> cheers,
> Jens
>

I would be happy to but I am having a little bit of a hard time figuring
out how to search the acts_as_ferret index using ferret by itself. Could
you offer some guidance on this? (It seemed like ferret used to have a
Search::IndexSearcher class that no longer exists, what do I use
instead? The acts_as_ferret source is very convoluted)

Thanks,
Scott

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Thu Jan 4 04:36:46 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 4 Jan 2007 10:36:46 +0100
Subject: [Ferret-talk] Inconsistant Search Results
In-Reply-To: <9fcc18180b40d76e6acb1492d5bbf55d@ruby-forum.com>
References: <20070103103700.GA3835@cordoba.webit.de> <9fcc18180b40d76e6acb1492d5bbf55d@ruby-forum.com>
Message-ID: <20070104093646.GA30555@cordoba.webit.de>

On Thu, Jan 04, 2007 at 12:25:17AM +0100, Scott Fortmann-roe wrote:
> >
> > could you try to reproduce this behaviour with a plain ruby script only
> > using Ferret directly? I think it's a problem related to Ferret itself.
> >
> > cheers,
> > Jens
> >
>
> I would be happy to but I am having a little bit of a hard time figuring
> out how to search the acts_as_ferret index using ferret by itself. Could
> you offer some guidance on this? (It seemed like ferret used to have a
> Search::IndexSearcher class that no longer exists, what do I use
> instead? The acts_as_ferret source is very convoluted)

require 'rubygems'
require 'ferret'
index = Ferret::I.new(:path => 'path/to/ferret/index/dir')
index.search('querystring here')

easiest is to do this in irb.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From brydeemer at yahoo.com Thu Jan 4 07:50:36 2007
From: brydeemer at yahoo.com (Bryan Deemer)
Date: Thu, 4 Jan 2007 13:50:36 +0100
Subject: [Ferret-talk] Ferret and Godaddy.com
In-Reply-To: <3D01027C-F92D-4E22-941B-A6C49E741C2A@gmx.net>
References: <20061221203103.727257784@localhost> <458AF98B.90406@benjaminkrause.com> <1f79a3ed87919f9354e12eae76b8e67d@ruby-forum.com> <3D01027C-F92D-4E22-941B-A6C49E741C2A@gmx.net>
Message-ID:

Andreas Korth wrote:

> If you have a combination of Ferret/AAF working in your development
> environment, you can copy the Ferret gem to your vendor folder and it
> will be used instead of the one installed by Godaddy. This technique
> is known as 'freezing' and is often applied to the Rails framework to
> make sure everyone in a development team is working with the same
> version.
>
> HTH
> Andreas

How exactly do I freeze GEMS? Since I'm using godaddy.com I can't run
any commands, which is how every tutorial tells me how to do it. So
what files do I put where?

And I don't have a local development environment, I'm running only on
godaddy.

Thanks,

Bry

--
Posted via http://www.ruby-forum.com/.
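A note on the "Sorting/Ordering Search Results" thread above: the console output there is consistent with the hits being cut down to a page first and only then ordered. The following is a minimal plain-Ruby sketch of that difference, not acts_as_ferret code. The hit order is copied from the thread; the helper names page_then_sort and sort_then_page are made up for illustration, and the keyword arguments assume a current Ruby rather than the 1.8.x used in the thread.

```ruby
# Hit order as the search returned it (copied from the console output in the
# "Sorting/Ordering Search Results" thread).
HITS = [
  "SouthTrust Bank", "Colonial Bank", "SunTrust Bank",
  "Citrus & Chemical Bank", "Bank of America", "Citizens Bank & Trust",
  "Platinum Bank", "CenterState Bank of Florida", "Riverside National Bank",
  "AmSouth Bank", "Community National Bank", "Washington Mutual Bank",
  "Providence Bank", "Olde Cypress Community Bank"
].freeze

# Hypothetical helper: take one page of hits first, then order only that page.
# This reproduces the pages shown in the thread.
def page_then_sort(hits, limit:, offset: 0)
  (hits[offset, limit] || []).sort
end

# Hypothetical helper: order the whole result set, then take one page.
# This is the behaviour the poster was asking for.
def sort_then_page(hits, limit:, offset: 0)
  hits.sort[offset, limit] || []
end

puts "page_then_sort, second page:"
puts page_then_sort(HITS, limit: 10, offset: 10)
puts "sort_then_page, second page:"
puts sort_then_page(HITS, limit: 10, offset: 10)
```

With only 14 hits, sorting the full set in Ruby before slicing is cheap; for large result sets the ordering would have to happen inside the search itself, which is what the :sort option discussed in the thread is meant to do.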
From evtroost at vub.ac.be Thu Jan 4 08:13:58 2007
From: evtroost at vub.ac.be (Ewout)
Date: Thu, 4 Jan 2007 14:13:58 +0100
Subject: [Ferret-talk] Ferret and Godaddy.com
In-Reply-To:
References: <20061221203103.727257784@localhost> <458AF98B.90406@benjaminkrause.com> <1f79a3ed87919f9354e12eae76b8e67d@ruby-forum.com> <3D01027C-F92D-4E22-941B-A6C49E741C2A@gmx.net>
Message-ID: <20070104131358.1354162286@localhost>

You should at least be able to run the rails on your local machine. If
you are using windows, you can do this with InstantRails, on a mac
Locomotive does the job. Once you have rails locally installed, you
should install the ferret gem.

Then, you can test the rails application you are using (and obviously
not developing) on your local machine.

Documentation on freezing is available on , though I
don't think the ferret gem can be frozen, since it includes native code.

Can't you ask the developers of the rails application you are using for
help?

Regards

>Andreas Korth wrote:
>
>> If you have a combination of Ferret/AAF working in your development
>> environment, you can copy the Ferret gem to your vendor folder and it
>> will be used instead of the one installed by Godaddy. This technique
>> is known as 'freezing' and is often applied to the Rails framework to
>> make sure everyone in a development team is working with the same
>> version.
>>
>> HTH
>> Andreas
>
>How exactly do I freeze GEMS? Since I'm using godaddy.com I can't run
>any commands, which is how every tutorial tells me how to do it. So
>what files do I put where?
>
>And I don't have a local development environment, I'm running only on
>godaddy.
>
>Thanks,
>
>Bry
>
>--
>Posted via http://www.ruby-forum.com/.
>_______________________________________________
>Ferret-talk mailing list
>Ferret-talk at rubyforge.org
>http://rubyforge.org/mailman/listinfo/ferret-talk

From brydeemer at yahoo.com Thu Jan 4 08:24:16 2007
From: brydeemer at yahoo.com (Bryan Deemer)
Date: Thu, 4 Jan 2007 14:24:16 +0100
Subject: [Ferret-talk] Ferret and Godaddy.com
In-Reply-To: <20070104131358.1354162286@localhost>
References: <20061221203103.727257784@localhost> <458AF98B.90406@benjaminkrause.com> <1f79a3ed87919f9354e12eae76b8e67d@ruby-forum.com> <3D01027C-F92D-4E22-941B-A6C49E741C2A@gmx.net> <20070104131358.1354162286@localhost>
Message-ID: <0028d0e073ab4dfcd55408a4965d3f54@ruby-forum.com>

Ewout wrote:
> You should at least be able to run the rails on your local machine. If
> you are using windows, you can do this with InstantRails, on a mac
> Locomotive does the job. Once you have rails locally installed, you
> should install the ferret gem.
>
> Then, you can test the rails application you are using (and obviously
> not developing) on your local machine.
>
> Documentation on freezing is available on articles/2005/12/22/freeze-other-gems-to-rails-lib-directory>, though I
> don't think the ferret gem can be frozen, since it includes native code.
>
> Can't you ask the developers of the rails application you are using for
> help?
>
> Regards

There's no one to ask since I'm the developer. I'm a one man shop
building my first rails app. I'll check out the link you sent me and see
if that clears anything up.

Bry

--
Posted via http://www.ruby-forum.com/.
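For the "Inconsistant Search Results" thread, one way to pin down an a-a-b cycle is to run the identical query many times and tally the distinct result sets. Below is a small plain-Ruby sketch of such a harness. The helper name tally_result_sets is hypothetical, and the block here is a stand-in that imitates the cycle; in a real check the block would wrap a search against the index (for instance the index.search call from Jens's snippet) and return the ids of the matching documents.

```ruby
# Hypothetical helper: run the given block `runs` times and count how often
# each distinct return value (an array of document ids) comes back.
def tally_result_sets(runs)
  counts = Hash.new(0)
  runs.times { counts[yield] += 1 }
  counts
end

# Stand-in for a real search: cycles through a-a-b like the behaviour
# described in the thread. The ids are invented for the demonstration.
SET_A = [1, 2, 3].freeze
SET_B = [1, 2, 3, 7].freeze
fake_search = [SET_A, SET_A, SET_B].cycle

counts = tally_result_sets(9) { fake_search.next }
counts.each { |ids, n| puts "#{n}x #{ids.inspect}" }
```

A stable index should produce exactly one entry in the tally; more than one entry for an unchanged index and an unchanged query is the symptom reported in the thread.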
From scottfr at gmail.com Thu Jan 4 17:57:01 2007
From: scottfr at gmail.com (Scott Fortmann-roe)
Date: Thu, 4 Jan 2007 23:57:01 +0100
Subject: [Ferret-talk] Inconsistant Search Results
In-Reply-To: <20070104093646.GA30555@cordoba.webit.de>
References: <20070103103700.GA3835@cordoba.webit.de> <9fcc18180b40d76e6acb1492d5bbf55d@ruby-forum.com> <20070104093646.GA30555@cordoba.webit.de>
Message-ID:

Ok, I accessed and carried out searches using Ferret alone. The same
a-a-b-a-a-b cyclical pattern of results to repeated searches occurred,
with two different sets of results being returned. It only occurs when I
am searching for "xyz~"; searching for "xyz" by itself always returns
the same results.

So it looks like a ferret issue. Any ideas how to fix this?

thanks,
Scott

--
Posted via http://www.ruby-forum.com/.

From fez.ummyeah at gmail.com Thu Jan 4 19:18:01 2007
From: fez.ummyeah at gmail.com (Fez Bojangles)
Date: Fri, 5 Jan 2007 01:18:01 +0100
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
Message-ID:

Hey all!

We've been using Ferret to great success these past six months. But
recently we've tried adding many new ContentItems (only thing being
indexed by Ferret at the moment), and things came crashing to a halt.

ferret gem: 0.10.9
acts_as_ferret plugin (not sure which version)

How we're using the plugin:

class ContentItem < ActiveRecord::Base
  acts_as_ferret :fields => { 'title' => {},
                              'description' => {}
                            }
  ...
end

In the directory (on production) 'index/production/content_item', there
are now 45812 files. (this is on Fedora Core 5, btw)

This leads me to believe this could be the culprit... and if not the
culprit now, it will be soon.

>> ContentItem.count
=> 19603

Any ideas?

Any help would be mucho appreciated. Thanks!

- Fez
http://ummyeah.com/

--
Posted via http://www.ruby-forum.com/.
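The symptom in the message above is a file count in the index directory (45812) far larger than the record count (19603). A quick way to watch for that is to count the files from a script. The sketch below is plain Ruby with no Ferret calls; the helper names and the max_files threshold are invented for illustration and are not a Ferret recommendation. The demo runs against a temporary directory; a real check would point at a path like the 'index/production/content_item' directory named in the message.

```ruby
require "tmpdir"
require "fileutils"

# Count the regular files directly inside a directory (no recursion).
def index_file_count(dir)
  Dir.children(dir).count { |name| File.file?(File.join(dir, name)) }
end

# Hypothetical rule of thumb, not taken from Ferret: flag the directory
# once it holds more than `max_files` files.
def needs_attention?(dir, max_files: 1_000)
  index_file_count(dir) > max_files
end

# Demo on a throwaway directory with five empty files.
Dir.mktmpdir do |dir|
  5.times { |i| FileUtils.touch(File.join(dir, "segment_#{i}")) }
  puts "files: #{index_file_count(dir)}"
  puts "over limit of 3: #{needs_attention?(dir, max_files: 3)}"
end
```

Run periodically (from cron, say), a check like this would show the file count creeping up well before the directory becomes a problem.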
From evtroost at vub.ac.be Thu Jan 4 20:29:18 2007
From: evtroost at vub.ac.be (Ewout)
Date: Fri, 5 Jan 2007 02:29:18 +0100
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To:
References:
Message-ID: <20070105012918.749991922@localhost>

Ferret can optimize its index, which will collapse the files in an index
directory. Sadly enough, acts_as_ferret does not call it unless you
choose to rebuild its entire index.

This could solve your problem: ContentItem.rebuild_index.
This might take a while...

Regards,
Ewout

>Hey all!
>
>We've been using Ferret to great success these past six months. But
>recently we've tried adding many new ContentItems (only thing being
>indexed by Ferret at the moment), and things came crashing to a halt.
>
>ferret gem: 0.10.9
>acts_as_ferret plugin (not sure which version)
>
>How we're using the plugin:
>
>class ContentItem < ActiveRecord::Base
>  acts_as_ferret :fields => { 'title' => {},
>                              'description' => {}
>                            }
>  ...
>end
>
>In the directory (on production) 'index/production/content_item', there
>are now 45812 files. (this is on Fedora Core 5, btw)
>
>This leads me to believe this could be the culprit... and if not the
>culprit now, it will be soon.
>
>>> ContentItem.count
>=> 19603
>
>Any ideas?
>
>Any help would be mucho appreciated. Thanks!
>
>- Fez
>http://ummyeah.com/
>
>--
>Posted via http://www.ruby-forum.com/.

From fez.ummyeah at gmail.com Thu Jan 4 23:02:11 2007
From: fez.ummyeah at gmail.com (Fez Bojangles)
Date: Fri, 5 Jan 2007 05:02:11 +0100
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To:
References:
Message-ID: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com>

Just a heads up.. rebuilding the index did the trick.
http://www.ruby-forum.com/topic/89245

I'm curious though, how many items can Ferret reasonably be expected to
scale to?

And, if anyone has hit Ferret's natural limits, are there any solutions
(i.e. partitioning the index into manageable chunks, etc) that still use
Ferret as the base search indexer / engine?

Fez Bojangles wrote:
> Hey all!
>
> We've been using Ferret to great success these past six months. But
> recently we've tried adding many new ContentItems (only thing being
> indexed by Ferret at the moment), and things came crashing to a halt.
>
> ferret gem: 0.10.9
> acts_as_ferret plugin (not sure which version)
>
> How we're using the plugin:
>
> class ContentItem < ActiveRecord::Base
>   acts_as_ferret :fields => { 'title' => {},
>                               'description' => {}
>                             }
>   ...
> end
>
> In the directory (on production) 'index/production/content_item', there
> are now 45812 files. (this is on Fedora Core 5, btw)
>
> This leads me to believe this could be the culprit... and if not the
> culprit now, it will be soon.
>
>>> ContentItem.count
> => 19603
>
> Any ideas?
>
> Any help would be mucho appreciated. Thanks!
>
> - Fez
> http://ummyeah.com/

--
Posted via http://www.ruby-forum.com/.

From jan.prill at gmail.com Fri Jan 5 05:48:31 2007
From: jan.prill at gmail.com (Jan Prill)
Date: Fri, 5 Jan 2007 10:48:31 +0000
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com>
References: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com>
Message-ID: <562a35c10701050248m747a4f68sf305135a2838fc3b@mail.gmail.com>

Hey Fez,

the limit of indexed items of ferret (and lucene) shouldn't be in the
thousands but in the millions. I've indexed hundreds of thousands of
documents myself with ferret as well as with lucene and 20.000 is not even
near the limit. Regarding the file-count in the index directory: It seems as
if the index was never optimized. This defragments the chunks into one big
index file.
You should investigate why this didn't happen. I did not look
into the aaf code for some time but I think that it should do index
optimization from time to time.

Cheers,
Jan

--
------------------------------
http://www.inviado.de - Internetseiten für RAe
http://www.xing.com/profile/Jan_Prill
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070105/6eabc802/attachment-0001.html

From evtroost at vub.ac.be Fri Jan 5 09:30:37 2007
From: evtroost at vub.ac.be (Ewout)
Date: Fri, 5 Jan 2007 15:30:37 +0100
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To: <562a35c10701050248m747a4f68sf305135a2838fc3b@mail.gmail.com>
References: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com> <562a35c10701050248m747a4f68sf305135a2838fc3b@mail.gmail.com>
Message-ID: <20070105143037.27523384@localhost>

Actually, it does not. The only call to index.optimize is in the
rebuild_index method. A possible extension for aaf is that
index.optimize is called automatically each C insertions, where C is
some constant (1000 seems reasonable).

I can only agree with Jan on scalability, at the moment I'm keeping an
index of over 700.000 bibliographic records. Searches are instant.

Regards,
Ewout

>Hey Fez,
>
>the limit of indexed items of ferret (and lucene) shouldn't be in the
>thousands but in the millions. I've indexed hundreds of thousands of
>documents myself with ferret as well as with lucene and 20.000 is not even
>near the limit. Regarding the file-count in the index directory: It seems as
>if the index was never optimized. This defragments the chunks into one big
>index file. You should investigate why this didn't happen. I did not look
>into the aaf code for some time but I think that it should do index
>optimization from time to time.
>
>Cheers,
>Jan
>
>
>--
>------------------------------
>http://www.inviado.de - Internetseiten für RAe
>http://www.xing.com/profile/Jan_Prill

From evtroost at vub.ac.be Fri Jan 5 10:02:11 2007
From: evtroost at vub.ac.be (Ewout)
Date: Fri, 5 Jan 2007 16:02:11 +0100
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To: <20070105143037.27523384@localhost>
References: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com> <562a35c10701050248m747a4f68sf305135a2838fc3b@mail.gmail.com> <20070105143037.27523384@localhost>
Message-ID: <20070105150211.1938008688@localhost>

I created a patch for acts_as_ferret that will optimize the index every
100 insertions (experience will have to show whether this constant is
adequate).

The only prerequisite is that your model has an id attribute that
increases 1 by 1, automatically, since the id is used to determine when
to optimize.

Just apply this patch to instance_methods.rb of acts_as_ferret to try it.

Hope this will be of use.

>Actually, it does not. The only call to index.optimize is in the
>rebuild_index method. A possible extension for aaf is that
>index.optimize is called automatically each C insertions, where C is
>some constant (1000 seems reasonable).
>
>I can only agree with Jan on scalability, at the moment I'm keeping an
>index of over 700.000 bibliographic records. Searches are instant.
>
>Regards,
>Ewout
>
>>Hey Fez,
>>
>>the limit of indexed items of ferret (and lucene) shouldn't be in the
>>thousands but in the millions. I've indexed hundreds of thousands of
>>documents myself with ferret as well as with lucene and 20.000 is not even
>>near the limit. Regarding the file-count in the index directory: It seems as
>>if the index was never optimized. This defragments the chunks into one big
>>index file.
>>You should investigate why this didn't happen. I did not look
>>into the aaf code for some time but I think that it should do index
>>optimization from time to time.
>>
>>Cheers,
>>Jan
>>
>>
>>--
>>------------------------------
>>http://www.inviado.de - Internetseiten für RAe
>>http://www.xing.com/profile/Jan_Prill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 559 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/ferret-talk/attachments/20070105/62aa2743/attachment.obj

From erik at ehatchersolutions.com Fri Jan 5 10:55:03 2007
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Fri, 5 Jan 2007 10:55:03 -0500
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To: <20070105150211.1938008688@localhost>
References: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com> <562a35c10701050248m747a4f68sf305135a2838fc3b@mail.gmail.com> <20070105143037.27523384@localhost> <20070105150211.1938008688@localhost>
Message-ID:

Ferret itself does not automatically optimize itself after so many
document insertions?

Lucene does, but maybe Ferret does not? It certainly causes
indexing hiccups when it hits that optimization with Lucene, so care
has to be taken to be sure you account for that possible optimization
delay or to tune the parameters so you know when to expect it.

	Erik


On Jan 5, 2007, at 10:02 AM, Ewout wrote:

> I created a patch for acts_as_ferret that will optimize the index
> every
> 100 insertions (experience will have to show whether this constant is
> adequate).
>
> The only prerequisite is that your model has an id attribute that
> increases 1 by 1, automatically, since the id is used to determine
> when
> to optimize.
>
> Just apply this patch to instance_methods.rb of acts_as_ferret to
> try it.
>
> Hope this will be of use.
>
>> Actually, it does not. The only call to index.optimize is in the
>> rebuild_index method. A possible extension for aaf is that
>> index.optimize is called automatically each C insertions, where C is
>> some constant (1000 seems reasonable).
>>
>> I can only agree with Jan on scalability, at the moment I'm
>> keeping an
>> index of over 700.000 bibliographic records. Searches are instant.
>>
>> Regards,
>> Ewout
>>
>>> Hey Fez,
>>>
>>> the limit of indexed items of ferret (and lucene) shouldn't be in
>>> the
>>> thousands but in the millions. I've indexed hundreds of thousands of
>>> documents myself with ferret as well as with lucene and 20.000 is
>>> not even
>>> near the limit. Regarding the file-count in the index directory:
>>> It seems as
>>> if the index was never optimized. This defragments the chunks
>>> into one big
>>> index file. You should investigate why this didn't happen. I did
>>> not look
>>> into the aaf code for some time but I think that it should do index
>>> optimization from time to time.
>>>
>>> Cheers,
>>> Jan
>>>
>>>
>>> --
>>> ------------------------------
>>> http://www.inviado.de - Internetseiten für RAe
>>> http://www.xing.com/profile/Jan_Prill

From jtkimbell at yahoo.com Fri Jan 5 11:54:44 2007
From: jtkimbell at yahoo.com (Jt Kimbell)
Date: Fri, 5 Jan 2007 17:54:44 +0100
Subject: [Ferret-talk] Confused about Search Results
Message-ID: <564f70441535ec1f9e793176105bbe5a@ruby-forum.com>

Hi everyone,

I'm pretty new to Lucene and Ferret, so I feel that this is most likely
myself not completely understanding the correct way to do this.

I have indexed ~2200 text files (of various sizes), and I am now running
searches on the index to get a feel for Lucene and Ferret.

In my first program, which is using Lucene, I search for 'influenza' and
get the following result plus a listing of all the filenames:

Found 210 document(s) that matched query 'influenza'

Here is the Lucene code specific to searching:

Directory fsDir = FSDirectory.getDirectory(indexDir, false);
IndexSearcher is = new IndexSearcher(fsDir);
QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
Query query = qp.parse(q);
Hits hits = is.search(query);

For my second program, I use Ferret to search an index of the same
files, which was made using Ferret. I get the following results (id and
score):

Searching for 'influenza'...
CDC Influenza Update with score of 0.897013485431671.
CDC Influenza Update with score of 0.897013485431671.
CDC Influenza Update with score of 0.897013485431671.
CDC Influenza Update with score of 0.897013485431671.
CDC Influenza Update with score of 0.897013485431671.
CDC Update 4.3.06 (Avian & Seasonal Influenza) with score of 0.776836454868317.
CDC Update 4.3.06 (Avian & Seasonal Influenza) with score of 0.776836454868317.
CDC Update 4.3.06 (Avian & Seasonal Influenza) with score of 0.776836454868317.
CDC Update 4.3.06 (Avian & Seasonal Influenza) with score of 0.776836454868317.
CDC Update 4.3.06 (Avian & Seasonal Influenza) with score of 0.776836454868317.

As you can see, there are only 10 results, and they are from two
different files. Does Ferret only return 10 search results at a time or
something? I've reindexed and stuff a few times, and the results changed
slightly, but there are always 10 results.

Here is my code:

searcher.search_each(Search::TermQuery.new(:content, "influenza"),{}) do |id, score|
  puts "#{searcher[id][:title]} with score of #{score}."
end

What do I need to do to get the same results as I did using Lucene?
I've read through every tutorial about Ferret I could find (that was
about 4 or 5 of them), read through several threads here, and read the
API, but I'm still not 100% clear on what to do.

Thanks,
JT

--
Posted via http://www.ruby-forum.com/.

From JanPrill at blauton.de Fri Jan 5 12:59:31 2007
From: JanPrill at blauton.de (Jan Prill)
Date: Fri, 5 Jan 2007 17:59:31 +0000
Subject: [Ferret-talk] Confused about Search Results
In-Reply-To: <564f70441535ec1f9e793176105bbe5a@ruby-forum.com>
References: <564f70441535ec1f9e793176105bbe5a@ruby-forum.com>
Message-ID: <562a35c10701050959s3ea3242cpe38848fef0021938@mail.gmail.com>

Hi Jt,

please have a look at the :limit symbol of the options-hash =>
http://ferret.davebalmain.com/api/classes/Ferret/Search/Searcher.html#M000238

Cheers,
Jan Prill

--
------------------------------
http://www.inviado.de - Internetseiten für RAe
http://www.xing.com/profile/Jan_Prill
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070105/2a560b98/attachment.html

From evtroost at vub.ac.be Fri Jan 5 13:00:33 2007
From: evtroost at vub.ac.be (Ewout)
Date: Fri, 5 Jan 2007 19:00:33 +0100
Subject: [Ferret-talk] Hitting Files per Directory Limits with Ferret?
In-Reply-To:
References: <097404a51e76ae76264c1e11ce2e64cb@ruby-forum.com> <562a35c10701050248m747a4f68sf305135a2838fc3b@mail.gmail.com> <20070105143037.27523384@localhost> <20070105150211.1938008688@localhost>
Message-ID: <20070105180033.1993339959@localhost>

If Ferret were to implement automatic optimization, it should indeed be optional and parameterizable. For example: suppose you are indexing 500 013 documents. After indexing, you would naturally call index.optimize. But suppose Ferret automatically optimizes every 1000 insertions. Obviously, there's a lot of overhead here (optimizing 501 times instead of just once).

The ideal solution would be parallel:
- index optimization happens in a separate process
- while optimizing, the old index is still available

Is this possible now? Is Ferret safe enough to allow one process to optimize the index while another is using it?

Also, does anyone have data about the duration of an optimization process? I don't think it takes too long, but I haven't got any concrete data on that (yet).

Ewout

>Ferret itself does not automatically optimize itself after so many
>document insertions?
>
>Lucene does, but maybe Ferret does not? It certainly causes
>indexing hiccups when it hits that optimization with Lucene, so care
>has to be taken to be sure you account for that possible optimization
>delay or to tune the parameters so you know when to expect it.
>
> Erik
>
>
>On Jan 5, 2007, at 10:02 AM, Ewout wrote:
>
>> I created a patch for acts_as_ferret that will optimize the index every
>> 100 insertions (experience will have to show whether this constant is
>> adequate).
>>
>> The only prerequisite is that your model has an id attribute that
>> increases 1 by 1, automatically, since the id is used to determine when
>> to optimize.
>>
>> Just apply this patch to instance_methods.rb of acts_as_ferret to try it.
>>
>> Hope this will be of use.
>>
>>> Actually, it does not. The only call to index.optimize is in the
>>> rebuild_index method. A possible extension for aaf is that
>>> index.optimize is called automatically each C insertions, where C is
>>> some constant (1000 seems reasonable).
>>>
>>> I can only agree with Jan on scalability, at the moment I'm keeping an
>>> index of over 700.000 bibliographic records. Searches are instant.
>>>
>>> Regards,
>>> Ewout
>>>
>>>> Hey Fez,
>>>>
>>>> the limit of indexed items of ferret (and lucene) shouldn't be in the
>>>> thousands but in the millions. I've indexed hundreds of thousands of
>>>> documents myself with ferret as well as with lucene and 20.000 is not even
>>>> near the limit. Regarding the file-count in the index directory: It seems as
>>>> if the index was never optimized. This defragments the chunks into one big
>>>> index file. You should investigate why this didn't happen. I did not look
>>>> into the aaf code for some time but I think that it should do index
>>>> optimization from time to time.
>>>>
>>>> Cheers,
>>>> Jan
>>>>
>>>> --
>>>> ------------------------------
>>>> http://www.inviado.de - Internetseiten für RAe
>>>> http://www.xing.com/profile/Jan_Prill

From wmorgan-ferret at masanjin.net Fri Jan 5 13:01:18 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Fri, 05 Jan 2007 10:01:18 -0800
Subject: [Ferret-talk] Confused about Search Results
In-Reply-To: <564f70441535ec1f9e793176105bbe5a@ruby-forum.com>
References: <564f70441535ec1f9e793176105bbe5a@ruby-forum.com>
Message-ID: <1168019944-redwood-312@south>

Excerpts from Jt Kimbell's message of Fri Jan 05 08:54:44 -0800 2007:
> As you can see, there are only 10 results, and they are from two
> different files. Does Ferret only return 10 search results at a time or
> something?

http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000022

> searcher.search_each(Search::TermQuery.new(:content,
> "influenza"),{}) do |id, score|

search_each(Search::TermQuery.new(:content, "influenza"), :limit => :all)

(Or :limit => 1000, etc.)

--
William

From dylanvaughn at yahoo.com Fri Jan 5 13:38:08 2007
From: dylanvaughn at yahoo.com (Dylan Vaughn)
Date: Fri, 5 Jan 2007 10:38:08 -0800 (PST)
Subject: [Ferret-talk] adding one url to rdig index?
Message-ID: <20070105183808.19867.qmail@web34707.mail.mud.yahoo.com>

Hey there,

I'm building a rails site using RDig as a site-wide search. I would like to be able to add just one URL (or possibly a list) to an existing index, so that when certain pages change I can update the index without reindexing the entire site. I looked through the documentation and didn't see an example on how to do this so I am looking for some guidance here :). Is this possible with RDig and if so what is the syntax?

Thanks in advance,
Dylan

From kraemer at webit.de Sat Jan 6 05:44:35 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Sat, 6 Jan 2007 11:44:35 +0100
Subject: [Ferret-talk] adding one url to rdig index?
In-Reply-To: <20070105183808.19867.qmail@web34707.mail.mud.yahoo.com>
References: <20070105183808.19867.qmail@web34707.mail.mud.yahoo.com>
Message-ID: <20070106104434.GA21940@cordoba.webit.de>

Hi!

On Fri, Jan 05, 2007 at 10:38:08AM -0800, Dylan Vaughn wrote:
>
> Hey there,
>
> I'm building a rails site using RDig as a site-wide search. I would like to be able to add just one URL (or possibly a list) to an existing index, so that when certain pages change I can update the index without reindexing the entire site. I looked through the documentation and didn't see an example on how to do this so I am looking for some guidance here :). Is this possible with RDig and if so what is the syntax?

I'm afraid this is not possible with RDig out of the box. However it should be fairly easy to hack RDig so it does what you want. If you told the crawler to not follow any links, and then gave your urls as start url list, you'd be nearly done. Nearly because you have to make sure any older versions of the same documents get removed from the index properly before re-adding them.

Jens

--
webit!
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From teles at mail.com Sat Jan 6 17:05:15 2007
From: teles at mail.com (Vinícius Manhães Teles)
Date: Sat, 6 Jan 2007 23:05:15 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To: <459C01DC.8040300@benjaminkrause.com>
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com>
Message-ID: <6bee3ae1fa62e5677a4a650c0ec54dec@ruby-forum.com>

I'm also facing this problem here. It's been driving me crazy since yesterday. I also hope it's a bug and will be fixed soon.

Vinícius

Benjamin Krause wrote:
> Hey..
>
> I've got the same problem, sorting isn't working the way you would
> think.. I guess thats
> a bug in ferret, nothing AAF can do about.. there's a ticket and I guess
> David is aware
> of that.. lets see when a new version of ferret will be released.
>
> Ben

--
Posted via http://www.ruby-forum.com/.

From nappin713 at yahoo.com Sat Jan 6 22:22:21 2007
From: nappin713 at yahoo.com (Raymond O'connor)
Date: Sun, 7 Jan 2007 04:22:21 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To: <6bee3ae1fa62e5677a4a650c0ec54dec@ruby-forum.com>
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <6bee3ae1fa62e5677a4a650c0ec54dec@ruby-forum.com>
Message-ID:

Another question regarding the same subject.
How do you sort by relevance? Or is that the default behavior? Speaking of that, what is the default behavior if you don't give any order preference?

Thanks,
Ray

--
Posted via http://www.ruby-forum.com/.
From bk at benjaminkrause.com Sun Jan 7 08:11:46 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Sun, 07 Jan 2007 14:11:46 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To:
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <6bee3ae1fa62e5677a4a650c0ec54dec@ruby-forum.com>
Message-ID: <45A0F192.4070507@benjaminkrause.com>

Raymond O'connor wrote:
> Another question regarding the same subject.
> How do you sort by relevance? Or is that the default behavior? Speaking
> of that, what is the default behavior if you don't give any order
> preference?
>

hey ..

as you write.. sorting by relevance is the default behaviour.. btw.. there are no bugs when sorting the results on integer values, the problem seems to be purely text-sorting related. so, ordering by relevance is working ..

Ben

From kraemer at webit.de Mon Jan 8 03:41:54 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 8 Jan 2007 09:41:54 +0100
Subject: [Ferret-talk] Possible Bug when Creating Indexes
In-Reply-To: <2d62ed3ede9ff2a9fb694cdfe2c45d1e@ruby-forum.com>
References: <2d62ed3ede9ff2a9fb694cdfe2c45d1e@ruby-forum.com>
Message-ID: <20070108084154.GJ28256@cordoba.webit.de>

On Mon, Jan 01, 2007 at 10:47:31PM +0100, Hh Hh wrote:
> I'm running:
>
> ferret (0.10.9)
> ruby 1.8.5 (2006-08-25) [i386-mswin32]
>
> on Windows XP(SP2)
>
> When I create an index as follows:
>
> field_infos = FieldInfos.new(:store => :yes, :term_vector => :no, :index
> => :yes)
> field_infos.add_field(:id, :index => :untokenized)
> field_infos.add_field(:subject)
> field_infos.add_field(:author)
> field_infos.add_field(:tags, :store => :no)
> index = field_infos.create_index(THREAD_INDEX_DIR)

create_index does not return a created index instance, but self.

Use

index = Ferret::I.new :path => THREAD_INDEX_DIR

after the create_index statement.

cheers,
Jens

--
webit!
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From jtkimbell at yahoo.com Mon Jan 8 09:07:15 2007
From: jtkimbell at yahoo.com (JT Kimbell)
Date: Mon, 8 Jan 2007 15:07:15 +0100
Subject: [Ferret-talk] Confused about Search Results
In-Reply-To: <1168019944-redwood-312@south>
References: <564f70441535ec1f9e793176105bbe5a@ruby-forum.com> <1168019944-redwood-312@south>
Message-ID: <07036b5354942362bbe9f6b520823b51@ruby-forum.com>

Thanks so much to both of you! Sorry I missed that.

JT

--
Posted via http://www.ruby-forum.com/.

From py at landanger.fr Mon Jan 8 13:10:00 2007
From: py at landanger.fr (Pierre-Yves Landanger)
Date: Mon, 8 Jan 2007 19:10:00 +0100
Subject: [Ferret-talk] debug : monitor ferret content
Message-ID:

Hello,

I am using acts_as_ferret, and I am seeing strange behavior.

Object creation and search work fine. But if I update an object, it seems to disappear from the index...

So, in order to debug, I want to see what is inside the ferret index in real time. Is there a way to do so?

Thanks.

--
Posted via http://www.ruby-forum.com/.

From wmorgan-ferret at masanjin.net Mon Jan 8 15:26:41 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Mon, 08 Jan 2007 12:26:41 -0800
Subject: [Ferret-talk] Possible Bug when Creating Indexes
In-Reply-To: <20070108084154.GJ28256@cordoba.webit.de>
References: <2d62ed3ede9ff2a9fb694cdfe2c45d1e@ruby-forum.com> <20070108084154.GJ28256@cordoba.webit.de>
Message-ID: <1168287974-redwood-3268@south>

Excerpts from Jens Kraemer's message of Mon Jan 08 00:41:54 -0800 2007:
> create_index does not return a created index instance, but self.

Maybe it should.
--
William

From john at digitalpulp.com Mon Jan 8 16:35:36 2007
From: john at digitalpulp.com (John Bachir)
Date: Mon, 8 Jan 2007 16:35:36 -0500
Subject: [Ferret-talk] mocking/stubbing ferret
Message-ID: <89B16BAA-40A7-4019-8734-3EF214477ECC@digitalpulp.com>

Does anyone have any experience with mocking/stubbing ferret, in order to increase speed?

I've taken a look at what it might entail and from what I've seen it would have to be a pretty elaborate system. But I am not very experienced with mocking/stubbing.

Any tips much appreciated.

John

From eostrom at drowning.org Mon Jan 8 17:49:01 2007
From: eostrom at drowning.org (Erik Ostrom)
Date: Mon, 8 Jan 2007 23:49:01 +0100
Subject: [Ferret-talk] Using custom stem analyzer giving mongrel errors
In-Reply-To: <20061208092112.GO4076@cordoba.webit.de>
References: <2c4660e31d411c290d4a3adb6d6e6041@ruby-forum.com> <20061208092112.GO4076@cordoba.webit.de>
Message-ID: <3dd76a90da0e3b98d1220e500ca9bee2@ruby-forum.com>

> looks like inheriting from Analyzer is essential...

Not necessarily. I had a similar problem when I upgraded to Rails 1.2, and the solution seems to have been putting the 'include' statement inside the class, instead of outside. It looks like the same change happened here, in addition to the change in inheritance.

--
Posted via http://www.ruby-forum.com/.

From julioody at gmail.com Tue Jan 9 22:03:55 2007
From: julioody at gmail.com (Julio Cesar Ody)
Date: Wed, 10 Jan 2007 14:03:55 +1100
Subject: [Ferret-talk] LazyDoc over DRb
Message-ID:

Hey all,

I'm distributing requests to a small farm of Ferret servers across the network using DRb.

In a specific part of my program, I'm trying to find an entry across servers, and for that, I'm using index['example_doc_id'].load as the return value of the function in question. This returns a Ferret::Index::LazyDoc, which is all fine and dandy, except that for some reason DRb doesn't like it and won't return the "hash" to the remote client.

I tried using index['example_doc_id'].load.to_a, and the array gets returned just fine.

For the record, yes, I've tried index['example_doc_id'].load.to_hash, but what you get from that is again a Ferret::Index::LazyDoc, and not a Hash.

I suppose I could manually copy the elements from LazyDoc to a new Hash, but that's hacky. Before I resort to it, does anyone have any ideas?

Thanks in advance.

--
Julio C. Ody
http://rootshell.be/~julioody

From wmorgan-ferret at masanjin.net Tue Jan 9 22:53:56 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Tue, 09 Jan 2007 19:53:56 -0800
Subject: [Ferret-talk] outstanding patches
Message-ID: <1168401227-redwood-4342@south>

Where tha Dave at? I've got 3 outstanding patches I'd like to see committed, or at least get some feedback on.

--
William

From sethm at loomcom.com Wed Jan 10 00:47:03 2007
From: sethm at loomcom.com (Seth J. Morabito)
Date: Tue, 9 Jan 2007 21:47:03 -0800
Subject: [Ferret-talk] Corrupt index and segfaults with heavy writes?
Message-ID: <20070110054703.GA14274@motherbrain.retronet.net>

Hi everyone,

We're running a fairly heavily used Rails app that uses ferret (and acts_as_ferret) for search. We're running on mongrel+Apache, Ruby 1.8.4, and ferret 0.10.13. We're indexing a handful of attributes on our "Image" and "User" models.

After the system has been running for several days, the index gradually becomes corrupted, and ferret begins to segfault once the corruption is bad enough. We're running with ten mongrel servers balanced behind Apache, so it takes a while before they all die due to the segfaulting. The segfault spits out the following line into mongrel.log:

/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:271: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i686-linux]

I haven't done a ton of deeper investigation, but we suspect it may be related to locking problems.
As I said, we're getting fairly heavy use, and every time a user views an Image, a view counter is updated and the Image is saved. This causes acts_as_ferret to re-add the model to the index, so the index is getting heavy write use. As a side effect of this, we see a lot of locking errors in the logs, which cause 500 errors for our users:

Ferret::Store::Lock::LockError (Lock Error occured at :103 in xpop_context
Error occured in index.c:5368 - iw_open
Couldn't obtain write lock when opening IndexWriter

Eventually, we start seeing corruption errors like these (as an example):

End-of-File Error occured at :79 in xraise
Error occured in compound_io.c:123 - cmpdi_read_i
Tried to read past end of file. File length is <9> and tried to read to <19>

And then boom, mongrel processes start to die, slowly.

If the locking is leading to corruption problems, one thing that would really help is if we didn't update the index on every write. We're not searching on the image view counter, so this might end up being more of an acts_as_ferret question than a ferret question (i.e., it'd be nice to tell acts_as_ferret not to reindex the model if we're not updating an attribute we search on!).

But that aside, has anyone else encountered problems with heavy writing?

Thanks much,

-Seth

From bk at benjaminkrause.com Wed Jan 10 02:19:13 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Wed, 10 Jan 2007 08:19:13 +0100
Subject: [Ferret-talk] outstanding patches
In-Reply-To: <1168401227-redwood-4342@south>
References: <1168401227-redwood-4342@south>
Message-ID: <45A49371.1020003@benjaminkrause.com>

William Morgan wrote:
> Where tha Dave at? I've got 3 outstanding patches I'd like to see
> committed, or at least get some feedback on.

Hey..

David is currently trying to get back on his feet in Australia (he lived in Japan for the last couple of months). He's aware of all patches and reports on his trac and is trying to get a new ferret version out as soon as possible.
Right now he's improving the testcases for ferret, to hunt down some of the segfault bugs that a few people have reported. At the same time he's trying to simplify the code. So I guess you need to be patient for a couple of weeks; hopefully Dave's back on the mailing list regularly then..

Ben

From kraemer at webit.de Wed Jan 10 03:29:37 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 10 Jan 2007 09:29:37 +0100
Subject: [Ferret-talk] Corrupt index and segfaults with heavy writes?
In-Reply-To: <20070110054703.GA14274@motherbrain.retronet.net>
References: <20070110054703.GA14274@motherbrain.retronet.net>
Message-ID: <20070110082937.GA32013@cordoba.webit.de>

Hi!

On Tue, Jan 09, 2007 at 09:47:03PM -0800, Seth J. Morabito wrote:
> Hi everyone,
>
[..]
>
> IF the locking is leading to corruption problems, one thing that would
> really help is if we didn't update the index on every write. We're not
> searching on the image view counter, so this might end up being more of
> an acts_as_ferret question than a ferret question (i.e., it'd be nice
> to tell acts_as_ferret not to reindex the model if we're not updating an
> attribute we search on!).

If you're on aaf trunk, this is possible:

model_instance.disable_ferret # will disable ferret for the next save
model_instance.save

or

model_instance.disable_ferret do
  # ferret is disabled for all saves occurring inside the block
  model_instance.save
end

> But that aside, has anyone else encountered problems with heavy
> writing?

Yes, we've had the very same errors in an application not using aaf. Moving all the indexing into a single backgroundrb process solved it; since then everything has been fine.

I have a drb indexing feature for aaf in the works, too.

cheers,
Jens

--
webit!
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From kraemer at webit.de Wed Jan 10 04:00:52 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 10 Jan 2007 10:00:52 +0100
Subject: [Ferret-talk] Search multiple models
In-Reply-To: <1c596576461a10c930df1dc5f11dc944@ruby-forum.com>
References: <537c544d1d0870ae2f1e80de36b79f1e@ruby-forum.com> <20061023092218.GD24958@cordoba.webit.de> <1c596576461a10c930df1dc5f11dc944@ruby-forum.com>
Message-ID: <20070110090052.GB32013@cordoba.webit.de>

On Wed, Dec 27, 2006 at 06:05:46PM +0100, Martin Bernd Schmeil wrote:
> Hi all,
>
> just started to play with (acts_as_)ferret a couple of hours ago, when I
> learned that ferret supports fuzzy search.
>
> I could not find an answer to the problem i need to solve yet:
>
> I have a few models with one to many relations to Clients: Addresses,
> Contacts, Phone numbers, etc.
>
> i.e. a client may have many addresses and so on.
>
> I need to match a "flat" (each attribute only once) client record
> against all the models attributes mentioned above and get a list of
> clients with descending probability of being a duplicate.
>
> Is this possible?

As a first try I'd build a single Ferret document for each client, containing all his contacts, addresses and phone numbers. For better results you could keep all addresses in one field, phone numbers in another, and contact names in a third field.

Then take each record you suspect being a duplicate and build a query from it, using the same way of distributing the data to different fields. Running that query against the index should give you a list of possible duplicate records sorted by relevance.

> Which options should I use to save memory and
> performance?

There seems to be no need to store the field contents themselves in the index, so this should be turned off with :store => :no when the index is created.
Otherwise I'd first make it work and then look if further optimization is needed at all - Ferret is *really* fast.

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From evtroost at vub.ac.be Wed Jan 10 04:38:49 2007
From: evtroost at vub.ac.be (Ewout)
Date: Wed, 10 Jan 2007 10:38:49 +0100
Subject: [Ferret-talk] Corrupt index and segfaults with heavy writes?
In-Reply-To: <20070110054703.GA14274@motherbrain.retronet.net>
References: <20070110054703.GA14274@motherbrain.retronet.net>
Message-ID: <20070110093849.530213619@localhost>

Hi,

>
>IF the locking is leading to corruption problems, one thing that would
>really help is if we didn't update the index on every write. We're not
>searching on the image view counter, so this might end up being more of
>an acts_as_ferret question than a ferret question (i.e., it'd be nice
>to tell acts_as_ferret not to reindex the model if we're not updating an
>attribute we search on!).

acts_as_ferret(:fields => [:filename, :creator, ...])

With this you can control the fields that are indexed with ferret. It will produce less overhead if you don't index fields you don't search full-text.

Regards,
Ewout

From no at spam.com Thu Jan 11 07:46:06 2007
From: no at spam.com (W S)
Date: Thu, 11 Jan 2007 13:46:06 +0100
Subject: [Ferret-talk] Ferret Locking issues
Message-ID: <110a6197b1607a8a859f27ad22f64be6@ruby-forum.com>

Dave and all,

I run a medium RoR app using Ferret and acts_as_ferret. I get a lot of lock errors. Not always, but in around 5% of all searches (especially during peak periods).
Here are the messages I get:

A NameError occurred in szukaj#index:
uninitialized constant Ferret::Index::Index::LockError
[RAILS_ROOT]/vendor/rails/activesupport/lib/active_support/dependencies.rb:478:in `const_missing'
/home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:674:in `ensure_reader_open'
/home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:383:in `[]'
/usr/lib/ruby/1.8/monitor.rb:238:in `synchronize'
/home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:382:in `[]'
[RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/class_methods.rb:413:in `find_id_by_contents'
/home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:371:in `search_each'
/home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:370:in `search_each'
/usr/lib/ruby/1.8/monitor.rb:238:in `synchronize'
/home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:366:in `search_each'
[RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/class_methods.rb:411:in `find_id_by_contents'
[RAILS_ROOT]/app/controllers/szukaj_controller.rb:38:in `index'
[etc]

As you can see I use ferret 0.10.13. I read:
http://rubyforge.org/pipermail/ferret-talk/2006-September/001184.html
but obviously it didn't help much.

Is it possible to disable locking when searching? The index is updated once per week at most...

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Thu Jan 11 08:11:16 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 11 Jan 2007 14:11:16 +0100
Subject: [Ferret-talk] Ferret Locking issues
In-Reply-To: <110a6197b1607a8a859f27ad22f64be6@ruby-forum.com>
References: <110a6197b1607a8a859f27ad22f64be6@ruby-forum.com>
Message-ID: <20070111131116.GE7007@cordoba.webit.de>

Hi!

On Thu, Jan 11, 2007 at 01:46:06PM +0100, W S wrote:
> Dave and all,
>
> I run a medium RoR app using Ferret and acts_as_ferret. I get a lot of
> lock errors. Not always but around 5% of all searches (especially during
> peak periods).
> Here are the messages I get:
>
> A NameError occurred in szukaj#index:
> uninitialized constant Ferret::Index::Index::LockError
> [RAILS_ROOT]/vendor/rails/activesupport/lib/active_support/dependencies.rb:478:in
> `const_missing'
> /home/user/.gems/gems/ferret-0.10.13/lib/ferret/index.rb:674:in
> `ensure_reader_open'

This actually means that the LockError class could not be found in line 674, which is a rescue statement.

Replacing LockError with Lock::LockError in line 674 of index.rb should help to solve that. Then you'll probably see what's really going on.

Please tell us if this gets you any further.

Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From pjones at pmade.com Thu Jan 11 11:00:33 2007
From: pjones at pmade.com (Peter Jones)
Date: Thu, 11 Jan 2007 17:00:33 +0100
Subject: [Ferret-talk] ASF: cannot determine document number from primary key
Message-ID:

I'm getting this exception from acts_as_ferret:

A RuntimeError occurred in search#similar:
cannot determine document number from primary key: #
[RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:132:in `document_number'

As a result of this call:

object.more_like_this

The relevant backtrace:

[RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:132:in `document_number'
[RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/more_like_this.rb:64:in `more_like_this'
/usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
[RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/more_like_this.rb:61:in `more_like_this'
[RAILS_ROOT]/app/controllers/search_controller.rb:52:in `similar'

I've played with this for hours and can't seem to track it down. My best guess is a corrupt index. Does that sound about right?

BTW, I'm on trunk, revision 118.

It's a strange error because up until last night, I never saw it in production, only in development.
Now I'm getting exception notification emails from production.

--
Posted via http://www.ruby-forum.com/.

From bk at benjaminkrause.com Thu Jan 11 15:59:09 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Thu, 11 Jan 2007 21:59:09 +0100
Subject: [Ferret-talk] Sorting/Ordering Search Results
In-Reply-To: <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com>
References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com>
Message-ID: <022669AA-F090-474D-A44F-2B368178ED82@benjaminkrause.com>

Hey...

just checked back with David about that sorting issue.. The problem with my index was that you cannot sort by fields that you've indexed tokenized; you need to store the fields untokenized if you want to sort by them. Maybe that'll fix your problem as well?

Ben

From nappin713 at yahoo.com Thu Jan 11 18:07:07 2007
From: nappin713 at yahoo.com (Raymond O'connor)
Date: Fri, 12 Jan 2007 00:07:07 +0100
Subject: [Ferret-talk] stop words in query
Message-ID: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com>

Hello all,
Quick question, I'm using AAF and the following custom analyzer:

class StemmedAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = ENGLISH_STOP_WORDS)
    @stop_words = stop_words
  end
  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words))
  end
end

However, when my search term includes a stop word I never get any results back. Once I remove the stop word I get the normal results back. Do I need to search my query for stop words and remove them myself? Or is there something I'm doing wrong with passing my query to AAF?

Thanks,
Ray

--
Posted via http://www.ruby-forum.com/.
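
[Archive note, not part of Ray's post] A minimal sketch of one plausible cause, assuming every query term is required and the query text is matched literally instead of going through the analyzer that built the index. It is plain Ruby and does not call Ferret at all; STOP_WORDS, analyze, build_index and search_all are hypothetical names made up for illustration, and the stop list is far shorter than Ferret's ENGLISH_STOP_WORDS.

```ruby
require 'set'

# Hypothetical stop list, only for this illustration.
STOP_WORDS = Set.new(%w[a an and of the])

# Toy analyzer: lowercase, split on non-word characters, drop stop words.
def analyze(text)
  text.downcase.scan(/\w+/).reject { |t| STOP_WORDS.include?(t) }
end

# Toy inverted index: term => set of document ids.
def build_index(docs)
  index = Hash.new { |h, k| h[k] = Set.new }
  docs.each { |id, text| analyze(text).each { |t| index[t] << id } }
  index
end

# Ids of the documents that contain every term (AND semantics).
def search_all(index, terms)
  return [] if terms.empty?
  terms.map { |t| index.fetch(t, Set.new) }.reduce(:&).to_a.sort
end

docs  = { 1 => "The history of influenza", 2 => "Influenza update" }
index = build_index(docs)

raw_terms      = "the influenza".downcase.split  # query NOT analyzed
analyzed_terms = analyze("the influenza")        # query analyzed like the documents

p search_all(index, raw_terms)       # => []  ("the" was never indexed)
p search_all(index, analyzed_terms)  # => [1, 2]
```

In this toy model the stop word itself is what empties the result set, which matches the symptom described; whether the same thing happens inside AAF depends on how it builds the query.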
From nappin713 at yahoo.com Thu Jan 11 18:18:24 2007
From: nappin713 at yahoo.com (Raymond O'connor)
Date: Fri, 12 Jan 2007 00:18:24 +0100
Subject: [Ferret-talk] ASF: cannot determine document number from primary key
In-Reply-To:
References:
Message-ID: <17d5f577a09db394b7c775cc1a38a480@ruby-forum.com>

I just got this same error yesterday. I determined it was because my id field was set to be tokenized. Once I turned off tokenization this problem went away. I must also mention that I use a modified version of AAF that allows id to be a string instead. I only received this error when using search by similar results and when searching for an id that had a letter in it. If the query had only numbers in it, I would not receive this error. Kinda strange... Anyhow, as I said above, the solution for me was to make id untokenized again (not a big deal, although I would have preferred it to be tokenized).

Hope that helps,
Ray

--
Posted via http://www.ruby-forum.com/.

From pjones at pmade.com Thu Jan 11 18:47:32 2007
From: pjones at pmade.com (Peter Jones)
Date: Fri, 12 Jan 2007 00:47:32 +0100
Subject: [Ferret-talk] ASF: cannot determine document number from primary key
In-Reply-To: <17d5f577a09db394b7c775cc1a38a480@ruby-forum.com>
References: <17d5f577a09db394b7c775cc1a38a480@ruby-forum.com>
Message-ID:

Does AAF use object.id to get the id, or object.to_param? I have a few models that override to_param to return something like "25-FooBar", but id still returns 25.

--
Posted via http://www.ruby-forum.com/.

From evtroost at vub.ac.be Thu Jan 11 19:05:14 2007
From: evtroost at vub.ac.be (Ewout)
Date: Fri, 12 Jan 2007 01:05:14 +0100
Subject: [Ferret-talk] stop words in query
In-Reply-To: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com>
References: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com>
Message-ID: <20070112000514.498566575@localhost>

Depends on how you produced your query. In general, your query has to pass through the same analyzer that was used for indexing.

So, when building a PhraseQuery, for instance, you have to get each word from the analyzer.

keywords.each {|keyword|
  query = Search::PhraseQuery.new(:fieldname)
  analyzer = StemmedAnalyzer.new
  tokenizer = analyzer.token_stream(:fieldname, keyword)
  while (token = tokenizer.next)
    query << token.text
  end
}

This is how I do it, it would be nicer if AAF would encapsulate this.

Regards,
Ewout

>Hello all,
>Quick question, I'm using AAF and the following custom analyzer:
>
>class StemmedAnalyzer < Ferret::Analysis::Analyzer
>  include Ferret::Analysis
>  def initialize(stop_words = ENGLISH_STOP_WORDS)
>    @stop_words = stop_words
>  end
>  def token_stream(field, str)
>    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words))
>  end
>
>
>However when my search term includes a stop word I never get any results
>back. Once I remove the stop word I get the normal results back. Do I
>need to do a search of my query for stop words and remove them myself?
>Or is there something I'm doing wrong with passing my query to AAF?
>
>Thanks,
>Ray
>
>--
>Posted via http://www.ruby-forum.com/.
>_______________________________________________ >Ferret-talk mailing list >Ferret-talk at rubyforge.org >http://rubyforge.org/mailman/listinfo/ferret-talk From kraemer at webit.de Fri Jan 12 05:22:35 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 12 Jan 2007 11:22:35 +0100 Subject: [Ferret-talk] stop words in query In-Reply-To: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com> References: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com> Message-ID: <20070112102235.GN7007@cordoba.webit.de> On Fri, Jan 12, 2007 at 12:07:07AM +0100, Raymond O'connor wrote: > Hello all, > Quick question, I'm using AAF and the following custom analyzer: > > class StemmedAnalyzer < Ferret::Analysis::Analyzer > include Ferret::Analysis > def initialize(stop_words = ENGLISH_STOP_WORDS) > @stop_words = stop_words > end > def token_stream(field, str) > StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), > @stop_words)) > end > > > However when my search term includes a stop word I never get any results > back. Once I remove the stop word I get the normal results back. Do I > need to do a search of my query for stop words and remove them myself? > Or is there something I'm doing wrong with passing my query to AAF? what version of aaf do you use, and how does your call to acts_as_ferret look like ? cheers, Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Fri Jan 12 05:26:06 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 12 Jan 2007 11:26:06 +0100 Subject: [Ferret-talk] stop words in query In-Reply-To: <20070112000514.498566575@localhost> References: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com> <20070112000514.498566575@localhost> Message-ID: <20070112102606.GO7007@cordoba.webit.de> On Fri, Jan 12, 2007 at 01:05:14AM +0100, Ewout wrote: > Depends on how you produced your query. In general, your query has to > pass through the same analyzer that was used for indexing. > > So, when building a PhraseQuery, for instance, you have to get each word > from the analyzer. > > keywords.each {|keyword| > query = Search::PhraseQuery.new(:fieldname) > analyzer = StemmedAnalyzer.new > tokenizer = analyzer.token_stream(:fieldname, keyword) > while (token = tokenizer.next) > query << token.text > end > } > > This is how I do it, it would be nicer if AAF would encapsulate this. it should do this, if it doesn't, I'd consider this a bug. There have been problems with stop words in the past, but these should finally be sorted out in current trunk. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Fri Jan 12 05:10:36 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 12 Jan 2007 11:10:36 +0100 Subject: [Ferret-talk] ASF: cannot determine document number from primary key In-Reply-To: References: Message-ID: <20070112101036.GM7007@cordoba.webit.de> On Thu, Jan 11, 2007 at 05:00:33PM +0100, Peter Jones wrote: > I'm getting this exception from acts_as_ferret: > > A RuntimeError occurred in search#similar: > > cannot determine document number from primary key: > # > [RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:132:in > `document_number' > > As a result of this call: > > object.more_like_this > > The relevant backtrace: > > [RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:132:in > `document_number' the query to retrieve the document number is built in query_for_self (also in instance_methods.rb). You could insert some debugging code to output that query and check if it looks right (e.g. by running it manually against your index). It should return exactly one hit, matching the record you're calling more_like_this on. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Fri Jan 12 05:07:06 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 12 Jan 2007 11:07:06 +0100 Subject: [Ferret-talk] ASF: cannot determine document number from primary key In-Reply-To: References: <17d5f577a09db394b7c775cc1a38a480@ruby-forum.com> Message-ID: <20070112100706.GL7007@cordoba.webit.de> On Fri, Jan 12, 2007 at 12:47:32AM +0100, Peter Jones wrote: > Does AAF use object.id to get the id, or object.to_param?
I have a few > models that override to_param to return something like "25-FooBar", but > id still returns 25. aaf uses self.id to determine the value that goes into the :id field of the ferret index. see to_doc in instance_methods.rb . cheers, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Fri Jan 12 05:05:51 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 12 Jan 2007 11:05:51 +0100 Subject: [Ferret-talk] ASF: cannot determine document number from primary key In-Reply-To: <17d5f577a09db394b7c775cc1a38a480@ruby-forum.com> References: <17d5f577a09db394b7c775cc1a38a480@ruby-forum.com> Message-ID: <20070112100551.GK7007@cordoba.webit.de> On Fri, Jan 12, 2007 at 12:18:24AM +0100, Raymond O'connor wrote: > I just got this same error yesterday. I determined it was because my id > field was set to be tokenized. Once I turned off tokenization this > problem went away. I must also mention that I use a modified version of > AAF that allows id to be a string instead. I only received this error > when using search by similar results and when searching for an id that > had a letter in it. If the query had only numbers in it, I would not > receive this error. Kinda strange... > Anyhow, as I said above, the solution for me was to make id untokenized > again (not a big deal, although I would have preferred it to be > tokenized). the id field is meant to be a unique key identifying a single record. It should not be tokenized in any case. If you need to run searches on it, you should consider adding an additional tokenized field to the index containing the same value. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From evtroost at vub.ac.be Fri Jan 12 07:15:29 2007 From: evtroost at vub.ac.be (Ewout) Date: Fri, 12 Jan 2007 13:15:29 +0100 Subject: [Ferret-talk] stop words in query In-Reply-To: <20070112102606.GO7007@cordoba.webit.de> References: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com> <20070112000514.498566575@localhost> <20070112102606.GO7007@cordoba.webit.de> Message-ID: <20070112121529.514146455@localhost> >On Fri, Jan 12, 2007 at 01:05:14AM +0100, Ewout wrote: >> Depends on how you produced your query. In general, your query has to >> pass through the same analyzer that was used for indexing. >> >> So, when building a PhraseQuery, for instance, you have to get each word >> from the analyzer. >> >> keywords.each {|keyword| >> query = Search::PhraseQuery.new(:fieldname) >> analyzer = StemmedAnalyzer.new >> tokenizer = analyzer.token_stream(:fieldname, keyword) >> while (token = tokenizer.next) >> query << token.text >> end >> } >> >> This is how I do it, it would be nicer if AAF would encapsulate this. > >it should do this, if it doesn't, I'd consider this a bug. There have >been problems with stop words in the past, but these should finally be >sorted out in current trunk. I don't see this solved in the trunk @ . In single_index_find_by_contents and find_by_contents, the ferret query should be taken apart, and be analyzed using the analyzer given by the user in the acts_as_ferret call. Right? Ewout > >Jens > >-- >webit!
Gesellschaft für neue Medien mbH www.webit.de >Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de >Schnorrstraße 76 Tel +49 351 46766 0 >D-01069 Dresden Fax +49 351 46766 66 >_______________________________________________ >Ferret-talk mailing list >Ferret-talk at rubyforge.org >http://rubyforge.org/mailman/listinfo/ferret-talk From kraemer at webit.de Fri Jan 12 08:09:54 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 12 Jan 2007 14:09:54 +0100 Subject: [Ferret-talk] stop words in query In-Reply-To: <20070112121529.514146455@localhost> References: <2ca8e1d02a3e86604ba787352e5459ff@ruby-forum.com> <20070112000514.498566575@localhost> <20070112102606.GO7007@cordoba.webit.de> <20070112121529.514146455@localhost> Message-ID: <20070112130954.GQ7007@cordoba.webit.de> On Fri, Jan 12, 2007 at 01:15:29PM +0100, Ewout wrote: > >On Fri, Jan 12, 2007 at 01:05:14AM +0100, Ewout wrote: > >> Depends on how you produced your query. In general, your query has to > >> pass through the same analyzer that was used for indexing. > >> > >> So, when building a PhraseQuery, for instance, you have to get each word > >> from the analyzer. > >> > >> keywords.each {|keyword| > >> query = Search::PhraseQuery.new(:fieldname) > >> analyzer = StemmedAnalyzer.new > >> tokenizer = analyzer.token_stream(:fieldname, keyword) > >> while (token = tokenizer.next) > >> query << token.text > >> end > >> } > >> > >> This is how I do it, it would be nicer if AAF would encapsulate this. > > > >it should do this, if it doesn't, I'd consider this a bug. There have > >been problems with stop words in the past, but these should finally be > >sorted out in current trunk. > > I don't see this solved in the trunk @ acts_as_ferret/browser/trunk>. > > In single_index_find_by_contents and find_by_contents, the ferret query > should be taken apart, and be analyzed using the analyzer given by the > user in the acts_as_ferret call. no, this is done by the Ferret-Index instance aaf internally uses.
Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From rilianx at gmail.com Fri Jan 12 10:13:16 2007 From: rilianx at gmail.com (Ignacio) Date: Fri, 12 Jan 2007 16:13:16 +0100 Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume In-Reply-To: <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> Message-ID: > Hi Jens, I'm running ferret 0.10.13. I think this could be the reason > why. I altered my code to run find_by_contents(query) and it works fine. > Looks like multi-search on AAF doesn't jive yet with the latest ferret. > Thanks for the tip. Hi, I have the same problem with ferret 0.10.11. Can someone help me? -- Posted via http://www.ruby-forum.com/. From rilianx at gmail.com Fri Jan 12 10:36:18 2007 From: rilianx at gmail.com (Ignacio) Date: Fri, 12 Jan 2007 16:36:18 +0100 Subject: [Ferret-talk] Getting "ArgumentError ( isn't a valid directory argume In-Reply-To: References: <3986844f3d800df30733372554d993a4@ruby-forum.com> <28b1b191392270b0d5545555885147df@ruby-forum.com> <20061217220707.GA21628@cordoba.webit.de> <860f8413835b75e43b8b0c866b78ad04@ruby-forum.com> Message-ID: <002a6391f3b20ac93d068e6ce6f998c0@ruby-forum.com> Ignacio wrote: >> Hi Jens, I'm running ferret 0.10.13. I think this could be the reason >> why. I altered my code to run find_by_contents(query) and it works fine. >> Looks like multi-search on AAF doesn't jive yet with the latest ferret. >> Thanks for the tip. > > Hi, I have the same problem with ferret 0.10.11. Can someone help me? I found the problem. The multi-search method assumes that the index has already been built.
One workaround is to run find_by_contents on every model at startup (to build the indexes). Does anyone know a better solution? -- Posted via http://www.ruby-forum.com/. From wflanagan at gmail.com Sat Jan 13 14:31:19 2007 From: wflanagan at gmail.com (William Flanagan) Date: Sat, 13 Jan 2007 14:31:19 -0500 Subject: [Ferret-talk] Problems using acts_as_ferret Message-ID: <16A686AE-41AF-41E6-8FB1-FF8DC5A0CF01@gmail.com> Hi all, I'm trying to use acts_as_ferret and have run into a brick wall. My model is Page My controller is Pages_controller. When in console, I can search for contents, and find results. For example, when I search for "spam" it "finds" 7 results. => # @total_hits=7, @results=[]> You can do a "p.total_hits" and get 7, but the results are empty. If I iterate over p I get no entries. How do I get to the entries that it found for the search? Am I doing something wrong? Thanks, William From kraemer at webit.de Mon Jan 15 01:45:46 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 15 Jan 2007 07:45:46 +0100 Subject: [Ferret-talk] Problems using acts_as_ferret In-Reply-To: <16A686AE-41AF-41E6-8FB1-FF8DC5A0CF01@gmail.com> References: <16A686AE-41AF-41E6-8FB1-FF8DC5A0CF01@gmail.com> Message-ID: <20070115064546.GA5304@cordoba.webit.de> Hi! On Sat, Jan 13, 2007 at 02:31:19PM -0500, William Flanagan wrote: > Hi all, > > I'm trying to use acts_as_ferret and have run into a brick wall. > > My model is Page > My controller is Pages_controller. > > When in console, I can search for contents, and find results. For > example, when I search for "spam" it "finds" 7 results. > > => # @total_hits=7, @results=[]> > > you can do a "p.total_hits" and get 7.. but the results are empty. If > i iterate over the p in this i get no entries. > > How do I get to the entries that it found for the search? Am I doing > something wrong? Looks like Ferret got 7 hits from its index, but acts_as_ferret could not find them in your db.
could you please look into development.log for some relevant output when doing this search on the console ? Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From carsten at sarum.dk Mon Jan 15 18:47:17 2007 From: carsten at sarum.dk (Carsten Gehling) Date: Tue, 16 Jan 2007 00:47:17 +0100 Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_contents Message-ID: <7165a1f793cba929faa2a2dd9ab1ffb8@ruby-forum.com> I don't know if this is a bug, or intended behavior, but for me it was a pain in... So here's the problem + a bugfix. Let's say you have a model "Article" with the following fields: title, visible - and these records [code]title, visible ferret talk, 1 ruby talk, 0 ruby on rails, 1 lets talk about ruby, 1[/code] If I let Article act as a ferret, and do: result = Article.find_by_contents('ruby') Result will contain 3 items and "total_hits" will return 3 However, if I add a condition: result = Article.find_by_contents('ruby', {}, 'visible = 1') Result will contain 2 items - which is correct But "total_hits" will still return 3 - not what I would expect. ----------------------------- Fix for this: 1) In the acts_as_ferret plugin, find the file class_methods.rb 2) Go to line 276 where you have this code-block [code]if results.any? conditions = combine_conditions([ "#{table_name}.#{primary_key} in (?)", results.keys ], find_options[:conditions]) result = self.find(:all, find_options.merge(:conditions => conditions)) end[/code] and add this line: [code]if results.any? conditions = combine_conditions([ "#{table_name}.#{primary_key} in (?)", results.keys ], find_options[:conditions]) result = self.find(:all, find_options.merge(:conditions => conditions)) total_hits = result.length <===== ADD THIS!!! end[/code] - Carsten -- Posted via http://www.ruby-forum.com/.
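
The mismatch described above can be reproduced without Ferret or ActiveRecord. What follows is only a minimal plain-Ruby sketch: the Article struct, ARTICLES, index_hits and search are hypothetical stand-ins made up for illustration, not acts_as_ferret code. It shows why a count taken from the index and a count taken after the database condition can disagree.

```ruby
# Hypothetical stand-ins for the model and its records (not acts_as_ferret code).
Article = Struct.new(:id, :title, :visible)

ARTICLES = [
  Article.new(1, "ferret talk", 1),
  Article.new(2, "ruby talk", 0),
  Article.new(3, "ruby on rails", 1),
  Article.new(4, "lets talk about ruby", 1)
]

# Stand-in for the index lookup: ids of every article whose title contains the term.
# The index knows nothing about the "visible" column.
def index_hits(term)
  ARTICLES.select { |a| a.title.split.include?(term) }.map { |a| a.id }
end

# Stand-in for the database step: load the records for the hit ids and
# apply an optional condition, given here as a lambda instead of SQL.
def search(term, condition = nil)
  ids = index_hits(term)
  records = ARTICLES.select { |a| ids.include?(a.id) }
  records = records.select(&condition) if condition
  # :index_hits is what the index reported; :total_hits is recomputed
  # from the records that survived the condition.
  { :index_hits => ids.size, :total_hits => records.size, :records => records }
end

plain    = search("ruby")
filtered = search("ruby", lambda { |a| a.visible == 1 })
puts "no condition:   index=#{plain[:index_hits]} total=#{plain[:total_hits]}"
puts "with condition: index=#{filtered[:index_hits]} total=#{filtered[:total_hits]}"
```

With the four records from the message, the index reports three hits in both cases, while only two records remain once the condition is applied.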
From carsten at sarum.dk Mon Jan 15 18:48:07 2007 From: carsten at sarum.dk (Carsten Gehling) Date: Tue, 16 Jan 2007 00:48:07 +0100 Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_conten In-Reply-To: <7165a1f793cba929faa2a2dd9ab1ffb8@ruby-forum.com> References: <7165a1f793cba929faa2a2dd9ab1ffb8@ruby-forum.com> Message-ID: <857458bad471071144f2c00c419229d4@ruby-forum.com> Ah bloody hell... Sorry for the strange tags, I thought the forum supported BBCode... - Carsten -- Posted via http://www.ruby-forum.com/. From bschmeil at autoscout24.com Tue Jan 16 05:49:12 2007 From: bschmeil at autoscout24.com (Martin Bernd Schmeil) Date: Tue, 16 Jan 2007 11:49:12 +0100 Subject: [Ferret-talk] Search multiple models In-Reply-To: <20070110090052.GB32013@cordoba.webit.de> References: <537c544d1d0870ae2f1e80de36b79f1e@ruby-forum.com> <20061023092218.GD24958@cordoba.webit.de> <1c596576461a10c930df1dc5f11dc944@ruby-forum.com> <20070110090052.GB32013@cordoba.webit.de> Message-ID: <85c0156701edf542267f4645b0251d26@ruby-forum.com> Hi Jens, thanks for the answer. (Because of time constraints) I solved the problem in a different way, i.e. providing each model a client_id method and then summing up the individual fuzzy search results for each attribute. I guess this is neither elegant nor performant and I'm not happy with the resulting scores. But we can live with it for now. The main issues we have are the well known locking problem and the scores. The scores leave us with the problem that - while the order seems to be correct - we don't know where to cut the line to display results and what a relevant match is. For a dozen attributes I've seen scores from 0.something to 9.something, with a result just below 9 not even looking similar while just above 9 seems to be a "99 percent" match. If someone would tell me - in case this is possible at all - how to normalize the scores I'd be very happy.
Another thing which I didn't understand yet is what actually happens if I do a multi token fuzzy search; currently I'm splitting the string up in multiple tokens and build one query "attribute:token1~ AND attribute:token2~ AND ...". Maybe not really what I should do to get correct scores. Anyways, thanks for your work and for answering my post. -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue Jan 16 06:13:05 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 16 Jan 2007 12:13:05 +0100 Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_contents In-Reply-To: <7165a1f793cba929faa2a2dd9ab1ffb8@ruby-forum.com> References: <7165a1f793cba929faa2a2dd9ab1ffb8@ruby-forum.com> Message-ID: <20070116111305.GE11020@cordoba.webit.de> Hi! On Tue, Jan 16, 2007 at 12:47:17AM +0100, Carsten Gehling wrote: > I don't know if this is a bug, or wanted behavior, but for me it was a > pain in... So here's the problem + a bugfix. right, that's a bug. I just committed a less invasive fix that will only set total_hits to the Active Record result set size if the user gave any active record conditions with his queries (see below). But please keep in mind that total_hits still may be wrong under certain circumstances - e.g. if you specify the :num_docs ferret option and some active record conditions further limiting the result set. > Lets say you have a model "Article" with the following fields: title, > visible - and these records > > [code]title, visible > ferret talk, 1 > ruby talk, 0 > ruby on rails, 1 > lets talk about ruby, 1[/code] > > If I let Article act as a ferret, and do: > > result = Article.find_by_content('ruby') > > Result will contain 3 items and "total_hits" will return 3 > > However, if I add a condition: > result = Article.find_by_content('ruby', {}, 'visible = 1') > Result will contain 2 items - which is correct > > But "hotal_hits" will still return 3 - not what I would expect. > > if results.any? 
> conditions = combine_conditions([ "#{table_name}.#{primary_key} in > (?)", results.keys ], > find_options[:conditions]) > result = self.find(:all, > find_options.merge(:conditions => conditions)) > total_hits = result.length <===== ADD THIS!!! even better, add total_hits = result.length if find_options[:conditions] so total_hits stays correct if you use any ferret options like :num_docs instead of AR conditions to limit the result set. cheers, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Jan 16 06:13:37 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 16 Jan 2007 12:13:37 +0100 Subject: [Ferret-talk] Wrong total_hits when using conditions in find_by_conten In-Reply-To: <857458bad471071144f2c00c419229d4@ruby-forum.com> References: <7165a1f793cba929faa2a2dd9ab1ffb8@ruby-forum.com> <857458bad471071144f2c00c419229d4@ruby-forum.com> Message-ID: <20070116111337.GF11020@cordoba.webit.de> On Tue, Jan 16, 2007 at 12:48:07AM +0100, Carsten Gehling wrote: > Ah bloody hell... > > Sorry for the strange tags, I thought the forum supported BBCode... that's because it's a mailing list in the first place ;-) Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Jan 16 06:38:51 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 16 Jan 2007 12:38:51 +0100 Subject: [Ferret-talk] Search multiple models In-Reply-To: <85c0156701edf542267f4645b0251d26@ruby-forum.com> References: <537c544d1d0870ae2f1e80de36b79f1e@ruby-forum.com> <20061023092218.GD24958@cordoba.webit.de> <1c596576461a10c930df1dc5f11dc944@ruby-forum.com> <20070110090052.GB32013@cordoba.webit.de> <85c0156701edf542267f4645b0251d26@ruby-forum.com> Message-ID: <20070116113851.GH11020@cordoba.webit.de> On Tue, Jan 16, 2007 at 11:49:12AM +0100, Martin Bernd Schmeil wrote: > Hi Jens, > [..] > The main issues we have are the well known locking problem and the > scores. Making sure you only have one process writing to the index, i.e. via an indexer running in backgroundrb, should solve these issues. > The scores leave us with the problem that - while the order seems to be > correct - we don't know where to cut the line to display results and > what a relevant match is. For a dozen attributes I've seen scores from > 0.something to 9.something, with a result just below 9 not even > looking similar while just above 9 seems to be a "99 percent" match. the calculation of scores is quite complex. To get an idea what happens in there you can use Ferret's explain method (in Ferret::Search::Searcher). > If someone would tell me - in case this is possible at all - how to > normalize the scores I'd be very happy. no idea if this is possible - maybe you find some information about this in the context of Lucene (i.e. in Erik Hatcher's fine Lucene book or on the lucene mailing list).
> Another thing which I didn't understand yet is what actually happens if > I do a multi token fuzzy search; currently I'm splitting the string up > in multiple tokens and build one query "attribute:token1~ AND > attribute:token2~ AND ...". Maybe not really what I should do to get > correct scores. don't know if there is another way to express this with ferret. cheers, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From bschmeil at autoscout24.com Tue Jan 16 06:50:30 2007 From: bschmeil at autoscout24.com (Martin Bernd Schmeil) Date: Tue, 16 Jan 2007 12:50:30 +0100 Subject: [Ferret-talk] Search multiple models In-Reply-To: <20070116113851.GH11020@cordoba.webit.de> References: <537c544d1d0870ae2f1e80de36b79f1e@ruby-forum.com> <20061023092218.GD24958@cordoba.webit.de> <1c596576461a10c930df1dc5f11dc944@ruby-forum.com> <20070110090052.GB32013@cordoba.webit.de> <85c0156701edf542267f4645b0251d26@ruby-forum.com> <20070116113851.GH11020@cordoba.webit.de> Message-ID: <32c1abfcb5e399dc0a86da3391f12a81@ruby-forum.com> Again thanks for the answers. I did read the score formula, but my math knowledge is almost gone now. I'll look at the docs again if I have more spare time. With the most relevant word I should be able to scale the scores to percentage. We have multiple servers running, so we definitely have concurrency problems. I just didn't play with stuff like backgroundrb yet, so I need to investigate how to implement a single writer solution. But thanks for the hint. -- Posted via http://www.ruby-forum.com/.
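
The idea of scaling scores against the most relevant hit can be sketched in a few lines of plain Ruby. This is only an illustration: scale_scores is a hypothetical helper, not a Ferret or acts_as_ferret method, and the percentages it returns are relative to the best hit of one result list. They say nothing about how good that best hit is, and they are probably not comparable between different queries.

```ruby
# Hypothetical helper: express each raw score as a percentage of the top score.
# An empty list gives an empty list; a non-positive top score gives all zeros.
def scale_scores(scores)
  top = scores.max
  return scores.map { 0.0 } if top.nil? || top <= 0
  scores.map { |s| (s / top.to_f * 100).round(1) }
end

raw = [8.0, 4.0, 2.0]          # made-up raw scores, best hit first
puts scale_scores(raw).inspect
```

A cut-off such as "show everything above 50 percent of the best hit" then becomes a simple comparison, with the caveat that the threshold itself still has to be chosen by looking at real result lists.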
From saimonmoore at gmail.com Tue Jan 16 19:03:30 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Wed, 17 Jan 2007 01:03:30 +0100 Subject: [Ferret-talk] [ActsAsFerret] Globalize integration Message-ID: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> Hi, I've modified the latest acts_as_ferret code (version 0.3.0) to integrate with the Globalize (http://www.globalize-rails.org/globalize/) plugin. Essentially, I've added the ability to use a separate index per locale (It basically adds the language code as a suffix to the index and switches between indexes when the active locale changes). Since this introduces an optional external dependency and as I've had to touch the code up in a few files, I'm still trying to think of the best way to make this available to others. If others think this is worthwile, I'd be interested in adding this as something optional to acts_as_ferret. P.S. Currently, I've added the option like so: class Foo acts_as_ferret :single_index => true, :store_class_name => true, :localized => true, #=> this activates the option. :fields => {...} end Regards, Saimon (http://saimonmoore.net) -- Posted via http://www.ruby-forum.com/. From erik at ehatchersolutions.com Wed Jan 17 00:40:13 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Wed, 17 Jan 2007 00:40:13 -0500 Subject: [Ferret-talk] [ActsAsFerret] Globalize integration In-Reply-To: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> References: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> Message-ID: <8B336A22-3EA4-47BC-9FBA-D97A04B4D768@ehatchersolutions.com> Did you consider using a single index, but add a locale field to every record to allow easy filtering by selected locale? If so, what are the advantages to separate indexes? Erik On Jan 16, 2007, at 7:03 PM, Saimon Moore wrote: > Hi, > > I've modified the latest acts_as_ferret code (version 0.3.0) to > integrate with the Globalize (http://www.globalize-rails.org/ > globalize/) > plugin. 
> > Essentially, I've added the ability to use a separate index per locale > (It basically adds the language code as a suffix to the index and > switches between indexes when the active locale changes). > > Since this introduces an optional external dependency and as I've > had to > touch the code up in a few files, I'm still trying to think of the > best > way to make this available to others. > > If others think this is worthwile, I'd be interested in adding this as > something optional to acts_as_ferret. > > P.S. Currently, I've added the option like so: > > class Foo > acts_as_ferret :single_index => true, > :store_class_name => true, > :localized => true, #=> this activates the > option. > :fields => {...} > end > > Regards, > > Saimon > > (http://saimonmoore.net) > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From shaklev at gmail.com Wed Jan 17 01:05:14 2007 From: shaklev at gmail.com (Stian Haklev) Date: Wed, 17 Jan 2007 13:05:14 +0700 Subject: [Ferret-talk] Tokenizers? Message-ID: <566574ef0701162205i301b640j54cdb4ce649a901a@mail.gmail.com> Hi everyone. First a quick word - I am relatively new to Ruby and Ruby on Rails, but I love learning about it and using it. Currently I am working on extending Boxroom (file repository RoR app) for the CARE Indonsia intranet, where I work as an intern. I am using ferret, and it's working great. I noticed that if a file contains something like this "applications/entries", this will be parsed as one word, and "applications" as a query will not yield anything, you have to search for "applications*"... This isn't entirely logical, since . , etc presumably are not included. I am quite new to search engines, and not sure exactly about the terminology - does this have something to do with a tokenizer? Where do I change the settings for this? 
Right now my code is very simple, just a few lines of using inserting the new files, and of searching for them (I love the automatic markup as well!) and I don't want to make my code very complex by using lower level functions, but is there a way I could easily configure the "tokenizing" behaviour (let me know if my terminology is wrong) to split for example "applications/entries" into two words, searchable by themselves? Thank you very much! Stian -- Stian Haklev - University of Toronto http://houshuang.org/blog - Random Stuff that Matters From jduflost at ben.vub.ac.be Wed Jan 17 02:02:33 2007 From: jduflost at ben.vub.ac.be (johan duflost) Date: Wed, 17 Jan 2007 08:02:33 +0100 Subject: [Ferret-talk] ferret and mongrel Message-ID: <001301c73a05$76a9b770$0700000a@ORION> Dear all, Does anybody know if there's problem with ferret running with mongrel ? I got unpredictable segfaults. Everything works well with fastcgi instead of mongrel. I use ruby 1.8.5, apache 2, mod proxy, mongrel 0.3.13.4, ferret 0.10.13 and rails 1.1.6 Thanks, Johan Johan Duflost Analyst Programmer Belgian Biodiversity Platform ( http://www.biodiversity.be) Belgian Federal Science Policy Office (http://www.belspo.be ) Tel:+32 2 650 5751 Fax: +32 2 650 5124 From saimonmoore at gmail.com Wed Jan 17 02:51:40 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Wed, 17 Jan 2007 08:51:40 +0100 Subject: [Ferret-talk] [ActsAsFerret] Globalize integration In-Reply-To: <8B336A22-3EA4-47BC-9FBA-D97A04B4D768@ehatchersolutions.com> References: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> <8B336A22-3EA4-47BC-9FBA-D97A04B4D768@ehatchersolutions.com> Message-ID: <15a735e0273deacc8481b5310b9d81dc@ruby-forum.com> Hi Erik, I did consider this but I think having separate indexes per locale is slightly cleaner and seemed a more logical approach to me. A rebuild can be ordered for one locale without having to affect the other indices. 
The overhead of switching indices when the locale changes is minimal, and no filtering is required so it should be slightly faster as well overall. Also each individual index is only as large as the data available in that locale. In my particular case I'm using a single index and searching through multiple models at the same time. Since my application uses 4 locales, I have 4 indices, one shared index per locale. I think perhaps that if one-index-per-model schema is used then storing the locale within each record is perhaps a better option. Another advantage I see to this is that if this becomes a part of acts_as_ferret then most users will have to do very little to have localised ferret searching. This works well for my use-case but I'd add both possibilities to the acts_as_ferret documentation. Regards, Saimon Erik Hatcher wrote: > Did you consider using a single index, but add a locale field to > every record to allow easy filtering by selected locale? If so, what > are the advantages to separate indexes? > > Erik -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed Jan 17 04:22:13 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 17 Jan 2007 10:22:13 +0100 Subject: [Ferret-talk] ferret and mongrel In-Reply-To: <001301c73a05$76a9b770$0700000a@ORION> References: <001301c73a05$76a9b770$0700000a@ORION> Message-ID: <20070117092213.GK11020@cordoba.webit.de> On Wed, Jan 17, 2007 at 08:02:33AM +0100, johan duflost wrote: > > Dear all, > > Does anybody know if there's problem with ferret running with mongrel ? I > got unpredictable segfaults. Everything works well with fastcgi instead of > mongrel. > I use ruby 1.8.5, apache 2, mod proxy, mongrel 0.3.13.4, ferret 0.10.13 and > rails 1.1.6 I have several applications using Ferret that are running fine in Mongrel. However if it doesn't happen with fastcgi there might be a problem as well. 
There have been several threads about finding out how and why Mongrel segfaulted on the mongrel-users list in the past. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Wed Jan 17 04:40:39 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 17 Jan 2007 10:40:39 +0100 Subject: [Ferret-talk] Tokenizers? In-Reply-To: <566574ef0701162205i301b640j54cdb4ce649a901a@mail.gmail.com> References: <566574ef0701162205i301b640j54cdb4ce649a901a@mail.gmail.com> Message-ID: <20070117094039.GL11020@cordoba.webit.de> Hi! On Wed, Jan 17, 2007 at 01:05:14PM +0700, Stian Haklev wrote: [..] > but is there a way I could easily configure the > "tokenizing" behaviour (let me know if my terminology is wrong) to > split for example "applications/entries" into two words, searchable by > themselves? your terminology is correct, the tokenizer is responsible for splitting document content into single terms. You can get an idea of how this works at http://ferret.davebalmain.com/api/classes/Ferret/Analysis.html If you want to use a custom tokenizer you'll have to write your own analyzer which then makes use of this tokenizer. Don't be afraid, this is really easy:

    # assumes Ferret::Analysis has been included, so the filter and tokenizer names resolve
    class MyAnalyzer < Ferret::Analysis::Analyzer
      def token_stream(field, str)
        return StemFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)))
      end
    end

(from http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html) hope this gets you started. Cheers, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From maz at rift.fr Wed Jan 17 12:25:46 2007 From: maz at rift.fr (maz) Date: Wed, 17 Jan 2007 18:25:46 +0100 Subject: [Ferret-talk] Dump and load functionnalities?
Test patch provided Message-ID: <6cf0084ee5bcea78103b8806936009e7@ruby-forum.com> Hello everyone, We need to create backups of our index, but there are a few constraints: - our application shouldn't go offline for that - it has to be done quickly Ferret doesn't seem to have this kind of functionnality (though I'm very new to Ferret, I may be wrong), and I figured that I couldn't do it using plain Ruby (it's way too slow, try with a 2000000+ documents index), so the only choice left was to incorporate its support into Ferret itself. I added this couple of features: - IndexReader#dump("file") Dump the whole index to a binary, non-portable file. - IndewWriter#load("file") Load this file, append it to the current index. I wrote a somewhat dirty patch for Ferret 0.10.14 (works with 0.10.13 too), you can find it here: http://pastie.caboo.se/33769 The fact that the dump file format is binary and home made doesn't really matter to me, as long as it's fast, but it's probably not very safe either (about security checks in my code). Basically, the dump file format for one document is: ... - "int" being the C integer in the native endian and size, thus the file is only safely loadable one the same arch. - hash keys are converted from/to symbols during dump/load. - strings are stored without a ending \0. - sizes are in bytes. - of course, it's all packed together, that's binary. Now, about the feature itself, is there another, better way to do that? If not, could that find its place into Ferret (probably after some code cleaning, or even with a portable file format)? Also, I don't really know what to do with locks or mutexes, I didn't put any into my code and I couldn't figure out how ferret did for thread safety. Any ideas? Thanks, -- Maz Rift Technologies - http://rift.fr/ -- Posted via http://www.ruby-forum.com/. 
From john at digitalpulp.com Wed Jan 17 15:42:31 2007 From: john at digitalpulp.com (John Bachir) Date: Wed, 17 Jan 2007 15:42:31 -0500 Subject: [Ferret-talk] Sorting/Ordering Search Results In-Reply-To: References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> Message-ID: <8761C17A-3802-4C27-9277-E5AA447D98E6@digitalpulp.com> sorting by a column, or by score:

    if 'date' == sort
      options.merge!({ :sort => Ferret::Search::SortField.new('search_date', :reverse => true) })
    else
      options.merge!({ :sort => Ferret::Search::SortField::SCORE })
    end

hope this helps. John From john at digitalpulp.com Wed Jan 17 15:44:14 2007 From: john at digitalpulp.com (John Bachir) Date: Wed, 17 Jan 2007 15:44:14 -0500 Subject: [Ferret-talk] Sorting/Ordering Search Results In-Reply-To: <8761C17A-3802-4C27-9277-E5AA447D98E6@digitalpulp.com> References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <8761C17A-3802-4C27-9277-E5AA447D98E6@digitalpulp.com> Message-ID: On Jan 17, 2007, at 3:42 PM, John Bachir wrote: > sorting by a column, or by score: > > if ('date' == sort) > options.merge!({ :sort => Ferret::Search::SortField.new > ('search_date', :reverse => :true) }) > else > options.merge!({ :sort => Ferret::Search::SortField::SCORE }) > end > > hope this helps. > > John Also, if you want to sort by a column, it can't be tokenized:

    acts_as_ferret :fields => { ............ :search_date => {:term_vectors => :no, :index => :untokenized } ........... },

From john at digitalpulp.com Wed Jan 17 16:24:48 2007 From: john at digitalpulp.com (John Bachir) Date: Wed, 17 Jan 2007 16:24:48 -0500 Subject: [Ferret-talk] removing special/syntax characters Message-ID: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> Is there any somewhat standard way to remove or otherwise handle special or syntax characters from a user's search, such as a colon?
I was thinking maybe there was something akin to Ferret::Analysis::FULL_ENGLISH_STOP_WORDS, like Ferret::Analysis::FERRET_SYNTAX_CHARS, but no such luck. How are other folks dealing with filtering user input? John From bk at benjaminkrause.com Wed Jan 17 17:26:05 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Wed, 17 Jan 2007 23:26:05 +0100 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> Message-ID: On 2007-01-17, at 10:24, John Bachir wrote: > Is there any somewhat standard way to remove or otherwise handle > special or syntax characters from a user's search, such as a colon? > > I was thinking maybe there was something akin to > Ferret::Analysis::FULL_ENGLISH_STOP_WORDS, like > Ferret::Analysis::FERRET_SYNTAX_CHARS, but no such luck. > > How are other folks dealing with filtering user input? Hey John, i guess that would be a nice addition to have a const defined.. i'll do it manually .. if not defined?(FERRET_SPECIAL_CHARS) FERRET_SPECIAL_CHARS = [ /:/, /\(/, /\)/, /\[/, /\]/, /!/, /\ +/, /"/, /~/, /\^/, /-/, /|/, />/, / Upon trying to install I keep getting the following error message in my terminal... svn: command not found Any help would be greatly appreciated -- Posted via http://www.ruby-forum.com/. From wmorgan-ferret at masanjin.net Wed Jan 17 18:46:24 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Wed, 17 Jan 2007 15:46:24 -0800 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> Message-ID: <1169077226-sup-8834@south> Excerpts from John Bachir's message of Wed Jan 17 13:24:48 -0800 2007: > Is there any somewhat standard way to remove or otherwise handle > special or syntax characters from a user's search, such as a colon? 
If you want to allow them the full syntax, just use QueryParser#parse (and handle the QueryParseException). If you want to disallow anything special, you could split on whitespace and turn each token into a TermQuery, then throw them all into a BooleanQuery. Anything in between (e.g. allow phrase queries, but disallow everything else) will be more complicated. But I can't think of many good reasons to disallow the full syntax in the first place. -- William From flo at andersground.net Wed Jan 17 18:53:51 2007 From: flo at andersground.net (Florian Gilcher) Date: Thu, 18 Jan 2007 00:53:51 +0100 Subject: [Ferret-talk] svn: command not found In-Reply-To: References: Message-ID: <45AEB70F.9060005@andersground.net> Do you have Subversion ( http://subversion.tigris.org ) installed? Otherwise, installing a plugin won't work - rails plugin-feature depends on this. Adam wrote: > Upon trying to install I keep getting the following error message in my > terminal... > > svn: command not found > > Any help would be greatly appreciated > From john at digitalpulp.com Wed Jan 17 19:14:47 2007 From: john at digitalpulp.com (John Bachir) Date: Wed, 17 Jan 2007 19:14:47 -0500 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: <1169077226-sup-8834@south> References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> <1169077226-sup-8834@south> Message-ID: <4E078079-F5EE-4F53-85AF-3D57765C8558@digitalpulp.com> On Jan 17, 2007, at 5:26 PM, Benjamin Krause wrote: > i guess that would be a nice addition to have a const defined.. > i'll do > it manually .. > > if not defined?(FERRET_SPECIAL_CHARS) > FERRET_SPECIAL_CHARS = [ /:/, /\(/, /\)/, /\[/, /\]/, /!/, /\ > +/, /"/, /~/, /\^/, > /-/, /|/, />/, / \./, /&/ ] > end Thanks Benjamin! On Jan 17, 2007, at 6:46 PM, William Morgan wrote: > If you want to allow them the full syntax, just use QueryParser#parse > (and handle the QueryParseException). 
If you want to disallow anything > special, you could split on whitespace and turn each token into a > TermQuery, then throw them all into a BooleanQuery. > > Anything in between (e.g. allow phrase queries, but disallow > everything > else) will be more complicated. But I can't think of many good reasons > to disallow the full syntax in the first place. William- I agree. If it was up to me, I would allow the full syntax. Unfortunately, one of the things that the client has asked for is one two three to be transformed to *one* *two* *three* And also to be able to transparently search FOR the special characters themselves. Which means I will actually not be filtering, but escaping the special characters. (I'm assuming Ferret has some facility for searching for special characters, although I admit I haven't looked into it much yet). Cheers, John From adam at macrofiche.com Wed Jan 17 20:04:30 2007 From: adam at macrofiche.com (Adam) Date: Thu, 18 Jan 2007 02:04:30 +0100 Subject: [Ferret-talk] svn: command not found In-Reply-To: <45AEB70F.9060005@andersground.net> References: <45AEB70F.9060005@andersground.net> Message-ID: Florian Gilcher wrote: > Do you have Subversion ( http://subversion.tigris.org ) installed? > Otherwise, installing a plugin won't work - rails plugin-feature depends > on this. I installed Locomotive. Can I assume I have Subversion? -- Posted via http://www.ruby-forum.com/. From andreas.korth at gmx.net Wed Jan 17 20:15:05 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Thu, 18 Jan 2007 02:15:05 +0100 Subject: [Ferret-talk] svn: command not found In-Reply-To: References: <45AEB70F.9060005@andersground.net> Message-ID: <31A1343A-373D-4177-97B3-575C68265E2D@gmx.net> On 18.01.2007, at 02:04, Adam wrote: > Florian Gilcher wrote: >> Do you have Subversion ( http://subversion.tigris.org ) installed? >> Otherwise, installing a plugin won't work - rails plugin-feature >> depends >> on this. > > > I installed Locomotive. 
Can I assume I have Subversion? You can pretty safely assume that you haven't, or else you wouldn't have got that error message. So go grab it from: http://metissian.com/projects/macosx/subversion/ Cheers, Andy From wmorgan-ferret at masanjin.net Wed Jan 17 20:23:04 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Wed, 17 Jan 2007 17:23:04 -0800 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: <4E078079-F5EE-4F53-85AF-3D57765C8558@digitalpulp.com> References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com1169077226-sup-8834@south> <4E078079-F5EE-4F53-85AF-3D57765C8558@digitalpulp.com> Message-ID: <1169082589-sup-1386@south> Excerpts from John Bachir's message of Wed Jan 17 16:14:47 -0800 2007: > Unfortunately, one of the things that the client has asked for is > > one two three > > to be transformed to > > *one* *two* *three* Ok. Then I don't think you really need to worry about escaping anything. You can split on whitespace, and wrap each token in a WildcardQuery, prefixed and suffixed with a star. Unless you're supporting phrase queries surrounded by quotes, in which case "split on whitespace" becomes something more complicated. Or unless you want to disallow wildcards from the user, in which case you'll need to escape * and ?. > And also to be able to transparently search FOR the special characters > themselves. Which means I will actually not be filtering, but escaping > the special characters. (I'm assuming Ferret has some facility for > searching for special characters, although I admit I haven't looked > into it much yet). Yep, as long as your tokenizer doesn't discard them, you're fine. Basically if you're avoiding QueryParser and building Query objects directly from the strings, then none of these characters have special semantics (except for * and ? with WildcardQuery). 
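A minimal sketch of the transformation discussed above (one two three becoming *one* *two* *three*), in plain Ruby with no Ferret calls. The method name wildcard_patterns is made up for illustration, and dropping user-typed * and ? instead of escaping them is an assumption, since the thread does not say how escaping would be done:

```ruby
# Turn raw user input into one wildcard pattern per whitespace-separated word.
# Assumption: wildcard characters typed by the user (* and ?) are dropped,
# not escaped, so only the stars added here act as wildcards.
def wildcard_patterns(input)
  input.split.                        # split on runs of whitespace
    map { |word| word.delete('*?') }. # remove user-supplied wildcards
    reject { |word| word.empty? }.    # skip words that were only wildcards
    map { |word| "*#{word}*" }        # prefix and suffix with a star
end

wildcard_patterns("one two three")  # => ["*one*", "*two*", "*three*"]
```

Each resulting pattern would then be wrapped in a WildcardQuery and the queries combined in a BooleanQuery, as described above. Quoted phrases are not handled here.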
-- William From samuelgiffney at gmail.com Wed Jan 17 20:41:42 2007 From: samuelgiffney at gmail.com (Sam Giffney) Date: Thu, 18 Jan 2007 02:41:42 +0100 Subject: [Ferret-talk] Dump and load functionnalities? Test patch provided In-Reply-To: <6cf0084ee5bcea78103b8806936009e7@ruby-forum.com> References: <6cf0084ee5bcea78103b8806936009e7@ruby-forum.com> Message-ID: <81ddcdc794319230699dafa2acf186aa@ruby-forum.com> The easiest way would just be to copy/zip the directory that the index is in. maz wrote: > Hello everyone, > > We need to create backups of our index, but there are a few > constraints: > > - our application shouldn't go offline for that > - it has to be done quickly SNIP > Maz > Rift Technologies - http://rift.fr/ -- Posted via http://www.ruby-forum.com/. From adam at macrofiche.com Wed Jan 17 23:58:18 2007 From: adam at macrofiche.com (Adam) Date: Thu, 18 Jan 2007 05:58:18 +0100 Subject: [Ferret-talk] svn: command not found In-Reply-To: <31A1343A-373D-4177-97B3-575C68265E2D@gmx.net> References: <45AEB70F.9060005@andersground.net> <31A1343A-373D-4177-97B3-575C68265E2D@gmx.net> Message-ID: Andreas Korth wrote: > On 18.01.2007, at 02:04, Adam wrote: > >> Florian Gilcher wrote: >>> Do you have Subversion ( http://subversion.tigris.org ) installed? >>> Otherwise, installing a plugin won't work - rails plugin-feature >>> depends >>> on this. >> >> >> I installed Locomotive. Can I assume I have Subversion? > > You can pretty safely assume that you haven't, or else you wouldn't > have got that error message. > > So go grab it from: > > http://metissian.com/projects/macosx/subversion/ > > Cheers, > Andy Didn't have subversion properly configured, all set now. Thanks for your help. -- Posted via http://www.ruby-forum.com/. 
From ferret.5.micboh at spamgourmet.com Thu Jan 18 01:42:02 2007 From: ferret.5.micboh at spamgourmet.com (Joe Mestople) Date: Thu, 18 Jan 2007 07:42:02 +0100 Subject: [Ferret-talk] corrupt index immediately after rebuild Message-ID: <8c01e7ccc3b148a38ee10a62166f073a@ruby-forum.com> Hello, I'm using Ferret and I've just attempted to build an index that contains 15,968,046 documents. I've rebuilt the index from scratch, but when I try to search for some items I get this error: IOError: IO Error occured at :79 in xraise Error occured in fs_store.c:289 - fsi_seek_i seeking pos -1284143798: This is happening when I'm trying to look up a document with id 13,677,803. Interestingly, any document after id 12,098,067 seems to trigger the error. Any ideas? Thanks! -Mike -- Posted via http://www.ruby-forum.com/. From maz at rift.fr Thu Jan 18 05:35:52 2007 From: maz at rift.fr (maz) Date: Thu, 18 Jan 2007 11:35:52 +0100 Subject: [Ferret-talk] Dump and load functionnalities? Test patch provided In-Reply-To: <81ddcdc794319230699dafa2acf186aa@ruby-forum.com> References: <6cf0084ee5bcea78103b8806936009e7@ruby-forum.com> <81ddcdc794319230699dafa2acf186aa@ruby-forum.com> Message-ID: <5afa4d274f4b8f023011b7e0bfe35381@ruby-forum.com> Sam Giffney wrote: > The easiest way would just be to copy/zip the directory that the index > is in. Isn't there a risk of data loss? I mean, if Ferret is already using the index, I can't just copy it like that because of locks and buffered data that *may* leave the index in a bad state. -- Maz Rift Technologies - http://rift.fr/ -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Jan 18 06:17:25 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 18 Jan 2007 12:17:25 +0100 Subject: [Ferret-talk] [ActsAsFerret] Globalize integration In-Reply-To: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> References: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> Message-ID: <20070118111725.GU11020@cordoba.webit.de> Hi!
On Wed, Jan 17, 2007 at 01:03:30AM +0100, Saimon Moore wrote: > Hi, > > I've modified the latest acts_as_ferret code (version 0.3.0) to > integrate with the Globalize (http://www.globalize-rails.org/globalize/) > plugin. > > Essentially, I've added the ability to use a separate index per locale > (It basically adds the language code as a suffix to the index and > switches between indexes when the active locale changes). sounds cool :-) > Since this introduces an optional external dependency and as I've had to > touch the code up in a few files, I'm still trying to think of the best > way to make this available to others. > > If others think this is worthwile, I'd be interested in adding this as > something optional to acts_as_ferret. I'd really like to have a look at it. Do you think you could provide a patch against the current acts_as_ferret trunk ? cheers, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Thu Jan 18 04:32:57 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 18 Jan 2007 10:32:57 +0100 Subject: [Ferret-talk] corrupt index immediately after rebuild In-Reply-To: <8c01e7ccc3b148a38ee10a62166f073a@ruby-forum.com> References: <8c01e7ccc3b148a38ee10a62166f073a@ruby-forum.com> Message-ID: <20070118093257.GS11020@cordoba.webit.de> On Thu, Jan 18, 2007 at 07:42:02AM +0100, Joe Mestople wrote: > Hello, > > I'm usin gferret and I've just attempted to build an index that contains > 15,968,046 documents. I've rebuild the index from scratch, but when I > try to search for some items I get this error: > > IOError: IO Error occured at :79 in xraise > Error occured in fs_store.c:289 - fsi_seek_i > seeking pos -1284143798: > > This is happening when I'm trying to look up a document with id > 13,677,803. Interestingly, any document after id 12,098,067 seems to > trigger the error.
> > Any ideas? maybe you hit some file size limit with your index? How large is it? Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From saimonmoore at gmail.com Thu Jan 18 11:07:19 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Thu, 18 Jan 2007 17:07:19 +0100 Subject: [Ferret-talk] [ActsAsFerret] Globalize integration In-Reply-To: <20070118111725.GU11020@cordoba.webit.de> References: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> <20070118111725.GU11020@cordoba.webit.de> Message-ID: <009c920e91e40ef3c89242e03bba1c17@ruby-forum.com> Hi Jens, Yep no probs. I'll send it to you as soon as I can as I'll have to modify accordingly for trunk. Regards, Saimon Jens Kraemer wrote: > Hi! > > On Wed, Jan 17, 2007 at 01:03:30AM +0100, Saimon Moore wrote: >> Hi, >> >> I've modified the latest acts_as_ferret code (version 0.3.0) to >> integrate with the Globalize (http://www.globalize-rails.org/globalize/) >> plugin. >> >> Essentially, I've added the ability to use a separate index per locale >> (It basically adds the language code as a suffix to the index and >> switches between indexes when the active locale changes). > > sounds cool :-) > >> Since this introduces an optional external dependency and as I've had to >> touch the code up in a few files, I'm still trying to think of the best >> way to make this available to others. >> >> If others think this is worthwile, I'd be interested in adding this as >> something optional to acts_as_ferret. > > I'd really like to have a look at it. Do you think you could provide a > patch against the current acts_as_ferret trunk ? > > > cheers, > Jens > > > -- > webit! 
Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 -- Posted via http://www.ruby-forum.com/. From scott.bradley.wilson at gmail.com Thu Jan 18 11:09:31 2007 From: scott.bradley.wilson at gmail.com (Scott Wilson) Date: Thu, 18 Jan 2007 17:09:31 +0100 Subject: [Ferret-talk] Updating index when non-rails app creates entries? Message-ID: <15106dc89921636640a48cda78446406@ruby-forum.com> I have a database shared between a Rails app (gui) and a Java app (daemon). When the Java app periodically updates the database, this isn't reflected in the Ferret indexes visible via acts_as_ferret in Rails. How do I trigger re-indexing? Do I just make my Java daemon delete the index files, or is there something cleverer than that? Thanks! -- Posted via http://www.ruby-forum.com/. From michael at mahemoff.com Thu Jan 18 07:05:47 2007 From: michael at mahemoff.com (Michael Mahemoff) Date: Thu, 18 Jan 2007 13:05:47 +0100 Subject: [Ferret-talk] [ActsAsFerret] Index Directory Disappears and Not Re-created Message-ID: <0bc8c0b71921d210a8810bb7fcb4e24f@ruby-forum.com> Hi, This is a recurring issue for me - the index directory on my production server and everything below it occasionally disappears and isn't reconstructed. I tried manually creating the entire index path before starting the server, but it still happened while the server was running. I don't know what's causing the index to disappear and I'm also not sure why it's not automagically re-created in any event. Using: Rails 1.1.6 Ferret 0.10.13 Acts_as_ferret (can't see a version no. but it's recent) Any help would be greatly appreciated.
---- IOError (IO Error occured at :79 in xraise Error occured in fs_store.c:185 - fs_clear_all clearing all files in /var/www/apps/accounts/current/config/../index/production/account: ): /usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:664:in `initialize' /usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:664:in `ensure_writer_open' /usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:270:in `<<' /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' /usr/lib/ruby/gems/1.8/gems/ferret-0.10.13/lib/ferret/index.rb:254:in `<<' /vendor/plugins/acts_as_ferret/lib/instance_methods.rb:85:in `ferret_create' /vendor/rails/activerecord/lib/active_record/callbacks.rb:344:in `callback' /vendor/rails/activerecord/lib/active_record/callbacks.rb:341:in `callback' /vendor/rails/activerecord/lib/active_record/callbacks.rb:266:in `create_without_timestamps' /vendor/rails/activerecord/lib/active_record/timestamp.rb:30:in `create' /vendor/rails/activerecord/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks' /vendor/rails/activerecord/lib/active_record/callbacks.rb:253:in `create_or_update' /vendor/rails/activerecord/lib/active_record/base.rb:1392:in `save_without_validation' /vendor/rails/activerecord/lib/active_record/validations.rb:736:in `save_without_transactions' /vendor/rails/activerecord/lib/active_record/transactions.rb:126:in `save' /vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction' /vendor/rails/activerecord/lib/active_record/transactions.rb:91:in `transaction' /vendor/rails/activerecord/lib/active_record/transactions.rb:118:in `transaction' /vendor/rails/activerecord/lib/active_record/transactions.rb:126:in `save' -- Posted via http://www.ruby-forum.com/. 
From wmorgan-ferret at masanjin.net Thu Jan 18 12:07:47 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Thu, 18 Jan 2007 09:07:47 -0800 Subject: [Ferret-talk] corrupt index immediately after rebuild In-Reply-To: <20070118093257.GS11020@cordoba.webit.de> References: <8c01e7ccc3b148a38ee10a62166f073a@ruby-forum.com> <20070118093257.GS11020@cordoba.webit.de> Message-ID: <1169140036-sup-3746@south> Excerpts from Jens Kraemer's message of Thu Jan 18 01:32:57 -0800 2007: > maybe you hit some file size limit with your index? Also check to make sure you didn't just run out of disk space. -- William From ferret.5.micboh at spamgourmet.com Thu Jan 18 14:32:22 2007 From: ferret.5.micboh at spamgourmet.com (Joe Mestople) Date: Thu, 18 Jan 2007 20:32:22 +0100 Subject: [Ferret-talk] corrupt index immediately after rebuild In-Reply-To: <1169140036-sup-3746@south> References: <8c01e7ccc3b148a38ee10a62166f073a@ruby-forum.com> <20070118093257.GS11020@cordoba.webit.de> <1169140036-sup-3746@south> Message-ID: William Morgan wrote: > Excerpts from Jens Kraemer's message of Thu Jan 18 01:32:57 -0800 2007: >> maybe you hit some file size limit with your index? > > Also check to make sure you didn't just run out of disk space. file size is 3,711,610,109 bytes -- the volume is ext3 and it has 74% available so I don't think it's either running out of space or exceeding the maximum file size. Has anyone else ran into a similar problem? -- Posted via http://www.ruby-forum.com/. 
From flo at andersground.net Thu Jan 18 15:56:36 2007 From: flo at andersground.net (Florian Gilcher) Date: Thu, 18 Jan 2007 21:56:36 +0100 Subject: [Ferret-talk] [ActsAsFerret] Globalize integration In-Reply-To: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> References: <0a90adc27f0307b4dff16e7c54dde7dd@ruby-forum.com> Message-ID: <45AFDF04.70409@andersground.net> Hi, Just for the record: i would be quite interested in such a plugin - i developed my own (quite poor) hack for it myself (which is not in production yet), but your approach seems to be much better. If there is any help or testing needed, feel free to contact me. Greetings Florian Saimon Moore wrote: > Hi, > > I've modified the latest acts_as_ferret code (version 0.3.0) to > integrate with the Globalize (http://www.globalize-rails.org/globalize/) > plugin. > > Essentially, I've added the ability to use a separate index per locale > (It basically adds the language code as a suffix to the index and > switches between indexes when the active locale changes). > > Since this introduces an optional external dependency and as I've had to > touch the code up in a few files, I'm still trying to think of the best > way to make this available to others. > > If others think this is worthwile, I'd be interested in adding this as > something optional to acts_as_ferret. > > P.S. Currently, I've added the option like so: > > class Foo > acts_as_ferret :single_index => true, > :store_class_name => true, > :localized => true, #=> this activates the > option. > :fields => {...} > end > > Regards, > > Saimon > > (http://saimonmoore.net) > From dous at penarmac.com Fri Jan 19 04:33:51 2007 From: dous at penarmac.com (Aldous D. Penaranda) Date: Fri, 19 Jan 2007 17:33:51 +0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. Message-ID: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com> Hi, We're using Ferret 0.9.4 and we've observed the following behavior. 
Searching for 'fieldname: foo and bar' works fine while 'fieldname: "foo and bar"' doesn't return any results. Is there a way to make ferret recognize the 'and' inside the query as a search term and not an operator? (I hope I got the terminology right) Thanks in advance. -- Linux Just Simply Rocks! dous at penarmac.com | dous at ubuntu.com http://deathwing.penarmac.com/ GPG: 0xD6655C18 From kraemer at webit.de Fri Jan 19 05:21:56 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 19 Jan 2007 11:21:56 +0100 Subject: [Ferret-talk] Updating index when non-rails app creates entries? In-Reply-To: <15106dc89921636640a48cda78446406@ruby-forum.com> References: <15106dc89921636640a48cda78446406@ruby-forum.com> Message-ID: <20070119102156.GW11020@cordoba.webit.de> On Thu, Jan 18, 2007 at 05:09:31PM +0100, Scott Wilson wrote: > I have a database shared between a Rails app (gui) and a Java app > (daemon). When the java app periodically updates the database, this > isn't reflected in Ferret indexes visible via acts_as_ferret in Rails. > How do I trigger re-indexing? Do I just make my Java daemon delete the > index files, or is there something cleverer than that..? you could have some backgroundrb process regularly check your records for changes (i.e. by checking a 'to_index' flag or just the updated_at value). That process could then call ferret_update on the records that have been updated. cheers, Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From david at owen.se Fri Jan 19 05:25:13 2007 From: david at owen.se (David Wennergren) Date: Fri, 19 Jan 2007 11:25:13 +0100 Subject: [Ferret-talk] corrupt index immediately after rebuild In-Reply-To: References: <8c01e7ccc3b148a38ee10a62166f073a@ruby-forum.com> <20070118093257.GS11020@cordoba.webit.de> <1169140036-sup-3746@south> Message-ID: <55911e90d2076a3cbfbedbb26bf0c24c@ruby-forum.com> Perhaps it's the same problem as in this post: http://www.ruby-forum.com/topic/84237#151791 There is a 2GB limit to a single index file if you don't compile Ferret with large-file support. An alternative is to use :max_merge_docs to stop index merging when a segment reaches a certain size. Like this: index = Index::Index.new(:path => "path", :max_merge_docs => 150000) /David Wennergren Joe Mestople wrote: > William Morgan wrote: >> Excerpts from Jens Kraemer's message of Thu Jan 18 01:32:57 -0800 2007: >>> maybe you hit some file size limit with your index? >> >> Also check to make sure you didn't just run out of disk space. > > file size is 3,711,610,109 bytes -- the volume is ext3 and it has 74% > available so I don't think it's either running out of space or exceeding > the maximum file size. > > Has anyone else ran into a similar problem? -- Posted via http://www.ruby-forum.com/.
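A quick arithmetic check on the numbers reported in this thread, as a sketch. If the negative seek position is assumed to be a byte offset that wrapped around in a signed 32-bit integer (an assumption; the error message does not say so), the intended offset can be recovered by reinterpreting it as unsigned:

```ruby
# Values reported earlier in the thread.
reported_seek = -1_284_143_798  # from the fsi_seek_i error message
index_size    = 3_711_610_109   # reported file size in bytes

# Assumption: the offset overflowed a signed 32-bit integer.
# Ruby's % with a positive modulus always returns a non-negative result.
unsigned_seek = reported_seek % 2**32
two_gb        = 2**31

puts unsigned_seek               # 3010823498
puts unsigned_seek > two_gb      # true
puts unsigned_seek < index_size  # true
```

Under that assumption the intended offset lies past the 2GB mark but inside the reported 3,711,610,109 bytes, which would be consistent with the large-file explanation.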
From kraemer at webit.de Fri Jan 19 05:51:19 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 19 Jan 2007 11:51:19 +0100 Subject: [Ferret-talk] [ActsAsFerret] Index Directory Disappears and Not Re-created In-Reply-To: <0bc8c0b71921d210a8810bb7fcb4e24f@ruby-forum.com> References: <0bc8c0b71921d210a8810bb7fcb4e24f@ruby-forum.com> Message-ID: <20070119105119.GZ11020@cordoba.webit.de> On Thu, Jan 18, 2007 at 01:05:47PM +0100, Michael Mahemoff wrote: > Hi, > > This is a recurring issue for me - the index directory on my production > server and everything below it occasionally disappears and isn't > reconstructed. I tried manually creating the entire index path manually > before starting the server, but it still happened while the server is > running. > > I don't know what's causing the index to disappear and I'm also not sure > why it's not automagically re-created in any event? I suspect you use capistrano for your deployment - is it possible that on each deploy your index gets lost because it is located inside the releases/.../ subdirectory? As for the index not being recreated - maybe it's just a permissions issue? Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From wmorgan-ferret at masanjin.net Fri Jan 19 11:08:38 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Fri, 19 Jan 2007 08:08:38 -0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com> References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com> Message-ID: <1169222395-sup-3301@south> Excerpts from Aldous D. Penaranda's message of Fri Jan 19 01:33:51 -0800 2007: > Is there a way to make ferret recognize the 'and' inside the query as > a search term and not an operator?
(I hope I got the terminology > right) You need to use an Analyzer that does not remove 'and'. The default analyzer removes all words in FULL_ENGLISH_STOP_WORDS, which includes 'and'. (So does ENGLISH_STOP_WORDS.) The same analyzer needs to be used both when adding documents to the index and at query parsing time (i.e. passed to both QueryParser.new and IndexWriter.new/Index.new). If you've been using the default analyzer, you'll have to reindex so that the occurrences of 'and' get written to disk. -- William From dous at penarmac.com Fri Jan 19 12:01:43 2007 From: dous at penarmac.com (Aldous D. Penaranda) Date: Sat, 20 Jan 2007 01:01:43 +0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: <1169222395-sup-3301@south> References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com> <1169222395-sup-3301@south> Message-ID: <24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> On 1/20/07, William Morgan wrote: > Excerpts from Aldous D. Penaranda's message of Fri Jan 19 01:33:51 -0800 2007: > > Is there a way to make ferret recognize the 'and' inside the query as > > a search term and not an operator? (I hope I got the terminology > > right) > > You need to use an Analyzer that does not remove 'and'. The default > analyzer removes all words in FULL_ENGLISH_STOP_WORDS, which includes > 'and'. (So does ENGLISH_STOP_WORDS.) Thanks. I noticed, however, that the documentation for Ferret::Index::Index says that the default analyzer is StandardAnalyzer. The StandardAnalyzer documentation says that it filters LetterTokenizer with LowerCaseFilter. Are you talking about StopAnalyzer? If so, perhaps the documentation is wrong and should be updated. I've checked both the 0.9 and 0.10 API documentation and they say the same thing. > The analyzer needs to be used both while adding documents to the index > and at query parsing time (i.e. passed to both > QueryParser.new and IndexWriter.new/Index.new). 
If you've been using > the default analyzer, you'll have to reindex so that the occurrences of > 'and' get written to disk. Again, many thanks! I'll try this out after I get some sleep. :) -- Linux Just Simply Rocks! dous at penarmac.com | dous at ubuntu.com http://deathwing.penarmac.com/ GPG: 0xD6655C18 From john at smokinggun.com Fri Jan 19 12:12:12 2007 From: john at smokinggun.com (John Private) Date: Fri, 19 Jan 2007 18:12:12 +0100 Subject: [Ferret-talk] How to have 'o' == 'ö' Message-ID: <11d4433ce411d7f457fdf09671e32b58@ruby-forum.com> Greetings, (using acts_as_ferret) So I have a book title "Möngrel ?Horsemen?" in my index. Searching for "Möngrel" retrieves the document. But I would like searching for "Mongrel" to also retrieve the document, which it currently does not. Does anyone have a good solution to this problem? I suppose I could filter the documents and queries first with something like: (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel ?Horsemen?").gsub(/[^a-zA-Z0-9]/im,"") But perhaps there is a better, or built-in, solution. Thanks -- Posted via http://www.ruby-forum.com/. From peter at ioffer.com Thu Jan 18 23:57:34 2007 From: peter at ioffer.com (peter) Date: Thu, 18 Jan 2007 20:57:34 -0800 Subject: [Ferret-talk] corrupt index immediately after rebuild In-Reply-To: Message-ID: I know that the indexer hits a 2gig file limit (per file), which is a limit because of how ferret is compiled (I believe). What we've done to offset this is to optimize the index every so often while indexing, so that we never hit this limit (the optimized file size is considerably smaller than the unoptimized one). How many documents do you have indexed? 
> From: Joe Mestople > Reply-To: ferret-talk at rubyforge.org > Date: Thu, 18 Jan 2007 20:32:22 +0100 > To: ferret-talk at rubyforge.org > Subject: Re: [Ferret-talk] corrupt index immediately after rebuild > > William Morgan wrote: >> Excerpts from Jens Kraemer's message of Thu Jan 18 01:32:57 -0800 2007: >>> maybe you hit some file size limit with your index? >> >> Also check to make sure you didn't just run out of disk space. > > file size is 3,711,610,109 bytes -- the volume is ext3 and it has 74% > available so I don't think it's either running out of space or exceeding > the maximum file size. > > Has anyone else ran into a similar problem? > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From wmorgan-ferret at masanjin.net Fri Jan 19 14:47:29 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Fri, 19 Jan 2007 11:47:29 -0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: <24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com1169222395-sup-3301@south> <24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> Message-ID: <1169235728-sup-8063@south> Excerpts from Aldous D. Penaranda's message of Fri Jan 19 09:01:43 -0800 2007: > The StandardAnalyzer documentation says that it filters > LetterTokenizer with LowerCaseFilter. My interpretation of http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html is that StandardAnalyzer uses FULL_ENGLISH_STOP_WORDS as the stopword list. Perhaps I'm wrong; I've never verified it empirically. I'm of the opinion that the whole concept of stopwords is a relic of 1970's technology and the TREC ad-hoc query paradigm, neither of which are particularly relevant for modern-day web search, so I typically turn them off. 
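To make the earlier point in this thread concrete (the same analyzer has to be used at index time and at query time), here is a toy illustration in plain Ruby. It is not Ferret's analyzer and makes no Ferret calls: toy_analyze, phrase_match? and the short stop list are hypothetical stand-ins, and the matching ignores token positions that a real phrase query would track.

```ruby
# Toy stand-in for an analyzer: lowercases, splits on non-letters and
# drops stop words. NOT Ferret's implementation; for illustration only.
TOY_STOP_WORDS = %w[a an and the of or].freeze # hypothetical stop list

def toy_analyze(text, stop_words = TOY_STOP_WORDS)
  text.downcase.scan(/[a-z]+/).reject { |token| stop_words.include?(token) }
end

# Returns true if the analyzed phrase occurs as a consecutive run of
# tokens in the analyzed document. The document and the phrase can be
# analyzed with different stop lists to show what a mismatch does.
def phrase_match?(document, phrase, index_stop_words, query_stop_words)
  doc_tokens = toy_analyze(document, index_stop_words)
  phrase_tokens = toy_analyze(phrase, query_stop_words)
  return false if phrase_tokens.empty?
  doc_tokens.each_cons(phrase_tokens.size).any? { |window| window == phrase_tokens }
end
```

With the toy stop list applied at index time but an empty stop list at query time, the phrase "foo and bar" no longer matches a document containing it, because 'and' was never written to the toy index. With the same list on both sides it matches again.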
-- William From andreas.korth at gmx.net Fri Jan 19 15:26:20 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Fri, 19 Jan 2007 21:26:20 +0100 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: <1169235728-sup-8063@south> References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com1169222395-sup-3301@south> <24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> <1169235728-sup-8063@south> Message-ID: On 19.01.2007, at 20:47, William Morgan wrote: > Perhaps I'm wrong; I've never verified it empirically. I'm of the > opinion that the whole concept of stopwords is a relic of 1970's > technology and the TREC ad-hoc query paradigm, neither of which are > particularly relevant for modern-day web search, so I typically turn > them off. Could you elaborate on that, please? What exactly has changed since the 70's which isn't relevant any more and what is the TREC ad-hoc query paradigm anyway? My understanding is that stop words reduce the size of the index (and hence speed up queries) by filtering out words that occur frequently in almost any text of considerable length. Isn't it even worse if you store term vectors? I'd turn off stop words right away if there wasn't any considerable impact on performance, but I'd like to have a little more information on that. I'd appreciate if you could give some pointers. Thanks! Andy From john at digitalpulp.com Fri Jan 19 16:48:44 2007 From: john at digitalpulp.com (John Bachir) Date: Fri, 19 Jan 2007 16:48:44 -0500 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: <1169082589-sup-1386@south> References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com1169077226-sup-8834@south> <4E078079-F5EE-4F53-85AF-3D57765C8558@digitalpulp.com> <1169082589-sup-1386@south> Message-ID: On Jan 17, 2007, at 8:23 PM, William Morgan wrote: > You can split on whitespace, and wrap each token in a WildcardQuery, > prefixed and suffixed with a star. 
Unless you're supporting phrase > queries surrounded by quotes, in which case "split on whitespace" > becomes something more complicated. Or unless you want to disallow > wildcards from the user, in which case you'll need to escape * and ?. Yes, I want to do all of the above :-D Thanks for all the tips William, I'm going to look into this in the future when I make a more refined solution. In the meantime, I am just going to strip out all special/syntax chars from the queries, which I believe will have the behavior I desire. i want a search for one-two to pull up results with one two one-two onetwo John From john at digitalpulp.com Fri Jan 19 18:57:35 2007 From: john at digitalpulp.com (John Bachir) Date: Fri, 19 Jan 2007 18:57:35 -0500 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.com> Message-ID: On Jan 17, 2007, at 5:26 PM, Benjamin Krause wrote: > FERRET_SPECIAL_CHARS = [ /:/, /\(/, /\)/, /\[/, /\]/, /!/, /\ > +/, /"/, /~/, /\^/, /-/, /|/, />/, / References: <24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> <1169235728-sup-8063@south> Message-ID: <24d6b7620701191716j1852a86dwb814cbdb3c452607@mail.gmail.com> On 1/20/07, William Morgan wrote: > Excerpts from Aldous D. Penaranda's message of Fri Jan 19 09:01:43 -0800 2007: > > The StandardAnalyzer documentation says that it filters > > LetterTokenizer with LowerCaseFilter. > > My interpretation of > http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html > is that StandardAnalyzer uses FULL_ENGLISH_STOP_WORDS as the stopword > list. Yes, my bad. The latest documentation does say that. The 0.9 api doesn't and it's the version that we're using. What if the Document in question looks like this: Document { stored/uncompressed,indexed,tokenized, } Should a search for 'fieldname:"foo and bar"' result on the said document? -- Linux Just Simply Rocks! 
dous at penarmac.com | dous at ubuntu.com http://deathwing.penarmac.com/ GPG: 0xD6655C18 From wmorgan-ferret at masanjin.net Fri Jan 19 23:22:27 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Fri, 19 Jan 2007 20:22:27 -0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com1169222395-sup-3301@south24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> <1169235728-sup-8063@south> Message-ID: <1169265603-sup-5894@south> Excerpts from Andreas Korth's message of Fri Jan 19 12:26:20 -0800 2007: > Could you elaborate on that, please? What exactly has changed since > the 70's which isn't relevant any more and what is the TREC ad-hoc > query paradigm anyway? TREC is a competition that arguably drove most information retrieval research for the past several decades. The ad-hoc task is one of the tasks in the competition, and is essentially what we think of as "search": given a fixed set of documents, take an arbitrary query and produce a subset of documents that are considered "relevant". (Other TREC tasks involve things like document clustering, or question answering, or responding to a fixed query on a changing set of documents.) Almost all the ideas behind Ferret, Lucene, etc., are from the IR research community, were evaluated and found to be favorable in the context of TREC. The "inverted" index, stop words, boosting, the twiddle operator, etc, are all many decades old. The problem is that the ad-hoc task is pretty different from, say, web search, or email search in Sup. An ad-hoc query is essentially a mini-document, with a separate title, and several complete, grammatical sentences describing the "information need" in somewhat formal English. 
By contrast, in our case, the user is typically entering in just a few words, and is typically making explicit use of the mechanics of the search (glorified word matching) and thus isn't entering in a grammatical English description of what he'd like to find. Stop words make a lot of sense for the ad-hoc task because they eliminate "content-free" words. But I think they don't make nearly as much sense for the uses that you and I have for Ferret. The other big difference, of course, is that disk space is much cheaper now than when this stuff was developed. > My understanding is that stop words reduce the size of the index (and > hence speed up queries) by filtering out words that occur frequently > in almost any text of considerable length. Isn't it even worse if you > store term vectors? True, and yes. The question is: by how much? > I'd turn off stop words right away if there wasn't any considerable > impact on performance, but I'd like to have a little more information > on that. I'd appreciate if you could give some pointers. Unfortunately all I have are opinions. :) I'd be very interested in an empirical analysis of just how much bigger the index gets when using stopwords (with and without term vectors), and just how much slower queries get. I'm guessing that neither will be serious, but I could be wrong. -- William From wmorgan-ferret at masanjin.net Fri Jan 19 23:27:05 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Fri, 19 Jan 2007 20:27:05 -0800 Subject: [Ferret-talk] removing special/syntax characters In-Reply-To: References: <55168CB2-E27D-45E6-9F1B-A0CB76B94F76@digitalpulp.comA933890F-CC00-4137-A065-4E824C793DCE@benjaminkrause.com> Message-ID: <1169266983-sup-7712@south> Excerpts from John Bachir's message of Fri Jan 19 15:57:35 -0800 2007: > On Jan 17, 2007, at 5:26 PM, Benjamin Krause wrote: > > > FERRET_SPECIAL_CHARS = [ /:/, /\(/, /\)/, /\[/, /\]/, /!/, /\ > > +/, /"/, /~/, /\^/, /-/, /|/, />/, / > 1. Should $ be in the list? 
There's a list at http://ferret.davebalmain.com/api/classes/Ferret/QueryParser.html and $ doesn't seem to be on it. (Neither does & or .) > 2. Here is the solution I came up with, (nothing mind shattering but > I thought some folks on the list might appreciate seeing it): > > query = (query.split('') - (FERRET_SPECIAL_CHARS - CONFIG > [:allowed_ferret_syntax])).join() Doesn't this also eliminate escaped versions of the special characters? (Might not be a problem, depending on the specifics of the corpus.) -- William From marvin at rectangular.com Sat Jan 20 02:48:36 2007 From: marvin at rectangular.com (Marvin Humphrey) Date: Fri, 19 Jan 2007 23:48:36 -0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: <1169265603-sup-5894@south> References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com1169222395-sup-3301@south24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com> <1169235728-sup-8063@south> <1169265603-sup-5894@south> Message-ID: On Jan 19, 2007, at 8:22 PM, William Morgan wrote: > Stop words make a lot of sense for the ad-hoc task because they > eliminate "content-free" words. But I think they don't make nearly as > much sense for the uses that you and I have for Ferret. > > The other big difference, of course, is that disk space is much > cheaper > now than when this stuff was developed. You've expressed pretty much the reasons why the default "PolyAnalyzer" configuration in KinoSearch consists of an LCNormalizer, a Tokenizer, and a Stemmer -- no Stopalizer. See pages 74-80. > Unfortunately all I have are opinions. :) I'd be very interested in an > empirical analysis of just how much bigger the index gets when using > stopwords (with and without term vectors), and just how much slower > queries get. I'm guessing that neither will be serious, but I could be > wrong. The search-time benefit from using a stoplist can be substantial. 
Search-time costs are dominated by time spent pawing through postings for common terms. Eliminating the most common terms can make a big difference. Marvin Humphrey Rectangular Research http://www.rectangular.com/ From sean_cnp at yahoo.com Sat Jan 20 09:57:46 2007 From: sean_cnp at yahoo.com (Sean Osh) Date: Sat, 20 Jan 2007 15:57:46 +0100 Subject: [Ferret-talk] Sorting/Ordering Search Results In-Reply-To: <022669AA-F090-474D-A44F-2B368178ED82@benjaminkrause.com> References: <44aaf847f32d69bcdcdb71af3bf1d1bd@ruby-forum.com> <459C01DC.8040300@benjaminkrause.com> <13155345b60df7c0b4e24d4fde8f5d21@ruby-forum.com> <022669AA-F090-474D-A44F-2B368178ED82@benjaminkrause.com> Message-ID: <627fa564561099624485951a58b748f7@ruby-forum.com> Benjamin Krause wrote: > Hey... > > just checked back with david about that sorting issue.. The problem > on my index was, that you cannot sort by fields that you've indexed. > you need to store the fields untokenized if you want to sort be them. > maybe that'll fix your problem as well? > > Ben Yeah that did it! Thanks for all the help! -- Posted via http://www.ruby-forum.com/. From manoel at lemos.net Sat Jan 20 21:35:47 2007 From: manoel at lemos.net (Manoel Lemos) Date: Sun, 21 Jan 2007 03:35:47 +0100 Subject: [Ferret-talk] Help with Installation on OpenSolaris (TextDrive Containers) Message-ID: Gents, I installed ferret successfully in my MacOS (rails development) using the gem install. I did some tests and everything worked fine. Then I tried to install it on the same way on my production environment at TextDrive? running OpenSolaris? (container). This time 'gem install ferret' seems to be completed: [92140-AA:~/web/labs/blogblogs/trunk] root# gem install ferret Attempting local installation of 'ferret' Local gem file not found: ferret*.gem Attempting remote installation of 'ferret' Select which gem to install for your platform (i386-solaris2.8) 1. ferret 0.10.14 (ruby) 2. ferret 0.10.13 (ruby) ... 3. ferret 0.1.0 (ruby) 4. 
Cancel installation 1 Building native extensions. This could take a while... make: cc: Command not found make: *** [helper.o] Error 127 make: cc: Command not found make: *** [helper.o] Error 127 ruby extconf.rb install ferret creating Makefile make cc -KPIC -xO3 -xarch=386 -xspace -xildoff -I/opt/csw/include -I/opt/csw/include -KPIC -fno-common -D_FILE_OFFSET_BITS=64 -I. -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I. -I/opt/csw/include -I/opt/csw/include -c helper.c make install cc -KPIC -xO3 -xarch=386 -xspace -xildoff -I/opt/csw/include -I/opt/csw/include -KPIC -fno-common -D_FILE_OFFSET_BITS=64 -I. -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I. -I/opt/csw/include -I/opt/csw/include -c helper.c Successfully installed ferret-0.10.14 Installing RDoc documentation for ferret-0.10.14... Then I tried to run and I'm always getting a require no such file to load error, see: [92140-AA:~/web/labs/blogblogs/trunk] pocscom$ irb irb(main):001:0> require 'ferret' LoadError?: no such file to load -- ferret from (irb):1:in `require' from (irb):1 irb(main):002:0> require 'rferret' LoadError?: no such file to load -- rferret from (irb):2:in `require' from (irb):2 irb(main):003:0> Any suggestions? -- Posted via http://www.ruby-forum.com/. From dous at penarmac.com Sun Jan 21 09:26:20 2007 From: dous at penarmac.com (Aldous D. 
Penaranda) Date: Sun, 21 Jan 2007 22:26:20 +0800 Subject: [Ferret-talk] Help with Installation on OpenSolaris (TextDrive Containers) In-Reply-To: References: Message-ID: <24d6b7620701210626l7fa574e6n56e1d09f50263fb3@mail.gmail.com> On 1/21/07, Manoel Lemos wrote: > Then I tried to run and I'm always getting a require no such file to > load error, see: > > [92140-AA:~/web/labs/blogblogs/trunk] pocscom$ irb irb(main):001:0> > require 'ferret' LoadError?: no such file to load -- ferret > > from (irb):1:in `require' from (irb):1 > > irb(main):002:0> require 'rferret' LoadError?: no such file to load -- > rferret > > from (irb):2:in `require' from (irb):2 > > irb(main):003:0> > > Any suggestions? try: require "rubygems" require_gem "ferret" require "ferret" -- Linux Just Simply Rocks! dous at penarmac.com | dous at ubuntu.com http://deathwing.penarmac.com/ GPG: 0xD6655C18 From manoel at lemos.net Sun Jan 21 10:40:22 2007 From: manoel at lemos.net (Manoel Lemos) Date: Sun, 21 Jan 2007 16:40:22 +0100 Subject: [Ferret-talk] Help with Installation on OpenSolaris (TextDrive Contain In-Reply-To: References: Message-ID: <67ac7cbebb67140e3f533df5e895f24d@ruby-forum.com> Gents, Just found the error and the solution. Error: CC no present on my OpenSolaris installation. > make: cc: Command not found > make: *** [helper.o] Error 127 > make: cc: Command not found > make: *** [helper.o] Error 127 Solution: Use the new rbconfig file provided by TextDrive Details here: http://forum.textdrive.com/viewtopic.php?id=12630 I just renamed my old rbconfig.rb to rbconfig.rb.original and downloaded this one http://www.cuddletech.com/rbconfig.rb Things are now working fine. Special thanks to Jason from TextDrive for the hint. []s Manoel -- Posted via http://www.ruby-forum.com/. From carl.lerche at gmail.com Sun Jan 21 12:09:59 2007 From: carl.lerche at gmail.com (Carl Lerche) Date: Sun, 21 Jan 2007 09:09:59 -0800 Subject: [Ferret-talk] A few questions: Tweaking StemFilter, indexes, ... 
Message-ID: Hello all, I am new to the list, but I have been using ferret for a little bit already. I would first like to thank Dave for all his work on ferret. I had a few questions that I haven't been able to figure out after messing around with ferret and going through the documentation. StemFilter ------ I am trying to improve the quality of my searches in context of the content of my application. I have created an analyzer using the following: StemFilter.new StopFilter.new( LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words ) This has been pretty good so far, however, I really would like to get a search for "plumber" match "plumbing" at maybe a lower score than it would match "plumbers". The thing is that plumber(s) is filtered to "plumber" and plumbing is filtered to plumb, so it doesn't match. Is there any way to tweak the filter to be able to do these matches? I would like to match all noun and verbs together (and ideally with a lower score than different verb conjugations would match). Another example would be driving and driver. Worst case scenario, I could probably do some preprocessing to the search queries to expand "plumber" or "driving" to a query that includes both stems (for example expand the query for plumber to "plumber plumb") Indexes --- I was wondering how exactly indexes are implemented under the hood and if there is a way to give hints to ferret as to how our queries will be formed in order to optimize performance. Maybe I'm thinking of ferret too much as a database, but I am not too familiar with what's under ferret's hood. The reason I ask is that for the project I am working on, I have huge amounts of text to search, but each item also has a location associated with it (longitude & lattitude) and each query will only want to search the text located in a specific area (point and radius). I can add ranged parameters to the query and that will work, but is that optimal? Hopefully I am making sense. 
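For the point-and-radius case described above, one common approach is to turn the circle into a latitude/longitude bounding box and use its corners as the bounds of the range clauses. The following is a rough sketch in plain Ruby, not a statement about how Ferret handles ranges internally; bounding_box is a hypothetical helper, the Earth radius is the usual mean value, and the approximation breaks down near the poles and across the 180° meridian.

```ruby
# Sketch: approximate bounding box around a point for a given radius.
# Plain Ruby, no Ferret calls; bounding_box is a hypothetical helper.
EARTH_RADIUS_KM = 6371.0 # mean Earth radius

# Returns the min/max latitude and longitude (in degrees) of a box that
# encloses a circle of radius_km around (lat, lon).
def bounding_box(lat, lon, radius_km)
  # Angular radius of the circle, converted from radians to degrees.
  lat_delta = (radius_km / EARTH_RADIUS_KM) * 180.0 / Math::PI
  # A degree of longitude shrinks by cos(latitude), so widen accordingly.
  lon_delta = lat_delta / Math.cos(lat * Math::PI / 180.0)
  { min_lat: lat - lat_delta, max_lat: lat + lat_delta,
    min_lon: lon - lon_delta, max_lon: lon + lon_delta }
end
```

The box is only a coarse prefilter: hits in its corners lie outside the circle, so an exact distance check on the returned documents would still be needed. Whether range clauses built this way are the optimal route in Ferret is exactly the open question asked above.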
Donations --- I was wondering if there is a page that lists the total amount of donations so far? Thanks, -carl -- EPA Rating: 3000 Lines of Code / Gallon (of coffee) From dan_at_works at yahoo.com Sun Jan 21 13:47:56 2007 From: dan_at_works at yahoo.com (Ngoc Ngoc) Date: Sun, 21 Jan 2007 19:47:56 +0100 Subject: [Ferret-talk] could not install in WinXP Message-ID: Directory of C:\search_app 01/21/2007 19:37 . 01/21/2007 19:37 .. 01/21/2007 19:36 427 008 ferret-0.10.13.gem 01/21/2007 19:07 148 992 rdig-0.3.4.gem 2 File(s) 576 000 bytes 2 Dir(s) 45 135 982 592 bytes free C:\search_app>gem install ferret Building native extensions. This could take a while... ERROR: Error installing gem ferret[.gem]: ERROR: Failed to build gem native extension. Gem files will remain installed in c:/Program Files/Ruby/lib/ruby/gems/1.8/gems/ferret-0.10.13 for inspection. Results logged to c:/Program Files/Ruby/lib/ruby/gems/1.8/gems/ferret-0.10.13/ext/gem_make.out C:\search_app> --------------------- And gem_make.out shows nothing. -- Posted via http://www.ruby-forum.com/. From manoel at lemos.net Sun Jan 21 15:32:25 2007 From: manoel at lemos.net (Manoel Lemos) Date: Sun, 21 Jan 2007 21:32:25 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues Message-ID: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> Gents, I successfully installed AAF on my TextDrive OpenSolaris Container, but I'm having some issues with indexing. I have a model called Blogs which has AAF enabled. The first time I tried find_by_contents for a word I know is in the database, I got no results. Apparently the index was not ready yet. Then I waited a few hours and saw that the /index directory was receiving no changes, so the indexing was not happening either. 
Then I tried to re-index and I got the following error after a few hours of work: >> Blog.rebuild_index IOError: IO Error occured at :79 in xraise Error occured in fs_store.c:324 - fs_open_input couldn't create InStream script/../config/../config/../index/production/blog/_73j.fdx: from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `delete' from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `<<' from /opt/csw/lib/ruby/1.8/monitor.rb:229:in `synchronize' from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:256:in `<<' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:199:in `rebuild_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:198:in `rebuild_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:197:in `rebuild_index' from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction' from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transactions.rb:91:in `transaction' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:196:in `rebuild_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:194:in `rebuild_index' from (irb):9 Again, it seems that the index is incomplete and is bringing partial results. Any suggestions on what to do? PS:. During the indexing, there is nothing being queried on the DB, actually the unique thing running on that DB was the console where I runned the rebuild_index. Thanks in advance. Manoel Lemos -- Posted via http://www.ruby-forum.com/. From evtroost at vub.ac.be Sun Jan 21 19:15:33 2007 From: evtroost at vub.ac.be (Ewout) Date: Mon, 22 Jan 2007 01:15:33 +0100 Subject: [Ferret-talk] A few questions: Tweaking StemFilter, indexes, ... 
In-Reply-To: References: Message-ID: <20070122001533.462674231@localhost> Hi, You could use a FuzzyQuery, which will match words that have some degree of resemblance, with a lower score. >StemFilter ------ > >I am trying to improve the quality of my searches in context of the >content of my application. I have created an analyzer using the >following: > >StemFilter.new StopFilter.new( >LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words ) > >This has been pretty good so far, however, I really would like to get >a search for "plumber" match "plumbing" at maybe a lower score than it >would match "plumbers". The thing is that plumber(s) is filtered to >"plumber" and plumbing is filtered to plumb, so it doesn't match. Is >there any way to tweak the filter to be able to do these matches? I >would like to match all noun and verbs together (and ideally with a >lower score than different verb conjugations would match). Another >example would be driving and driver. From dilip.bvd at gmail.com Mon Jan 22 00:57:04 2007 From: dilip.bvd at gmail.com (Dilip Bv) Date: Mon, 22 Jan 2007 06:57:04 +0100 Subject: [Ferret-talk] dropdown list Message-ID: <310308381bc96b639371f59d079eda12@ruby-forum.com> Hi, I am new to Ruby and I have installed Ferret. Currently the search uses a text box and searches by the string entered in it. Instead of that, I am trying to implement a drop-down list, so that the search uses the item selected in the drop-down. The problem here is that the search does not use the selected item. Please help me. The code is: Instead of this I tried <%= select_tag 'category',options_for_select([['ALL RESIDENTIAL'], ['ALL COMMERCIAL'], ['Commercial Land'], ['Industrial Building'], ['Industrial Shed'], ['Farm House']], to_s), :onchange => "content.category(this,notnull ) ;" %> Please help me. Thank you. -- Posted via http://www.ruby-forum.com/. 
From kraemer at webit.de Mon Jan 22 03:58:27 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 22 Jan 2007 09:58:27 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> Message-ID: <20070122085827.GB29989@cordoba.webit.de> On Sun, Jan 21, 2007 at 09:32:25PM +0100, Manoel Lemos wrote: > Gents, > > I successfully installed AAF on my TextDrive OpenSolaris Container, but > I'm having some issues with indexing. > > I have a model called Blogs which has AAF enabled. > > The first time I tried to find_by_contents for a 'word' I know was on > the Database I got now results. Apparently the index was not ready yet. > Then I waited a few hours and checked that the /index directory was > receiving no changes, so the indexing was not happening also. > > Then I tried to re-index and I got the following error after a few hours > of work: does this mean it took a few hours for rebuilding the index, or did you only start the rebuild after a few hours? > >> Blog.rebuild_index > > IOError: IO Error occured at :79 in xraise > Error occured in fs_store.c:324 - fs_open_input > couldn't create InStream > script/../config/../config/../index/production/blog/_73j.fdx: file or directory> strange. This does really look like the index has been modified by something else while the rebuild was running. Could you try to start over with a new, empty index directory? Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Jan 22 04:01:10 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 22 Jan 2007 10:01:10 +0100 Subject: [Ferret-talk] could not install in WinXP In-Reply-To: References: Message-ID: <20070122090110.GC29989@cordoba.webit.de> Hi! 
Just a wild guess - I didn't ever use Ferret on Windows - Do you have a compiler installed? Can you build and install other gems with native extensions? Jens On Sun, Jan 21, 2007 at 07:47:56PM +0100, Ngoc Ngoc wrote: > Directory of C:\search_app > > 01/21/2007 19:37 . > 01/21/2007 19:37 .. > 01/21/2007 19:36 427 008 ferret-0.10.13.gem > 01/21/2007 19:07 148 992 rdig-0.3.4.gem > 2 File(s) 576 000 bytes > 2 Dir(s) 45 135 982 592 bytes free > > C:\search_app>gem install ferret > Building native extensions. This could take a while... > > ERROR: Error installing gem ferret[.gem]: ERROR: Failed to build gem > native ext > ension. > Gem files will remain installed in c:/Program > Files/Ruby/lib/ruby/gems/1.8/gems/ > ferret-0.10.13 for inspection. > > > Results logged to c:/Program > Files/Ruby/lib/ruby/gems/1.8/gems/ferret-0.10.13/ex > t/gem_make.out > > C:\search_app> > > > --------------------- > And looking at gem_make.out showing nothing > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From saimonmoore at gmail.com Mon Jan 22 05:18:17 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Mon, 22 Jan 2007 11:18:17 +0100 Subject: [Ferret-talk] [Ferret] Test failures for ferret tagged REL-0.10.14 Message-ID: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> Hi Dave, I've been getting some segment faults while running my tests using 0.10.14 gem so I decided to package the gem locally to add -dH and generate core dumps for you. So I followed instructions here http://ferret.davebalmain.com/trac/wiki/DownloadCurrent and first off ran the tests. I'm getting the following failures. (see this pastie http://pastie.caboo.se/34790). 
I don't suppose these are expected but I thought I may as well let you know. Regards, Saimon -- Posted via http://www.ruby-forum.com/. From saimonmoore at gmail.com Mon Jan 22 05:29:50 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Mon, 22 Jan 2007 11:29:50 +0100 Subject: [Ferret-talk] [Ferret] Test failures for ferret tagged REL-0.10.14 In-Reply-To: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> References: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> Message-ID: Saimon Moore wrote: > Hi Dave, > > I've been getting some segment faults while running my tests using > 0.10.14 gem so I decided to package the gem locally to add -dH and > generate core dumps for you. > > So I followed instructions here > http://ferret.davebalmain.com/trac/wiki/DownloadCurrent and first off > ran the tests. > > I'm getting the following failures. (see this pastie > http://pastie.caboo.se/34790). > > I don't suppose these are expected but I thought I may as well let you > know. > > Regards, > > Saimon Also: >> saimon at iris ~/tmp $ irb >> require 'rubygems' => false >> require_gem 'ferret', '>=0.10.14' => true >> puts Ferret::VERSION 0.9.6 => nil -- Posted via http://www.ruby-forum.com/. From manoel at lemos.net Mon Jan 22 05:59:06 2007 From: manoel at lemos.net (Manoel Lemos) Date: Mon, 22 Jan 2007 11:59:06 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <20070122085827.GB29989@cordoba.webit.de> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> Message-ID: <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> Jens, answering your questions: 1. Yes, it took a few hours from the start of the rebuild_index and the failure. 2. I don't think that any other process is modifying the index folder, but I'll try your suggestion. Cleaning the index folder and running rebuild_index again. Thanks for the attention. 
Jens Kraemer wrote: > On Sun, Jan 21, 2007 at 09:32:25PM +0100, Manoel Lemos wrote: >> receiving no changes, so the indexing was not happening also. >> >> Then I tried to re-index and I got the following error after a few hours >> of work: > > does this mean it took a few hours for rebuilding the index, or did you > only start the rebuild after a few hours? > >> >> Blog.rebuild_index >> >> IOError: IO Error occured at :79 in xraise >> Error occured in fs_store.c:324 - fs_open_input >> couldn't create InStream >> script/../config/../config/../index/production/blog/_73j.fdx: > file or directory> > > strange. This does really look like the index has been modified by > something else while the rebuild was running. Could you try to start > over with a new, empty index directory? > > Jens > > -- > webit! Gesellschaft f?r neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de > Schnorrstra?e 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 -- Posted via http://www.ruby-forum.com/. From saimonmoore at gmail.com Mon Jan 22 06:01:05 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Mon, 22 Jan 2007 12:01:05 +0100 Subject: [Ferret-talk] [Ferret] Test failures for ferret tagged REL-0.10.14 In-Reply-To: References: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> Message-ID: <6775b9f061c7e8ef0582ad1d6a74339f@ruby-forum.com> What's also strange when using the locally packaged gem is the following: >> saimon at iris ~/tmp $ irb >> require 'rubygems' => false >> require_gem 'ferret','>= 0.10.14.1' #I packaged it as => true >> require 'ferret' => false >> Ferret => Ferret >> Ferret::Index => Ferret::Index >> Ferret::Index::FieldInfos NameError: uninitialized constant Ferret::Index::FieldInfos from (irb):7 For some reason it can't find Ferret::Index::FieldInfos. ??? 
Maybe it's the way I packaged it but I simply did: I just used gem package REL=0.10.14.1 sudo gem install pkg/ferret-0.10.14.1.gem Regards, Saimon Saimon Moore wrote: > Saimon Moore wrote: >> Hi Dave, >> >> I've been getting some segment faults while running my tests using >> 0.10.14 gem so I decided to package the gem locally to add -dH and >> generate core dumps for you. >> >> So I followed instructions here >> http://ferret.davebalmain.com/trac/wiki/DownloadCurrent and first off >> ran the tests. >> >> I'm getting the following failures. (see this pastie >> http://pastie.caboo.se/34790). >> >> I don't suppose these are expected but I thought I may as well let you >> know. >> >> Regards, >> >> Saimon > > Also: >>> saimon at iris ~/tmp $ irb >>> require 'rubygems' > => false >>> require_gem 'ferret', '>=0.10.14' > => true >>> puts Ferret::VERSION > 0.9.6 > => nil -- Posted via http://www.ruby-forum.com/. From manoel at lemos.net Mon Jan 22 06:25:13 2007 From: manoel at lemos.net (Manoel Lemos) Date: Mon, 22 Jan 2007 12:25:13 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> Message-ID: <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> Jens, Maybe you are correct. Actually my Rails application was UP. I mean, while I was running Blog.rebuild_index on the console, the Rails app was running. Is this the kind of simultaneous modification of the index that you talked about? If yes, how will Ferret and Acts-As-Ferret behave in a real life situation where we have several Mongrels running the Rails application? Is this a problem? The Blog.rebuild_index is running, I'll let you know the results (now with only the console running). Thanks for the help. Sincerely, Manoel Lemos -- Posted via http://www.ruby-forum.com/. 
From shaklev at gmail.com Mon Jan 22 07:29:08 2007 From: shaklev at gmail.com (Stian Haklev) Date: Mon, 22 Jan 2007 19:29:08 +0700 Subject: [Ferret-talk] Ferret-talk Digest, Vol 15, Issue 8 In-Reply-To: References: Message-ID: <566574ef0701220429k7350fba1s6561eda249c093fc@mail.gmail.com> Hi everyone, thank you for the help last time. A quick question, through rereading the ferret tutorial I realized that by adding :key => :id to the index loading, I could access my documents through index["11"], in addition to using the doc_id from ferret through index[122]... This is great, and saves me a line or two a lot of places in my code. However, is there a way of extracting the doc_id from an object you find like this. Ie I want to know the doc_id for document index["11"]. The reason for this is that I need the doc_id for the highlight function (this won't work based only on a document key will it?) My code works as it is, but it's still annoying and I'd love to "clean it up" just a bit. Thank you very much Stian From dan_at_works at yahoo.com Mon Jan 22 07:59:29 2007 From: dan_at_works at yahoo.com (ngoc) Date: Mon, 22 Jan 2007 13:59:29 +0100 Subject: [Ferret-talk] could not install in WinXP In-Reply-To: <20070122090110.GC29989@cordoba.webit.de> References: <20070122090110.GC29989@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > Just a wild guess - I didn't ever use Ferret on Windows - Do you have a > compiler installed? Can you build and install other gems with native > extensions? Hi Jens I looked closer to ferret download page and saw there are ferret version for Windows, but with older version. So I downloaded the latest Windows version and the installation is successful. Next step is RDig. And I see you are the author of RDig. Installing gave message "Could not load rubyful_soup.rb". First I installed Hpricot, so there is no need for rubyful_soup.rb. I tried to figure out what is wrong? Reading at the code, I found RDig is made for Linux and Unix platform. 
The path:

# load content extractors
Dir["#{File.expand_path(File.dirname(__FILE__))}/content_extractors/**/*.rb"].each do |f|
  begin
    require f
  rescue LoadError
    puts "could not load #{f}: #{$!}"
  end
end

I changed it to the version below, and the error message disappeared:

# load content extractors
Dir["#{File.expand_path(File.dirname(__FILE__))}\\content_extractors\\**\\*.rb"].each do |f|
  begin
    require f
  rescue LoadError
    puts "could not load #{f}: #{$!}"
  end
end

But I still could not get an index, because

rdig -c config.rb -q 'ruby'

gave no result. I changed

url_type = RDig.config.crawler.start_urls.first =~ /^file:\/\// ? :file : :http

to

url_type = RDig.config.crawler.start_urls.first =~ /^https?:\/\// ? :http : :file

My url is "C://data_store//files", but there is still no index.

When I have more time, I will work on it further to make it work on Windows, and I will report back to you if you want.

Ngoc

-- 
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Mon Jan 22 08:22:01 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 22 Jan 2007 14:22:01 +0100
Subject: [Ferret-talk] could not install in WinXP
In-Reply-To:
References: <20070122090110.GC29989@cordoba.webit.de>
Message-ID: <20070122132201.GH29989@cordoba.webit.de>

On Mon, Jan 22, 2007 at 01:59:29PM +0100, ngoc wrote:
> Jens Kraemer wrote:
>
> > Just a wild guess - I didn't ever use Ferret on Windows - Do you have a
> > compiler installed? Can you build and install other gems with native
> > extensions?
>
> Hi Jens
> I looked closer to ferret download page and saw there are ferret version
> for Windows, but with older version. So I downloaded the latest Windows
> version and the installation is successful.

What version do you have installed now? The latest RDig requires a 0.10.x Ferret.

Besides that, it would be cool if you could send me a patch with the changes you had to apply to rdig when you made it work on windows.

cheers,
Jens

-- webit!
Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Jan 22 08:34:36 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 22 Jan 2007 14:34:36 +0100 Subject: [Ferret-talk] Ferret-talk Digest, Vol 15, Issue 8 In-Reply-To: <566574ef0701220429k7350fba1s6561eda249c093fc@mail.gmail.com> References: <566574ef0701220429k7350fba1s6561eda249c093fc@mail.gmail.com> Message-ID: <20070122133436.GI29989@cordoba.webit.de> On Mon, Jan 22, 2007 at 07:29:08PM +0700, Stian Haklev wrote: > Hi everyone, thank you for the help last time. > > A quick question, through rereading the ferret tutorial I realized > that by adding :key => :id to the index loading, I could access my > documents through index["11"], in addition to using the doc_id from > ferret through index[122]... This is great, and saves me a line or two > a lot of places in my code. However, is there a way of extracting the > doc_id from an object you find like this. Ie I want to know the doc_id > for document index["11"]. The reason for this is that I need the > doc_id for the highlight function (this won't work based only on a > document key will it?) by definition your key field has to be unique just like a primary key in a database. So I think the easiest way to retrieve the doc_id for key '11' is to run a search for 'key:11'. However that might not be what you want since you would end up querying again for the record you already have found. Instead, the Index class should store the document id in the returned LazyDoc instance, which is not possible atm, imho. cheers, Jens -- webit! 
Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Jan 22 08:36:57 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 22 Jan 2007 14:36:57 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> Message-ID: <20070122133657.GJ29989@cordoba.webit.de> On Mon, Jan 22, 2007 at 11:59:06AM +0100, Manoel Lemos wrote: > Jens, answering your questions: > > 1. Yes, it took a few hours from the start of the rebuild_index and the > failure. wow, either that machine is *really* slow or you have an enormous amount of data to index... or something really weird is going on there. > 2. I don't think that any other process is modifying the index folder, > but I'll try your suggestion. Cleaning the index folder and running > rebuild_index again. let us know how it works out. Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Jan 22 08:42:52 2007 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 22 Jan 2007 14:42:52 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> Message-ID: <20070122134252.GK29989@cordoba.webit.de> On Mon, Jan 22, 2007 at 12:25:13PM +0100, Manoel Lemos wrote: > Jens, > > Maybe you are correct. 
Actually my Rails application was UP. > I mean, while I was running Blog.rebuild_index on the console, the Rails > app was running. > > Is this the kind of simultaneous modification of the index that you > talked about? exactly. > If yes, how will Ferret and Acts-As-Ferret behave in a real life > situation where we have several Mongrels running the Rails application? > Is this a problem? It should not, since Ferret is supposed to have a file system based locking that manages inter-process synchronisation. However it doesn't seem to be reliable under certain circumstances - the usual workaround is to use a backgroundrb process that does all the indexing, and only do the searching inside the mongrels. Unfortunately aaf does not support this kind of remote-indexing yet, but it is definitely on my list. > The Blog.rebuild_index is running, I'll let you know the results (now > with only the console running). Sounds like you index a whole Farm of Blogs - I'm still wondering about the reason for the long indexing time ;-) cheers, Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From jduflost at ben.vub.ac.be Mon Jan 22 08:46:50 2007 From: jduflost at ben.vub.ac.be (johan duflost) Date: Mon, 22 Jan 2007 14:46:50 +0100 Subject: [Ferret-talk] stopwords Message-ID: <002201c73e2b$c49ba900$0700000a@ORION> Hello all, Does anybody know if the word 'other' is a special word for ferret ? I don't manage to index it ! 
Johan

Johan Duflost
Analyst Programmer
Belgian Biodiversity Platform (http://www.biodiversity.be)
Belgian Federal Science Policy Office (http://www.belspo.be)
Tel: +32 2 650 5751
Fax: +32 2 650 5124

From kraemer at webit.de Mon Jan 22 08:49:13 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 22 Jan 2007 14:49:13 +0100
Subject: [Ferret-talk] How to have 'o' == 'ö'
In-Reply-To: <11d4433ce411d7f457fdf09671e32b58@ruby-forum.com>
References: <11d4433ce411d7f457fdf09671e32b58@ruby-forum.com>
Message-ID: <20070122134913.GL29989@cordoba.webit.de>

On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
> Greetings,
>
> (using acts_as_ferret)
>
> So I have a book title "Möngrel ?Horsemen?" in my index.
>
> Searching for "Möngrel" retrieves the document.
>
> But I would like searching for "Mongrel" to also retrieve the document.
> Which it does not currently.
>
> Anyone have any good solutions to this problem?
>
> I suppose I could filter the documents and queries first with something
> like:
>
>
> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel
> ?Horsemen?").gsub(/[^a-zA-Z0-9]/im,"")
>
> But perhaps there is a better, or built in solution.

I don't think so - a custom Analyzer would be the right place for this.

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From kraemer at webit.de Mon Jan 22 08:57:13 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 22 Jan 2007 14:57:13 +0100
Subject: [Ferret-talk] stopwords
In-Reply-To: <002201c73e2b$c49ba900$0700000a@ORION>
References: <002201c73e2b$c49ba900$0700000a@ORION>
Message-ID: <20070122135713.GM29989@cordoba.webit.de>

On Mon, Jan 22, 2007 at 02:46:50PM +0100, johan duflost wrote:
>
> Hello all,
>
> Does anybody know if the word 'other' is a special word for ferret ? I
> don't manage to index it !
Looks like it is:

irb(main):005:0> Ferret::Analysis::FULL_ENGLISH_STOP_WORDS.include? 'other'
=> true

;-)

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From dan_at_works at yahoo.com Mon Jan 22 09:11:29 2007
From: dan_at_works at yahoo.com (ngoc)
Date: Mon, 22 Jan 2007 15:11:29 +0100
Subject: [Ferret-talk] could not install in WinXP
In-Reply-To: <20070122132201.GH29989@cordoba.webit.de>
References: <20070122090110.GC29989@cordoba.webit.de> <20070122132201.GH29989@cordoba.webit.de>
Message-ID: <3eeb95b714ed51031ec0d32c17addae6@ruby-forum.com>

>
> What version do you have installed now? latest RDig requires a 0.10.x
> Ferret.
>

I installed ferret-0.10.9-mswin32.gem.

I think the reason is that the derived paths are never reached:

RDig.config.crawler.start_urls.each { |url| add_url(url, filterchain) }

I tried to print the url value with

RDig.config.crawler.start_urls.each { |url|
  add_url(url, filterchain)
  puts url
}

-> It looped over only one path, which is C://data_store//files

When I changed it to file:///data_store//files, it looped a lot, but there was still no index.

I have a lot of files on my Windows PC and I want to search them. The Windows XP search is not good. Windows Desktop Search indexes my files whenever the machine is idle, which will shorten my hard drive's lifetime. That is the reason I am looking for search software other than the standard Windows XP search.

I will send you a patch for Windows. It will take a while, because I am very busy with other tasks right now.

Thanks Jens

ngoc

-- 
Posted via http://www.ruby-forum.com/.
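
A side note on the content-extractor glob from the RDig change quoted earlier in this thread: Ruby's Dir[] and Dir.glob treat a backslash as an escape character rather than as a path separator, so a pattern built with "\\" may match no files at all, which would also make the "could not load" message go away without anything being loaded. Below is a minimal sketch that builds the pattern with forward slashes, which Ruby accepts on Windows as well. The helper name extractor_files and the throwaway directory layout are assumptions made for this illustration; this is not RDig's code.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of RDig): list every .rb file below
# <base_dir>/content_extractors, using forward slashes in the pattern.
def extractor_files(base_dir)
  pattern = File.join(File.expand_path(base_dir), 'content_extractors', '**', '*.rb')
  Dir.glob(pattern).sort
end

# Build a throwaway directory tree so the sketch can run anywhere.
DEMO_FILES = Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'content_extractors', 'doc'))
  FileUtils.touch(File.join(dir, 'content_extractors', 'pdf.rb'))
  FileUtils.touch(File.join(dir, 'content_extractors', 'doc', 'word.rb'))
  extractor_files(dir).map { |f| File.basename(f) }
end

puts DEMO_FILES.inspect
```

File.join always joins with "/", so the same pattern can be used unchanged on Linux and on Windows.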
From fxn at hashref.com Mon Jan 22 10:24:53 2007
From: fxn at hashref.com (Xavier Noria)
Date: Mon, 22 Jan 2007 16:24:53 +0100
Subject: [Ferret-talk] How to have 'o' == 'ö'
In-Reply-To: <20070122134913.GL29989@cordoba.webit.de>
References: <11d4433ce411d7f457fdf09671e32b58@ruby-forum.com> <20070122134913.GL29989@cordoba.webit.de>
Message-ID: <9E80FA17-7AF1-40DA-BBC5-7ADFDEDA2077@hashref.com>

On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:

> On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
>> Greetings,
>>
>> (using acts_as_ferret)
>>
>> So I have a book title "Möngrel ?Horsemen?" in my index.
>>
>> Searching for "Möngrel" retrieves the document.
>>
>> But I would like searching for "Mongrel" to also retrieve the
>> document.
>> Which it does not currently.
>>
>> Anyone have any good solutions to this problem?
>>
>> I suppose I could filter the documents and queries first with
>> something
>> like:
>>
>>
>> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel
>> ?Horsemen?").gsub(/[^a-zA-Z0-9]/im,"")
>>
>> But perhaps there is a better, or built in solution.
>
> I don't think so - a custom Analyzer would be the right place for
> this.

We use a normalizer to store/query (to be revised for Rails 1.2):

    # Utility method that returns an ASCIIfied, downcased, and sanitized string.
    # It relies on the Unicode Hacks plugin by means of String#chars. We assume
    # $KCODE is 'u' in environment.rb. So far we support a wide range of latin
    # accented letters, based on the Unicode Character Palette bundled in Macs.
    def self.normalize(str)
      n = str.chars.downcase.strip.to_s
      n.gsub!(/[????????]/, 'a')
      n.gsub!(/?/, 'ae')
      n.gsub!(/[??]/, 'd')
      n.gsub!(/[?????]/, 'c')
      n.gsub!(/[?????????]/, 'e')
      n.gsub!(/?/, 'f')
      n.gsub!(/[????]/, 'g')
      n.gsub!(/[??]/, 'h')
      n.gsub!(/[????????]/, 'i')
      n.gsub!(/[????]/, 'j')
      n.gsub!(/[??]/, 'k')
      n.gsub!(/[?????]/, 'l')
      n.gsub!(/[??????]/, 'n')
      n.gsub!(/[??????????]/, 'o')
      n.gsub!(/?/, 'oe')
      n.gsub!(/?/, 'q')
      n.gsub!(/[???]/, 'r')
      n.gsub!(/[?????]/, 's')
      n.gsub!(/[????]/, 't')
      n.gsub!(/[??????????]/, 'u')
      n.gsub!(/?/, 'w')
      n.gsub!(/[???]/, 'y')
      n.gsub!(/[???]/, 'z')
      n.gsub!(/\s+/, ' ')
      n.gsub!(/[^\sa-z0-9_-]/, '')
      n
    end

And this convenience class method to use in Rails models with acts_as_ferret (slightly edited):

    # Wrapper function to normalize fields before calling acts_as_ferret
    #
    # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
    #
    # Please note that your queries should use a "_normalized" suffix on
    # each field, i.e: +field1_normalized:foo
    class ActiveRecord::Base
      def self.index_fields(fields, *options)
        aaf_fields = []
        fields.each do |f|
          # Define a <field>_normalized reader that returns the normalized value.
          class_eval <<-EOS
            def #{f}_normalized
              MyAppUtils.normalize(#{f})
            end
          EOS
          aaf_fields.push ":#{f}_normalized"
        end
        # Build the acts_as_ferret call as a string and evaluate it in the model.
        aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
        options.each do |option_pair|
          option_pair.each do |key, value|
            aaf_call << ", :#{key} => #{value}"
          end
        end
        logger.info aaf_call
        class_eval(aaf_call)
      end
    end

-- fxn

From wmorgan-ferret at masanjin.net Mon Jan 22 11:42:47 2007
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Mon, 22 Jan 2007 08:42:47 -0800
Subject: [Ferret-talk] [Ferret] Test failures for ferret tagged REL-0.10.14
In-Reply-To: <6775b9f061c7e8ef0582ad1d6a74339f@ruby-forum.com>
References: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> <e642e87066f3005c51358ef4309214e3@ruby-forum.com> <6775b9f061c7e8ef0582ad1d6a74339f@ruby-forum.com>
Message-ID: <1169484097-sup-4986@south>

Excerpts from Saimon Moore's message of Mon Jan 22 03:01:05 -0800
2007: > NameError: uninitialized constant Ferret::Index::FieldInfos > from (irb):7 > > For some reason it can't find Ferret::Index::FieldInfos. ??? This, and the Ferret::VERSION thing, are both symptomatic of you having packaged 0.9.6 rather than 0.10.14. The SVN directions on the Ferret page are wrong. You actually need to do svn co svn://www.davebalmain.com/exp ferret to get the latest version -- William From wmorgan-ferret at masanjin.net Mon Jan 22 12:02:54 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Mon, 22 Jan 2007 09:02:54 -0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com1169222395-sup-3301@south24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com1169235728-sup-8063@south> <1169265603-sup-5894@south> Message-ID: <1169484176-sup-4548@south> Excerpts from Marvin Humphrey's message of Fri Jan 19 23:48:36 -0800 2007: > The search-time benefit from using a stoplist can be substantial. > Search-time costs are dominated by time spent pawing through postings > for common terms. Eliminating the most common terms can make a big > difference. I agree that common terms can really affect search time cost. I just don't think it's a problem. At least, I don't think it's a problem in a world where the query creaters are motivated, sophisticated users who have developed an understanding of how search engines work (i.e. glorified word matching). You don't have to use a search engine more than a few times before you understand that putting stopwords in your query is basically wasting your time. One can certainly argue about just how much we are in that world. Perhaps the AARP website search folks are in a different one. In my case, a text-only email client backed by an IR engine and with a user interface that smacks of Emacs is a pretty selective filter. 
:) -- William From wmorgan-ferret at masanjin.net Mon Jan 22 12:10:32 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Mon, 22 Jan 2007 09:10:32 -0800 Subject: [Ferret-talk] A few questions: Tweaking StemFilter, indexes, ... In-Reply-To: References: Message-ID: <1169485579-sup-7049@south> Excerpts from Carl Lerche's message of Sun Jan 21 09:09:59 -0800 2007: > Worst case scenario, I could probably do some preprocessing to the > search queries to expand "plumber" or "driving" to a query that > includes both stems (for example expand the query for plumber to > "plumber plumb") You can either do query expansion or you can modify the stemmer. Query expansion is probably a little easier to experiment with because you don't have to worry about reindexing, but it does come with a search-time cost which may or may not be negligible. (And it gets a little tricky with phrasal queries.) > I can add ranged parameters to the query and that will work, but is > that optimal? Hopefully I am making sense. I don't know for sure whether Ferret is sophisticated enough to optimize retrieval based on multiple ranges, but it may very well be. In any case, I think you're doing the right thing. -- William From manoel at lemos.net Mon Jan 22 12:22:27 2007 From: manoel at lemos.net (Manoel Lemos) Date: Mon, 22 Jan 2007 18:22:27 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <20070122134252.GK29989@cordoba.webit.de> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> Message-ID: Jens, In fact, I'm indexing around 150K blogs, my app is a Blog/Posts indexing service, just like Technorati, but focused on the Brazilian blogosphere. 
Same error, even with only the console running Blog.rebuild_index, see:

/opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27:
/opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `delete': IO Error occured at :79 in xraise (IOError)
Error occured in fs_store.c:324 - fs_open_input
        couldn't create InStream script/../config/../index/production/blog/_3pe.fdx:
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `<<'
        from /opt/csw/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:256:in `<<'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:199:in `rebuild_index'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:198:in `rebuild_index'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:197:in `rebuild_index'
        from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction'
        from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transactions.rb:91:in `transaction'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:196:in `rebuild_index'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:194:in `rebuild_index'
        from (eval):1
        from /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `eval'
        from /opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27
        from /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require'
        from /opt/csw/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require'
        from ./script/runner:3

Suggestions? Anything else I can do to gather more debug data?

[]s Manoel

-- 
Posted via http://www.ruby-forum.com/.
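
Returning to the 'o' == 'ö' question discussed above: on a current Ruby (2.2 or later, so not the 1.8 setup used in this thread), a similar normalization can be sketched with the standard library's String#unicode_normalize instead of a hand-written table of gsub! calls. This is only a sketch under stated assumptions, not the normalizer posted above and not a Ferret analyzer: the name fold_accents is made up, and letters without a canonical decomposition (for example "ø" or "ß") are simply dropped by the final filter rather than transliterated.

```ruby
# Hypothetical helper: ASCII-fold a UTF-8 string before indexing or querying.
def fold_accents(str)
  decomposed = str.unicode_normalize(:nfd)    # "ö" becomes "o" + combining diaeresis
  stripped   = decomposed.gsub(/\p{Mn}/, '')  # drop the combining marks
  folded     = stripped.downcase.strip
  folded     = folded.gsub(/\s+/, ' ')        # collapse runs of whitespace
  folded.gsub(/[^\sa-z0-9_-]/, '')            # keep letters, digits, space, _ and -
end

puts fold_accents("Möngrel  Horsemen")
```

The same function would have to be applied both to the text that is indexed and to the query string, otherwise "Mongrel" and "Möngrel" still end up as different terms.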
From saimonmoore at gmail.com Mon Jan 22 12:26:43 2007 From: saimonmoore at gmail.com (Saimon Moore) Date: Mon, 22 Jan 2007 18:26:43 +0100 Subject: [Ferret-talk] [Ferret] Test failures for ferret taggedREL-0.10.14 In-Reply-To: <1169484097-sup-4986@south> References: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> <6775b9f061c7e8ef0582ad1d6a74339f@ruby-forum.com> <1169484097-sup-4986@south> Message-ID: <882b5e27d33068fe79023741902d4dfc@ruby-forum.com> William Morgan wrote: > Excerpts from Saimon Moore's message of Mon Jan 22 03:01:05 -0800 2007: >> NameError: uninitialized constant Ferret::Index::FieldInfos >> from (irb):7 >> >> For some reason it can't find Ferret::Index::FieldInfos. ??? > > This, and the Ferret::VERSION thing, are both symptomatic of you having > packaged 0.9.6 rather than 0.10.14. > > The SVN directions on the Ferret page are wrong. You actually need to > do > > svn co svn://www.davebalmain.com/exp ferret > > to get the latest version Hi William, Actually what I downloaded was : svn co svn://davebalmain.com/ferret/tags/REL-0.10.14 ferret_0.10.4 and Ferret::VERSION was 0.9.6 in that version. Is this still not correct? -- Posted via http://www.ruby-forum.com/. From marvin at rectangular.com Mon Jan 22 12:34:02 2007 From: marvin at rectangular.com (Marvin Humphrey) Date: Mon, 22 Jan 2007 09:34:02 -0800 Subject: [Ferret-talk] Double-quoted query with "and" fails. In-Reply-To: <1169484176-sup-4548@south> References: <24d6b7620701190133n51f02ca2me76d8ca7be6699bf@mail.gmail.com1169222395-sup-3301@south24d6b7620701190901l26e8b223g5c04579be66c2d46@mail.gmail.com1169235728-sup-8063@south> <1169265603-sup-5894@south> <1169484176-sup-4548@south> Message-ID: <38581155-E836-4A6B-A727-D06F922D8204@rectangular.com> On Jan 22, 2007, at 9:02 AM, William Morgan wrote: > Excerpts from Marvin Humphrey's message of Fri Jan 19 23:48:36 > -0800 2007: >> The search-time benefit from using a stoplist can be substantial. 
>> Search-time costs are dominated by time spent pawing through postings
>> for common terms. Eliminating the most common terms can make a big
>> difference.
>
> I agree that common terms can really affect search time cost. I just
> don't think it's a problem.

Yes. If your corpus is small enough and your machine is fast enough, the absolute search-time costs of using an engine as efficient as Ferret or KinoSearch aren't consequential. As the corpus grows you have the option of trading away some relevance for speed, or, in the case of KS, distributing the index over multiple machines and aggregating search results.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

From manoel at lemos.net Mon Jan 22 12:37:49 2007
From: manoel at lemos.net (Manoel Lemos)
Date: Mon, 22 Jan 2007 18:37:49 +0100
Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues
In-Reply-To:
References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de>
Message-ID: <31e833bc1a24552bb87862c5fe6b53da@ruby-forum.com>

Jens,

The content of my app/index/production/blog directory is:
(just after the exception on Blog.rebuild_index)

[92140-AA:~/web/labs/blogblogs/trunk/index/production/blog] pocscom$ ls -al > index.txt
[92140-AA:~/web/labs/blogblogs/trunk/index/production/blog] pocscom$ more index.txt
total 53451
drwxr-xr-x 2 pocscom pocscom   40 Jan 22 15:32 ./
drwxr-xr-x 3 pocscom pocscom    3 Jan 22 09:00 ../
-rw------- 1 pocscom pocscom 6.3M Jan 22 10:18 _1pp.cfs
-rw------- 1 pocscom pocscom 6.2M Jan 22 11:00 _2kk.cfs
-rw------- 1 pocscom pocscom 6.9M Jan 22 11:44 _3ff.cfs
-rw------- 1 pocscom pocscom 143K Jan 22 11:48 _3ii.cfs
-rw------- 1 pocscom pocscom 260K Jan 22 11:51 _3ll.cfs
-rw------- 1 pocscom pocscom 893K Jan 22 11:56 _3oo.cfs
-rw------- 1 pocscom pocscom  82K Jan 22 11:56 _3oz.cfs
-rw------- 1 pocscom pocscom  42K Jan 22 11:56 _3pa.cfs
-rw------- 1 pocscom pocscom    1 Jan 22 11:57 _3pe.f0
-rw------- 1 pocscom pocscom    1 Jan 22 11:57 _3pe.f1
-rw------- 1 pocscom pocscom    1 Jan 22 11:57 _3pe.f2
-rw------- 1 pocscom pocscom    1 Jan 22 11:57 _3pe.f3
-rw------- 1 pocscom pocscom    1 Jan 22 11:57 _3pe.f4
-rw------- 1 pocscom pocscom    1 Jan 22 11:57 _3pe.f5
-rw------- 1 pocscom pocscom    5 Jan 22 11:57 _3pe.frq
-rw------- 1 pocscom pocscom    5 Jan 22 11:57 _3pe.prx
-rw------- 1 pocscom pocscom   37 Jan 22 11:57 _3pe.tfx
-rw------- 1 pocscom pocscom  100 Jan 22 11:57 _3pe.tis
-rw------- 1 pocscom pocscom   24 Jan 22 11:57 _3pe.tix
-rw------- 1 pocscom pocscom  226 Jan 22 11:57 _3pe.tmp
-rw------- 1 pocscom pocscom 5.5K Jan 22 11:57 _3pl.cfs
-rw------- 1 pocscom pocscom  44K Jan 22 11:57 _3pw.cfs
-rw------- 1 pocscom pocscom  84K Jan 22 11:57 _3q7.cfs
-rw------- 1 pocscom pocscom  85K Jan 22 11:57 _3qi.cfs
-rw------- 1 pocscom pocscom 1.1K Jan 22 11:57 _3qj.cfs
-rw------- 1 pocscom pocscom 2.2K Jan 22 11:57 _3qk.cfs
-rw------- 1 pocscom pocscom  767 Jan 22 11:57 _3ql.cfs
-rw------- 1 pocscom pocscom  550 Jan 22 11:57 _3qm.cfs
-rw------- 1 pocscom pocscom  684 Jan 22 11:57 _3qn.cfs
-rw------- 1 pocscom pocscom  949 Jan 22 11:57 _3qo.cfs
-rw------- 1 pocscom pocscom  776 Jan 22 11:57 _3qp.cfs
-rw------- 1 pocscom pocscom 1.1K Jan 22 11:57 _3qq.cfs
-rw------- 1 pocscom pocscom  40K Jan 22 11:57 _3qr.cfs
-rw------- 1 pocscom pocscom 4.4M Jan 22 09:38 _uu.cfs
-rw------- 1 pocscom pocscom  114 Jan 22 11:57 _uu.del
-rw------- 1 pocscom pocscom   79 Jan 22 11:57 fields
-rw------- 1 pocscom pocscom  156 Jan 22 11:57 segments

-- 
Posted via http://www.ruby-forum.com/.
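
On the several-Mongrels question from earlier in this thread: the idea of letting only one process write at a time can be illustrated with an advisory file lock from Ruby's standard library. This is a generic sketch and makes no claim about how Ferret or acts_as_ferret lock their index internally. The lock file name and the helper with_index_lock are invented for the example, and flock behaviour differs between platforms and on network file systems.

```ruby
require 'tmpdir'

# Hypothetical helper: run the block while holding an exclusive advisory
# lock on lock_path, so that cooperating processes write one at a time.
def with_index_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)          # blocks until no other holder remains
    begin
      yield
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Demonstration: while the block runs, a second handle on the same file
# asks for the lock without blocking and records what flock returns.
LOCK_DEMO = Dir.mktmpdir do |dir|
  lock_path = File.join(dir, 'index.lock')
  with_index_lock(lock_path) do
    File.open(lock_path, File::RDWR) do |other|
      other.flock(File::LOCK_EX | File::LOCK_NB)
    end
  end
end

puts LOCK_DEMO.inspect
```

In a setup like the one described above, the index update or rebuild call would go inside the block, and every process that writes to the index would have to use the same lock file for this to help.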
From wmorgan-ferret at masanjin.net Mon Jan 22 13:41:00 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Mon, 22 Jan 2007 10:41:00 -0800 Subject: [Ferret-talk] [Ferret] Test failures for ferrettaggedREL-0.10.14 In-Reply-To: <882b5e27d33068fe79023741902d4dfc@ruby-forum.com> References: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.come642e87066f3005c51358ef4309214e3@ruby-forum.com> <6775b9f061c7e8ef0582ad1d6a74339f@ruby-forum.com> <1169484097-sup-4986@south> <882b5e27d33068fe79023741902d4dfc@ruby-forum.com> Message-ID: <1169490546-sup-4303@south> Excerpts from Saimon Moore's message of Mon Jan 22 09:26:43 -0800 2007: > svn co svn://davebalmain.com/ferret/tags/REL-0.10.14 ferret_0.10.4 > > and Ferret::VERSION was 0.9.6 in that version. > > Is this still not correct? Hm. I'm not sure at this point. There's an svn log message there saying that Dave tagged release 0.10.14, but the code sure looks like 0.9.6. The code in the svn repository I pointed to looks much more recent. I could be completely confused though. -- William From julioody at gmail.com Mon Jan 22 17:34:09 2007 From: julioody at gmail.com (Julio Cesar Ody) Date: Tue, 23 Jan 2007 09:34:09 +1100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <20070122134252.GK29989@cordoba.webit.de> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> Message-ID: On 1/23/07, Jens Kraemer wrote: > It should not, since Ferret is supposed to have a file system based > locking that manages inter-process synchronisation. > (a bit OT, but since it was mentioned...) As in managing simultaneous writes as well? Reason I'm asking is, I wrote an app a few months ago which is a networked index that is supposed to handle multiple "clients" writing to the index at the same time. 
What I did was to write a class that queued those requests and dispatched them one at a time, since otherwise, the server would crash because of Ferret locking issues. That was around Ferret 0.9.3 or so. I understand I could flush the index every time I insert something, but that's too much of a cost in terms of performance that I can't afford... -- Julio C. Ody http://rootshell.be/~julioody From nappin713 at yahoo.com Mon Jan 22 18:47:05 2007 From: nappin713 at yahoo.com (Raymond O'connor) Date: Tue, 23 Jan 2007 00:47:05 +0100 Subject: [Ferret-talk] memcache Message-ID: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> Just curious, is there anyway to use memcache with a ferret index? Thanks, Ray -- Posted via http://www.ruby-forum.com/. From erik.eide at gmail.com Mon Jan 22 19:47:21 2007 From: erik.eide at gmail.com (Erik) Date: Tue, 23 Jan 2007 01:47:21 +0100 Subject: [Ferret-talk] making acts_as_ferret thread safe? Message-ID: <85fe47703915fecc03827922e2beb0ec@ruby-forum.com> Hi I get a synchronize error (see below) when I run a lib script with script/runner. The script updates a status field in a model object that is indexed and searchable within the script/server (mongrel) process. 
$ script/runner -e production 'load "lib/billing/credit_subscribers.rb"' /usr/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27: IOError (IOError) from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:124:in `initialize' from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:122:in `initialize' from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:240:in `create_index_instance' from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:232:in `ferret_index' from ./script/../config/../vendor/plugins/acts_as_ferret/lib/instance_methods.rb:88:in `ferret_update' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/callbacks.rb:344:in `callback' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/callbacks.rb:341:in `callback' ... 16 levels... from /usr/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27 from /usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' from script/runner:3 I'm guessing this is a lock contention issue, is it possible to cordinate the locking performed by the plugin ? Erik -- Posted via http://www.ruby-forum.com/. From ej.finneran at gmail.com Mon Jan 22 23:15:08 2007 From: ej.finneran at gmail.com (EJ Finneran) Date: Tue, 23 Jan 2007 05:15:08 +0100 Subject: [Ferret-talk] Exact phrase score Message-ID: <40c330811615f85d3848fc907531fa19@ruby-forum.com> Sorry if this has been beaten to death here but I couldn't find the exact answer I was looking for. In the app I'm writing, we convert the score to a percentage and display it with the search results. The problem is when you search for an exact phrase (for example) and it matches the title of a document exactly, you only get a 17% match. 
Has anyone seen a way to either curve the scores or make an exact phrase match get a higher score? I've looked over the similarity formula so I'm pretty sure I understand why it happens but I'm just looking for ways to counteract it/make the score make more sense to the user. Thanks, ej.finneran at gmail.com -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue Jan 23 04:33:45 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 10:33:45 +0100 Subject: [Ferret-talk] [Ferret] Test failures for ferret taggedREL-0.10.14 In-Reply-To: <882b5e27d33068fe79023741902d4dfc@ruby-forum.com> References: <124c2cf5e1a375b0d8fed1e2a10070f6@ruby-forum.com> <6775b9f061c7e8ef0582ad1d6a74339f@ruby-forum.com> <1169484097-sup-4986@south> <882b5e27d33068fe79023741902d4dfc@ruby-forum.com> Message-ID: <20070123093345.GO29989@cordoba.webit.de> On Mon, Jan 22, 2007 at 06:26:43PM +0100, Saimon Moore wrote: > William Morgan wrote: > > Excerpts from Saimon Moore's message of Mon Jan 22 03:01:05 -0800 2007: > >> NameError: uninitialized constant Ferret::Index::FieldInfos > >> from (irb):7 > >> > >> For some reason it can't find Ferret::Index::FieldInfos. ??? > > > > This, and the Ferret::VERSION thing, are both symptomatic of you having > > packaged 0.9.6 rather than 0.10.14. > > > > The SVN directions on the Ferret page are wrong. You actually need to > > do > > > > svn co svn://www.davebalmain.com/exp ferret > > > > to get the latest version > > Hi William, > > Actually what I downloaded was : > > svn co svn://davebalmain.com/ferret/tags/REL-0.10.14 ferret_0.10.4 > > and Ferret::VERSION was 0.9.6 in that version. > > Is this still not correct? It should state the correct version, at least my 0.10.14 installed via rubygems does: irb(main):001:0> require 'rubygems' => true irb(main):002:0> require 'ferret' => true irb(main):003:0> Ferret::VERSION => "0.10.14" irb(main):004:0> Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Jan 23 04:50:08 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 10:50:08 +0100 Subject: [Ferret-talk] making acts_as_ferret thread safe? In-Reply-To: <85fe47703915fecc03827922e2beb0ec@ruby-forum.com> References: <85fe47703915fecc03827922e2beb0ec@ruby-forum.com> Message-ID: <20070123095008.GP29989@cordoba.webit.de> On Tue, Jan 23, 2007 at 01:47:21AM +0100, Erik wrote: > Hi > > I get a synchronize error (see below) when I run a lib script with > script/runner. > > The script updates a status field in a model object that is indexed and > searchable within the script/server (mongrel) process. > > $ script/runner -e production 'load "lib/billing/credit_subscribers.rb"' > /usr/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27: > IOError (IOError) > from > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:124:in > `initialize' that line does just check for the existence of the segments file inside the index directory, maybe it's just a file system permissions issue? Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Jan 23 05:03:05 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 11:03:05 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> Message-ID: <20070123100305.GQ29989@cordoba.webit.de> On Tue, Jan 23, 2007 at 09:34:09AM +1100, Julio Cesar Ody wrote: > On 1/23/07, Jens Kraemer wrote: > > It should not, since Ferret is supposed to have a file system based > > locking that manages inter-process synchronisation. > > > > (a bit OT, but since it was mentioned...) > > As in managing simultaneous writes as well? The locking is supposed to prevent simultaneous writing. Afair Ferret internally waits some time and then retries the write, throwing an error if it still doesn't succeed. > Reason I'm asking is, I wrote an app a few months ago which is a > networked index that is supposed to handle multiple "clients" writing > to the index at the same time. What I did was to write a class that > queued those requests and dispatched them one at a time, since > otherwise, the server would crash because of Ferret locking issues. > That was around Ferret 0.9.3 or so. I'd still go this route to make sure the index stays sane, especially with a heavily loaded app. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Jan 23 05:06:22 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 11:06:22 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> Message-ID: <20070123100622.GR29989@cordoba.webit.de> On Mon, Jan 22, 2007 at 06:22:27PM +0100, Manoel Lemos wrote: > Jens, > > In fact, I'm indexing around 150K blogs, my app is a Blog/Posts indexing > service, just like Technorati, but focused on the Brazilian blogosphere. > > Same error, even with only the console running Blog.rebuild_index, see: I really can't imagine why this should happen with only one process accessing the index. Do you have the possibility to try this out on some other platform (e.g., Linux)? Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From manoel at lemos.net Tue Jan 23 05:59:34 2007 From: manoel at lemos.net (Manoel Lemos) Date: Tue, 23 Jan 2007 11:59:34 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <20070123100305.GQ29989@cordoba.webit.de> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> <20070123100305.GQ29989@cordoba.webit.de> Message-ID: <71ad3c8e989a5aeb34a3b3fb1209478f@ruby-forum.com> Jens, Any idea on my issue? I still cannot complete the index rebuild. Every time I try, I get the same error (but in different files). Now I'm totally sure that only a single process (console) is running. []s Manoel -- Posted via http://www.ruby-forum.com/. From manoel at lemos.net Tue Jan 23 06:33:32 2007 From: manoel at lemos.net (Manoel Lemos) Date: Tue, 23 Jan 2007 12:33:32 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: <71ad3c8e989a5aeb34a3b3fb1209478f@ruby-forum.com> References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> <20070123100305.GQ29989@cordoba.webit.de> <71ad3c8e989a5aeb34a3b3fb1209478f@ruby-forum.com> Message-ID: Jens, I think I found it (dumb), hehe. I just saw that I exceeded my disk quota. I cleared a few gigs and I'm waiting for the indexing to finish. Let's see... []s Manoel -- Posted via http://www.ruby-forum.com/.
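For readers following the locking discussion in this thread: the approach Julio describes (queue the write requests and dispatch them one at a time) can be sketched with Ruby's standard Queue and a single worker thread. This is only a minimal sketch, not Julio's actual code. SerializedWriter is a hypothetical name, and the block passed to it stands in for whatever performs the real index write (for example index << doc); no Ferret calls are made here.

```ruby
require 'thread'

# Hypothetical helper: funnels all writes through one worker thread so
# that only a single writer ever touches the index at a time.
class SerializedWriter
  def initialize(&write)
    @write  = write          # stand-in for the real index write, e.g. index << doc
    @queue  = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a job arrives; the :stop marker ends the loop.
      while (job = @queue.pop) != :stop
        @write.call(job)
      end
    end
  end

  # Called by any number of client threads; returns immediately.
  def enqueue(doc)
    @queue << doc
  end

  # Process everything queued so far, then stop the worker.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

# Usage: four "clients" enqueue concurrently, one thread does the writing.
written = []
writer  = SerializedWriter.new { |doc| written << doc }
4.times.map { |i| Thread.new { writer.enqueue(:id => i) } }.each(&:join)
writer.shutdown
```

The trade-off is that writes become asynchronous: a client does not learn whether its document was written unless the block reports errors somewhere.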
From dan_at_works at yahoo.com Tue Jan 23 09:55:06 2007 From: dan_at_works at yahoo.com (ngoc) Date: Tue, 23 Jan 2007 15:55:06 +0100 Subject: [Ferret-talk] Someone getting RDig work for Linux? Message-ID: <5d8a8ad2777d786af8c8fe0695f7004f@ruby-forum.com> I got this root at linux:~# rdig -c configfile RDig version 0.3.4 using Ferret 0.10.14 added url file:///home/myaccount/documents/ waiting for threads to finish... root at linux:~# rdig -c configfile -q "Ruby" RDig version 0.3.4 using Ferret 0.10.14 executing query >Ruby< Query: total results: 0 root at linux:~# my configfile I changed from config to cfg, because of maybe mistyping cfg.index.create = false RDig.configuration do |cfg| ################################################################## # options you really should set # provide one or more URLs for the crawler to start from cfg.crawler.start_urls = [ 'http://www.example.com/' ] # use something like this for crawling a file system: cfg.crawler.start_urls = [ 'file:///home/myaccount/documents/' ] # beware, mixing file and http crawling is not possible and might result in # unpredictable results. # limit the crawl to these hosts. The crawler will never # follow any links pointing to hosts other than those given here. # ignored for file system crawling cfg.crawler.include_hosts = [ 'www.example.com' ] # this is the path where the index will be stored # caution, existing contents of this directory will be deleted! cfg.index.path = '/home/myaccount/index' ################################################################## # options you might want to set, the given values are the defaults # set to true to get stack traces on errors cfg.verbose = true # content extraction options cfg.content_extraction = OpenStruct.new( # HPRICOT configuration # this is the html parser used by default from RDig 0.3.3 upwards. # Hpricot by far outperforms Rubyful Soup, and is at least as flexible when # it comes to selection of portions of the html documents. 
:hpricot => OpenStruct.new( # css selector for the element containing the page title :title_tag_selector => 'title', # might also be a proc returning either an element or a string: # :title_tag_selector => lambda { |hpricot_doc| ... } :content_tag_selector => 'body' # might also be a proc returning either an element or a string: # :content_tag_selector => lambda { |hpricot_doc| ... } ) # RUBYFUL SOUP # This is a powerful, but somewhat slow, ruby-only html parsing lib which was # RDig's default html parser up to version 0.3.2. To use it, comment the # hpricot config above, and uncomment the following: # # :rubyful_soup => OpenStruct.new( # # provide a method that returns the title of an html document # # this method may either return a tag to extract the title from, # # or a ready-to-index string. # :content_tag_selector => lambda { |tagsoup| # tagsoup.html.body # }, # # provide a method that selects the tag containing the page content you # # want to index. Useful to avoid indexing common elements like navigation # # and page footers for every page. # :title_tag_selector => lambda { |tagsoup| # tagsoup.html.head.title # } # ) ) # crawler options # Notice: for file system crawling the include/exclude_document patterns are # applied to the full path of _files_ only (like /home/bob/test.pdf), # for http to full URIs (like http://example.com/index.html). # nil (include all documents) or an array of Regexps # matching the URLs you want to index. cfg.crawler.include_documents = nil # nil (no documents excluded) or an array of Regexps # matching URLs not to index. # this filter is used after the one above, so you only need # to exclude documents here that aren't wanted but would be # included by the inclusion patterns. # cfg.crawler.exclude_documents = nil # number of document fetching threads to use. Should be raised only if # your CPU has idle time when indexing. 
# cfg.crawler.num_threads = 2 # suggested setting for file system crawling: cfg.crawler.num_threads = 1 # maximum number of http redirections to follow # cfg.crawler.max_redirects = 5 # number of seconds to wait with an empty url queue before # finishing the crawl. Set to a higher number when experiencing incomplete # crawls on slow sites. Don't set to 0, even when crawling a local fs. cfg.crawler.wait_before_leave = 10 # indexer options # create a new index on each run. Will append to the index if false. Use when # building a single index from multiple runs, e.g. one across a website and the # other a tree in a local file system cfg.index.create = false # rewrite document uris before indexing them. This is useful if you're # indexing on disk, but the documents should be accessible via http, e.g. from # a web based search application. By default, no rewriting takes place. # example: # cfg.index.rewrite_uri = lambda { |uri| # uri.path.gsub!(/^\/base\//, '/virtual_dir/') # uri.scheme = 'http' # uri.host = 'www.mydomain.com' # } end -- Posted via http://www.ruby-forum.com/. From pritchie at videotron.ca Tue Jan 23 09:10:28 2007 From: pritchie at videotron.ca (Patrick Ritchie) Date: Tue, 23 Jan 2007 09:10:28 -0500 Subject: [Ferret-talk] memcache In-Reply-To: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> Message-ID: <45B61754.8090006@videotron.ca> Bump. > Just curious, is there anyway to use memcache with a ferret index? > > Has anyone tried this out? Cheers! Patrick From kraemer at webit.de Tue Jan 23 10:38:13 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 16:38:13 +0100 Subject: [Ferret-talk] memcache In-Reply-To: <45B61754.8090006@videotron.ca> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> <45B61754.8090006@videotron.ca> Message-ID: <20070123153813.GA30792@cordoba.webit.de> On Tue, Jan 23, 2007 at 09:10:28AM -0500, Patrick Ritchie wrote: > Bump. 
> > Just curious, is there anyway to use memcache with a ferret index? > > > > > Has anyone tried this out? What exactly do you want to use memcached for in the context of ferret? Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From ross.singer at library.gatech.edu Tue Jan 23 10:43:14 2007 From: ross.singer at library.gatech.edu (Ross Singer) Date: Tue, 23 Jan 2007 10:43:14 -0500 Subject: [Ferret-talk] memcache In-Reply-To: <20070123153813.GA30792@cordoba.webit.de> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> <45B61754.8090006@videotron.ca> <20070123153813.GA30792@cordoba.webit.de> Message-ID: <23b83f160701230743x5512977as4dce36c27438c6fc@mail.gmail.com> My guess would be as the place to store the index. -Ross. On 1/23/07, Jens Kraemer wrote: > On Tue, Jan 23, 2007 at 09:10:28AM -0500, Patrick Ritchie wrote: > > Bump. > > > Just curious, is there anyway to use memcache with a ferret index? > > > > > > > > Has anyone tried this out? > > What exactly do you want to use memcached for in the context of ferret? > > Jens > > > -- > webit! 
Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From kraemer at webit.de Tue Jan 23 11:15:15 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 17:15:15 +0100 Subject: [Ferret-talk] memcache In-Reply-To: <23b83f160701230743x5512977as4dce36c27438c6fc@mail.gmail.com> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> <45B61754.8090006@videotron.ca> <20070123153813.GA30792@cordoba.webit.de> <23b83f160701230743x5512977as4dce36c27438c6fc@mail.gmail.com> Message-ID: <20070123161515.GB30792@cordoba.webit.de> On Tue, Jan 23, 2007 at 10:43:14AM -0500, Ross Singer wrote: > My guess would be as the place to store the index. this isn't possible out of the box. But Ferret has an abstraction layer (Ferret::Store::Directory) that allows for in-memory or on-disk storage of an index, so it might be possible to write a memcached storage backend as well. Jens > On 1/23/07, Jens Kraemer wrote: > > On Tue, Jan 23, 2007 at 09:10:28AM -0500, Patrick Ritchie wrote: > > > Bump. > > > > Just curious, is there anyway to use memcache with a ferret index? > > > > > > > > > > > Has anyone tried this out? > > > > What exactly do you want to use memcached for in the context of ferret? > > -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From pritchie at videotron.ca Tue Jan 23 11:22:13 2007 From: pritchie at videotron.ca (Patrick Ritchie) Date: Tue, 23 Jan 2007 11:22:13 -0500 Subject: [Ferret-talk] memcache In-Reply-To: <20070123161515.GB30792@cordoba.webit.de> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> <45B61754.8090006@videotron.ca> <20070123153813.GA30792@cordoba.webit.de> <23b83f160701230743x5512977as4dce36c27438c6fc@mail.gmail.com> <20070123161515.GB30792@cordoba.webit.de> Message-ID: <45B63635.1080907@videotron.ca> Hi, Has anyone tried putting the Ferret index on a RAMdisk or some such to speed things up? > On Tue, Jan 23, 2007 at 10:43:14AM -0500, Ross Singer wrote: > >> My guess would be as the place to store the index. >> > > this isn't possible out of the box. > > But Ferret has an abstraction layer (Ferret::Store::Directory) that > allows for in-memory or on-disk storage of an index, so it might be > possible to write a memcached storage backend as well. > > Jens > > >> On 1/23/07, Jens Kraemer wrote: >> >>> On Tue, Jan 23, 2007 at 09:10:28AM -0500, Patrick Ritchie wrote: >>> >>>> Bump. >>>> >>>>> Just curious, is there anyway to use memcache with a ferret index? >>>>> >>>>> >>>>> >>>> Has anyone tried this out? >>>> >>> What exactly do you want to use memcached for in the context of ferret? >>> >>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070123/928cf25a/attachment-0001.html From kraemer at webit.de Tue Jan 23 11:43:48 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 17:43:48 +0100 Subject: [Ferret-talk] Someone getting RDig work for Linux?
In-Reply-To: <5d8a8ad2777d786af8c8fe0695f7004f@ruby-forum.com> References: <5d8a8ad2777d786af8c8fe0695f7004f@ruby-forum.com> Message-ID: <20070123164348.GC30792@cordoba.webit.de> On Tue, Jan 23, 2007 at 03:55:06PM +0100, ngoc wrote: > I got this > > root at linux:~# rdig -c configfile > RDig version 0.3.4 > using Ferret 0.10.14 > added url file:///home/myaccount/documents/ > waiting for threads to finish... > root at linux:~# rdig -c configfile -q "Ruby" > RDig version 0.3.4 > using Ferret 0.10.14 > executing query >Ruby< > Query: > total results: 0 > root at linux:~# strange. I cut'n'pasted your config and only changed the start_urls and index location, and it worked like a charm. what is in the documents directory - only files, or subdirectories, any strange file names (spaces and such)? There's a known bug concerning spaces in file/directory names, maybe that's the problem? Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Jan 23 11:49:22 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 Jan 2007 17:49:22 +0100 Subject: [Ferret-talk] memcache In-Reply-To: <45B63635.1080907@videotron.ca> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> <45B61754.8090006@videotron.ca> <20070123153813.GA30792@cordoba.webit.de> <23b83f160701230743x5512977as4dce36c27438c6fc@mail.gmail.com> <20070123161515.GB30792@cordoba.webit.de> <45B63635.1080907@videotron.ca> Message-ID: <20070123164922.GD30792@cordoba.webit.de> On Tue, Jan 23, 2007 at 11:22:13AM -0500, Patrick Ritchie wrote: > Hi, > > Has anyone tried putting the Ferret index on a RAMdisk or some such to > speed things up? Have you tried using RAMDirectory for in-memory storage of the index? There's even the possibility to clone a persistent index into a RAM-based one for faster access.
However I can't imagine you'll get a huge speed increase - I'd guess with a modern operating system even a persistent index will end up completely buffered in RAM after some usage. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From pritchie at videotron.ca Tue Jan 23 12:06:21 2007 From: pritchie at videotron.ca (Patrick Ritchie) Date: Tue, 23 Jan 2007 12:06:21 -0500 Subject: [Ferret-talk] memcache In-Reply-To: <20070123164922.GD30792@cordoba.webit.de> References: <4a84b60e24dd8ebfb59d1570cf85a60c@ruby-forum.com> <45B61754.8090006@videotron.ca> <20070123153813.GA30792@cordoba.webit.de> <23b83f160701230743x5512977as4dce36c27438c6fc@mail.gmail.com> <20070123161515.GB30792@cordoba.webit.de> <45B63635.1080907@videotron.ca> <20070123164922.GD30792@cordoba.webit.de> Message-ID: <45B6408D.4040907@videotron.ca> Jens Kraemer wrote: > On Tue, Jan 23, 2007 at 11:22:13AM -0500, Patrick Ritchie wrote: > >> Hi, >> >> Has anyone tried putting the Ferret index on a RAMdisk or some such to >> speed things up? >> > > Have you tried using RAMDirectory for in-memory storage of the index? > > There's even the possibility to clone a persistent index into a RAM-based > one for faster access. However I can't imagine you'll get a huge speed > increase - I'd guess with a modern operating system even a persistent > index will end up completely buffered in RAM after some usage. > Yes, but if it's really big, parts of it may be unbuffered if the machine is under high load... Would be nice to have a way to ensure that the entire index is always in memory. Thanks for the suggestions. Cheers! Patrick -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20070123/728e626e/attachment.html From dan_at_works at yahoo.com Tue Jan 23 12:48:03 2007 From: dan_at_works at yahoo.com (ngoc) Date: Tue, 23 Jan 2007 18:48:03 +0100 Subject: [Ferret-talk] Someone getting RDig work for Linux? In-Reply-To: <20070123164348.GC30792@cordoba.webit.de> References: <5d8a8ad2777d786af8c8fe0695f7004f@ruby-forum.com> <20070123164348.GC30792@cordoba.webit.de> Message-ID: <56d630397d9e99b124b6170fe5d79d78@ruby-forum.com> > and such)? There's a known bug concerning spaces in file/directory > names, maybe that's the problem? Hi Jens I stored only one file in the catalogue. And it has space in file name without ending. So I correct it with connected name and ending html -> It works. I recognise that I need to work more with it before taking in use. It is so linux oriented. Now I have to read line by line to learn more how it works inside. It will take long time. Thanks Jens ngoc -- Posted via http://www.ruby-forum.com/. From erik.eide at gmail.com Tue Jan 23 17:14:50 2007 From: erik.eide at gmail.com (Erik) Date: Tue, 23 Jan 2007 23:14:50 +0100 Subject: [Ferret-talk] making acts_as_ferret thread safe? In-Reply-To: <20070123095008.GP29989@cordoba.webit.de> References: <85fe47703915fecc03827922e2beb0ec@ruby-forum.com> <20070123095008.GP29989@cordoba.webit.de> Message-ID: <7bb0a30d04fd9f32c0e73991de5e382e@ruby-forum.com> Jens Kraemer wrote: > that line does just check for the existence of the segments file inside > the index directory, maybe it's just a file system permissions issue? > > Jens > That was the problem, thank you! -- Posted via http://www.ruby-forum.com/. 
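Since the IOError reported earlier in this thread turned out to be a file system permissions problem, a quick check like the following can rule that out before suspecting locking. This is a sketch using only Ruby's standard library. index_permission_problems is a hypothetical helper, the check applies to the current user (so it should be run as the same user that runs script/runner or the server), and the example path is an assumption rather than a documented acts_as_ferret location.

```ruby
# Hypothetical helper: lists entries in an index directory that the
# current user cannot read or write. An empty result means no obvious
# permission problem was found.
def index_permission_problems(index_dir)
  return ["#{index_dir}: directory does not exist"] unless File.directory?(index_dir)

  problems = []
  # The directory itself must be writable so new segment files can be created.
  problems << "#{index_dir}: directory not writable" unless File.writable?(index_dir)

  Dir.glob(File.join(index_dir, '*')).sort.each do |path|
    problems << "#{path}: not readable" unless File.readable?(path)
    problems << "#{path}: not writable" unless File.writable?(path)
  end
  problems
end

# Example path is an assumption; substitute your own index location.
puts index_permission_problems('index/production')
```

Running it once as the web server user and once as the user behind the script should show whether the two see the index files differently.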
From andrew at shavers.co.uk Tue Jan 23 22:13:11 2007 From: andrew at shavers.co.uk (Andrew Edwards) Date: Wed, 24 Jan 2007 04:13:11 +0100 Subject: [Ferret-talk] Disable search match on model id Message-ID: <7d17dd5e954816e70f438b434f7618ec@ruby-forum.com> Hi, Given a numerical search term I want to avoid matching the model id as this has no real world significance when returned in the results (in this instance). For example the user may enter '13' when looking for a product code. Presently they will also get back the product that has id 13. I have tried: acts_as_ferret :fields => {:id => { :index => :no }, :description => {}, :manufacturer_code => {}} However this fails with "You can't store the term vectors of an unindexed field". As the id is likely a special case do I need to instead limit the search somehow when calling the find_by_contents method? Presently I simply have: @products = Product.find_by_contents(@phrase, :num_docs => 200) Any advice much appreciated. Andrew. -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed Jan 24 04:07:28 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 24 Jan 2007 10:07:28 +0100 Subject: [Ferret-talk] Someone getting RDig work for Linux? In-Reply-To: <56d630397d9e99b124b6170fe5d79d78@ruby-forum.com> References: <5d8a8ad2777d786af8c8fe0695f7004f@ruby-forum.com> <20070123164348.GC30792@cordoba.webit.de> <56d630397d9e99b124b6170fe5d79d78@ruby-forum.com> Message-ID: <20070124090728.GG30792@cordoba.webit.de> Hi! On Tue, Jan 23, 2007 at 06:48:03PM +0100, ngoc wrote: > > and such)? There's a known bug concerning spaces in file/directory > > names, maybe that's the problem? > Hi Jens > I stored only one file in the catalogue. And it has space in file name > without ending. So I correct it with connected name and ending html -> > It works. ah ok. The filename ending is needed, since there is no other (easy) way to get an idea what kind of content extractor to use. 
On *nix systems the 'file' command might be of use here, but that would even more tie RDig to Linux and friends... > I recognise that I need to work more with it before taking in use. It is > so linux oriented. Now I have to read line by line to learn more how it > works inside. It will take long time. sorry for the inconvenience, but I only rarely get to use something else than Linux - however I'll happily apply any fixes to make RDig work on windows. However I'll fix the problem with spaces in filenames by the end of the week. cheers, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Wed Jan 24 04:22:00 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 24 Jan 2007 10:22:00 +0100 Subject: [Ferret-talk] Disable search match on model id In-Reply-To: <7d17dd5e954816e70f438b434f7618ec@ruby-forum.com> References: <7d17dd5e954816e70f438b434f7618ec@ruby-forum.com> Message-ID: <20070124092200.GH30792@cordoba.webit.de> On Wed, Jan 24, 2007 at 04:13:11AM +0100, Andrew Edwards wrote: > Hi, > > Given a numerical search term I want to avoid matching the model id as > this has no real world significance when returned in the results (in > this instance). > > For example the user may enter '13' when looking for a product code. > Presently they will also get back the product that has id 13. what version of aaf do you use? the id is not supposed to be included in queries, however it might be that this is the case in older versions. [..] > acts_as_ferret :fields => {:id => { :index => :no }, :description => {}, > :manufacturer_code => {}} > > However this fails with "You can't store the term vectors of an > unindexed field". I don't know if not indexing the id would be a good idea, since it is used as a key field and needed for updates (delete old document by id, add new one). Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From michael at mahemoff.com Wed Jan 24 06:05:24 2007 From: michael at mahemoff.com (Michael Mahemoff) Date: Wed, 24 Jan 2007 12:05:24 +0100 Subject: [Ferret-talk] [ActsAsFerret] Index Directory Disappears and Not Re-cre In-Reply-To: <20070119105119.GZ11020@cordoba.webit.de> References: <0bc8c0b71921d210a8810bb7fcb4e24f@ruby-forum.com> <20070119105119.GZ11020@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > On Thu, Jan 18, 2007 at 01:05:47PM +0100, Michael Mahemoff wrote: >> Hi, >> >> This is a recurring issue for me - the index directory on my production >> server and everything below it occasionally disappears and isn't >> reconstructed. I tried manually creating the entire index path manually >> before starting the server, but it still happened while the server is >> running. >> >> I don't know what's causing the index to disappear and I'm also not sure >> why it's not automagically re-created in any event? > > I suspect you use capistrano for your deployment - is it possible that > on each deploy your index gets lost because it is located inside the > releases/.../ subdirectory? Thanks for the reply and you guessed correctly. It's running capistrano and maybe that explains why it happens periodically. > for the index not being recreated problem - maybe it's just a permission > issue? I'm not sure about that as the log messages don't seem to indicate it and the server user is the same as the user that created the release structure. I do have the index checked in to the repository though (as a hack) and maybe the attributes on it are wrong (e.g. the dir's not executable). Thanks, Michael -- Posted via http://www.ruby-forum.com/.
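[Editor's sketch] The diagnosis above (the index lives inside releases/.../ and is lost on each deploy) suggests keeping the index in a directory that outlives individual releases and linking it into each release. The following is a minimal plain-Ruby illustration of that idea, not a Capistrano recipe; the helper name link_shared_index and both paths are hypothetical, and it assumes a filesystem that supports symlinks.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: make <release_path>/index a symlink to
# <shared_path>/index so every release sees the same physical index.
def link_shared_index(shared_path, release_path)
  shared_index  = File.join(shared_path, 'index')
  release_index = File.join(release_path, 'index')

  # Create the shared index directory on first use.
  FileUtils.mkdir_p(shared_index)
  # Drop whatever the release shipped with (e.g. a checked-in index dir
  # or a stale link); a symlink is unlinked, its target is not touched.
  FileUtils.rm_rf(release_index)
  FileUtils.ln_s(shared_index, release_index)
  release_index
end

# Demonstration in a throwaway directory tree.
Dir.mktmpdir do |root|
  release = File.join(root, 'releases', '20070124')
  FileUtils.mkdir_p(release)
  link = link_shared_index(File.join(root, 'shared'), release)
  puts "#{link} -> #{File.readlink(link)}"
end
```

Calling the helper again for the next release simply repoints a fresh link at the same shared directory, so the index files themselves are never copied or removed by a deploy.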
From kraemer at webit.de Wed Jan 24 07:38:42 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 24 Jan 2007 13:38:42 +0100 Subject: [Ferret-talk] [ActsAsFerret] Index Directory Disappears and Not Re-cre In-Reply-To: References: <0bc8c0b71921d210a8810bb7fcb4e24f@ruby-forum.com> <20070119105119.GZ11020@cordoba.webit.de> Message-ID: <20070124123842.GK30792@cordoba.webit.de> On Wed, Jan 24, 2007 at 12:05:24PM +0100, Michael Mahemoff wrote: > Jens Kraemer wrote: > > On Thu, Jan 18, 2007 at 01:05:47PM +0100, Michael Mahemoff wrote: > >> Hi, > >> > >> This is a recurring issue for me - the index directory on my production > >> server and everything below it occasionally disappears and isn't > >> reconstructed. I tried manually creating the entire index path manually > >> before starting the server, but it still happened while the server is > >> running. > >> > >> I don't know what's causing the index to disappear and I'm also not sure > >> why it's not automagically re-created in any event? > > > > I suspect you use capistrano for your deployment - is it possible that > > on each deploy your index gets lost because it is located inside the > > releases/.../ subdirectory? > > Thanks for the reply and you guessed correctly. It's running capistrano > and maybe that explains why it happens periodically. > > > for the index not being recreated problem - maybe it's just a permission > > issue? > > I'm not sure about that as the log messages don't seem to indicate it > and the server user is the same as the user that created the release > structure. I do have the index checked in to the repository though (as a > hack) and maybe the attributes on it are wrong (e.g. the dir's not > executable). I usually symlink the RAILS_ROOT/index/ directory to shared/index in an after_update_code recipe. That way the index physically stays the same from deployment to deployment. Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From evtroost at vub.ac.be Wed Jan 24 07:42:37 2007 From: evtroost at vub.ac.be (Ewout) Date: Wed, 24 Jan 2007 13:42:37 +0100 Subject: [Ferret-talk] [ActsAsFerret] Index Directory Disappears and Not Re-cre In-Reply-To: References: <0bc8c0b71921d210a8810bb7fcb4e24f@ruby-forum.com> <20070119105119.GZ11020@cordoba.webit.de> Message-ID: <20070124124237.443301342@localhost> Checking in the index is a very bad idea. Instead, the index should live in the shared directory of your deployment, and each release should symlink to this shared index. In deploy.rb, this would look like this:

  desc <<-DESC
    Create a shared index dir. All deployed versions will share the same
    index, as they share the same database. When upgrading ferret, this
    index might have to be rebuilt.
  DESC
  task :create_index_dir do
    run <<-CMD
      mkdir -p -m 777 #{shared_path}/index
    CMD
  end

  desc <<-DESC
    Create a symlink from the current release to the shared index.
  DESC
  task :create_index_symlink do
    run <<-CMD
      ln -fs #{shared_path}/index/ #{current_release}/index
    CMD
  end

  # Hooks
  task :after_setup do
    create_index_dir
  end

  task :after_symlink do
    create_index_symlink
  end

> >> for the index not being recreated problem - maybe it's just a permission >> issue? > >I'm not sure about that as the log messages don't seem to indicate it >and the server user is the same as the user that created the release >structure. I do have the index checked in to the repository though (as a >hack) and maybe the attributes on it are wrong (e.g. the dir's not >executable). > >Thanks, >Michael > >-- >Posted via http://www.ruby-forum.com/.
>_______________________________________________ >Ferret-talk mailing list >Ferret-talk at rubyforge.org >http://rubyforge.org/mailman/listinfo/ferret-talk From andrew at shavers.co.uk Wed Jan 24 07:59:03 2007 From: andrew at shavers.co.uk (Andrew Edwards) Date: Wed, 24 Jan 2007 13:59:03 +0100 Subject: [Ferret-talk] Disable search match on model id In-Reply-To: <20070124092200.GH30792@cordoba.webit.de> References: <7d17dd5e954816e70f438b434f7618ec@ruby-forum.com> <20070124092200.GH30792@cordoba.webit.de> Message-ID: <78c9eeb931082c653f8760c072b3efd6@ruby-forum.com> Thanks, I have just updated the project plugin to the trunk version. It now works as described. I think I might have had some confusion over project plugin and system gem versions. Either way it is now resolved. Thanks again. Andrew. -- Posted via http://www.ruby-forum.com/. From manoel at lemos.net Wed Jan 24 11:15:40 2007 From: manoel at lemos.net (Manoel Lemos) Date: Wed, 24 Jan 2007 17:15:40 +0100 Subject: [Ferret-talk] [ActsAsFerret] OpenSolaris (TextDrive) indexing issues In-Reply-To: References: <671fa6e17d559492d77e98ab5d88314e@ruby-forum.com> <20070122085827.GB29989@cordoba.webit.de> <1e8dc7d9bbc02479e784d8ee99a92eb9@ruby-forum.com> <97a8f2d2b37cb8994804e4b7f9a9d33e@ruby-forum.com> <20070122134252.GK29989@cordoba.webit.de> <20070123100305.GQ29989@cordoba.webit.de> <71ad3c8e989a5aeb34a3b3fb1209478f@ruby-forum.com> Message-ID: <8db0beb3a83515274f68c10609fa0ee8@ruby-forum.com> Jens, It seems the problem really was concurrent processes building the index at the same time. I was not aware that I had a runner process scheduled in cron. Now I'm running Blog.rebuild_index completely on its own, and there have been no failures so far. The crazy thing is, 19 HOURS of CPU already and we are far from finished, I think. I don't know what a completed index looks like, but the file names give me an idea of the progress.
TOP Result: load averages: 3.25, 4.62, 5.88 14:09:29 73 processes: 71 sleeping, 2 on cpu CPU states: 19.8% idle, 62.1% user, 18.1% kernel, 0.0% iowait, 0.0% swap Memory: 16G real, 2053M free, 7520M swap in use, 21G swap free PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 5737 pocscom 1 59 0 36M 32M sleep 19:38 0.32% runner Current contents of app/index/production/blog: [92140-AA:~/web/labs/blogblogs/trunk/index/production/blog] pocscom$ ls -al total 349436 drwxr-xr-x 2 pocscom pocscom 34 Jan 24 14:10 ./ drwxr-xr-x 3 pocscom pocscom 3 Jan 23 10:40 ../ -rw------- 1 pocscom pocscom 53M Jan 23 17:25 _8km.cfs -rw------- 1 pocscom pocscom 45M Jan 24 00:03 _h59.cfs -rw------- 1 pocscom pocscom 35M Jan 24 06:20 _ppw.cfs -rw------- 1 pocscom pocscom 2.6M Jan 24 06:57 _qkr.cfs -rw------- 1 pocscom pocscom 1000K Jan 24 07:34 _rfm.cfs -rw------- 1 pocscom pocscom 2.2M Jan 24 08:12 _sah.cfs -rw------- 1 pocscom pocscom 4.5M Jan 24 08:50 _t5c.cfs -rw------- 1 pocscom pocscom 4.0M Jan 24 09:32 _u07.cfs -rw------- 1 pocscom pocscom 4.9M Jan 24 10:16 _uv2.cfs -rw------- 1 pocscom pocscom 4.3M Jan 24 11:27 _vpx.cfs -rw------- 1 pocscom pocscom 3.3M Jan 24 12:24 _wks.cfs -rw------- 1 pocscom pocscom 5.2M Jan 24 13:32 _xfn.cfs -rw------- 1 pocscom pocscom 968K Jan 24 13:38 _xiq.cfs -rw------- 1 pocscom pocscom 656K Jan 24 13:45 _xlt.cfs -rw------- 1 pocscom pocscom 226K Jan 24 13:50 _xow.cfs -rw------- 1 pocscom pocscom 655K Jan 24 13:54 _xrz.cfs -rw------- 1 pocscom pocscom 457K Jan 24 13:58 _xv2.cfs -rw------- 1 pocscom pocscom 575K Jan 24 14:03 _xy5.cfs -rw------- 1 pocscom pocscom 459K Jan 24 14:07 _y18.cfs -rw------- 1 pocscom pocscom 82K Jan 24 14:07 _y1j.cfs -rw------- 1 pocscom pocscom 42K Jan 24 14:08 _y1u.cfs -rw------- 1 pocscom pocscom 42K Jan 24 14:08 _y25.cfs -rw------- 1 pocscom pocscom 3.6K Jan 24 14:09 _y2g.cfs -rw------- 1 pocscom pocscom 2.5K Jan 24 14:09 _y2r.cfs -rw------- 1 pocscom pocscom 121K Jan 24 14:10 _y32.cfs -rw------- 1 pocscom pocscom 584 Jan 
24 14:10 _y33.cfs -rw------- 1 pocscom pocscom 593 Jan 24 14:10 _y34.cfs -rw------- 1 pocscom pocscom 94 Jan 24 14:10 _y35.fdt -rw------- 1 pocscom pocscom 0 Jan 24 14:10 _y35.fdx -rw------- 1 pocscom pocscom 0 Jan 23 10:40 ferret-write.lck -rw------- 1 pocscom pocscom 79 Jan 24 14:10 fields -rw------- 1 pocscom pocscom 195 Jan 24 14:10 segments -- Posted via http://www.ruby-forum.com/. From chitam at gmail.com Wed Jan 24 14:19:56 2007 From: chitam at gmail.com (donut donut) Date: Wed, 24 Jan 2007 20:19:56 +0100 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 Message-ID: Hi, I've just updated rails from 1.1.6 to 1.2.1 and I'm getting the following errors whenever I load a page that uses a class that uses ferret. I have ferret 0.10.13 and acts_as_ferret. They were working fine before the upgrade. # ["/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:423:in `remove_const'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:423:in `remove_constant'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:274:in `remove_unloadable_constants!'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:274:in `remove_unloadable_constants!'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:73:in `clear'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:60:in `reset_application!'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:116:in `reset_after_dispatch'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:51:in `dispatch'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/webrick_server.rb:113:in `handle_dispatch'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/webrick_server.rb:79:in `service'", "/usr/local/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'", "/usr/local/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'", 
"/usr/local/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'", "/usr/local/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'", "/usr/local/lib/ruby/1.8/webrick/server.rb:95:in `start'", "/usr/local/lib/ruby/1.8/webrick/server.rb:92:in `start'", "/usr/local/lib/ruby/1.8/webrick/server.rb:23:in `start'", "/usr/local/lib/ruby/1.8/webrick/server.rb:82:in `start'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/webrick_server.rb:63:in `dispatch'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/commands/servers/webrick.rb:59", "/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in `require'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:343:in `new_constants_in'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in `require'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/commands/server.rb:39", "/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in `require'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:343:in `new_constants_in'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in `require'", "script/server:3"] [2007-01-23 11:50:29] ERROR `/market/list' not found. 192.168.1.102 - - [23/Jan/2007:11:50:29 PST] "GET /market/list HTTP/1.1" 404 283 - -> /market/list Have you seen this? Thanks. -- Posted via http://www.ruby-forum.com/. 
From john at johnleach.co.uk Wed Jan 24 18:39:45 2007 From: john at johnleach.co.uk (John Leach) Date: Wed, 24 Jan 2007 23:39:45 +0000 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 In-Reply-To: References: Message-ID: <1169681985.11386.32.camel@localhost.localdomain> Hi, In case it's any use, I moved a Rails 1.1.6 site to 1.2.1 the other day with Ferret 0.10.14 and it's working fine, though it doesn't use acts_as_ferret. The dependencies system has changed in Rails 1.2, perhaps acts_as_ferret is tickled by this. You might want to try including the Ferret libraries manually, in config/environment.rb. http://weblog.rubyonrails.org/2006/8/11/reloading-revamped John. -- http://johnleach.co.uk On Wed, 2007-01-24 at 20:19 +0100, donut donut wrote: > Hi, I've just updated rails from 1.1.6 to 1.2.1 and I'm getting the > following errors whenever I load a page that uses a class that uses > ferret. I have ferret 0.10.13 and acts_as_ferret. They were working > fine before the upgrade. > > # > ["/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:423:in > `remove_const'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:423:in > `remove_constant'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:274:in > `remove_unloadable_constants!'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:274:in > `remove_unloadable_constants!'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:73:in > `clear'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:60:in > `reset_application!'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:116:in > `reset_after_dispatch'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:51:in > `dispatch'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/webrick_server.rb:113:in > 
`handle_dispatch'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/webrick_server.rb:79:in > `service'", "/usr/local/lib/ruby/1.8/webrick/httpserver.rb:104:in > `service'", "/usr/local/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'", > "/usr/local/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'", > "/usr/local/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'", > "/usr/local/lib/ruby/1.8/webrick/server.rb:95:in `start'", > "/usr/local/lib/ruby/1.8/webrick/server.rb:92:in `start'", > "/usr/local/lib/ruby/1.8/webrick/server.rb:23:in `start'", > "/usr/local/lib/ruby/1.8/webrick/server.rb:82:in `start'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/webrick_server.rb:63:in > `dispatch'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/commands/servers/webrick.rb:59", > "/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in > `require'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in > `require'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:343:in > `new_constants_in'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in > `require'", > "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/commands/server.rb:39", > "/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in > `require'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in > `require'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:343:in > `new_constants_in'", > "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:496:in > `require'", "script/server:3"] > [2007-01-23 11:50:29] ERROR `/market/list' not found. > 192.168.1.102 - - [23/Jan/2007:11:50:29 PST] "GET /market/list HTTP/1.1" > 404 283 > - -> /market/list > > Have you seen this? Thanks. 
> From sjoonk at gmail.com Thu Jan 25 00:42:01 2007 From: sjoonk at gmail.com (sjoonk) Date: Thu, 25 Jan 2007 06:42:01 +0100 Subject: [Ferret-talk] multibyte character corrupt in highlight method Message-ID: <5e7407f9628dd4884b44fe11832cda3a@ruby-forum.com> When I apply highlight() method to the search result, multibyte characters are corrupted. The post_tag is located in the middle of last character, so the last character corrupts. Here is my code. query = "SOME_MULTIBYTE_CHARS" searcher.search_each(query) do |doc_id, score| puts searcher.highlight(query, doc_id, :field => :content) end And this is the result. ... bla, bla, bla, .. SOME_MULTIBYTE_CHARACTERS ... How can I do?? -- Posted via http://www.ruby-forum.com/. From robertonrails at gmail.com Thu Jan 25 18:04:49 2007 From: robertonrails at gmail.com (Robert Dempsey) Date: Fri, 26 Jan 2007 00:04:49 +0100 Subject: [Ferret-talk] pagination in acts_as_ferret In-Reply-To: References: <20060503163044.GS29289@cordoba.webit.de> <17ea47bafad9e62b273baa7796003c11@ruby-forum.com> <20061017114316.GL14271@cordoba.webit.de> Message-ID: <32d52b553f1da83d5bdc43dccb2a056b@ruby-forum.com> http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret enjoy -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Jan 26 08:26:01 2007 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 26 Jan 2007 14:26:01 +0100 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 In-Reply-To: <1169681985.11386.32.camel@localhost.localdomain> References: <1169681985.11386.32.camel@localhost.localdomain> Message-ID: <20070126132601.GB26810@cordoba.webit.de> On Wed, Jan 24, 2007 at 11:39:45PM +0000, John Leach wrote: > Hi, > > In case it's any use, I moved a Rails 1.1.6 site to 1.2.1 the other day > with Ferret 0.10.14 and it's working fine, though it doesn't use > acts_as_ferret. > > The dependencies system has changed in Rails 1.2, perhaps acts_as_ferret > is tickled by this. 
You might want to try including the Ferret > libraries manually, in config/environment.rb. > > http://weblog.rubyonrails.org/2006/8/11/reloading-revamped I have a live application here running aaf trunk, Rails 1.2.1 and Ferret 0.10.14 without problems. I'm not running Mongrel, though. But it really looks like a dependencies issue. Can you make sure there is no other version of ferret lying around? It would also be interesting to know where ferret is installed (systemwide or frozen to your app). Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From py at landanger.fr Fri Jan 26 12:49:13 2007 From: py at landanger.fr (Pierre-Yves Landanger) Date: Fri, 26 Jan 2007 18:49:13 +0100 Subject: [Ferret-talk] How to store Ferret index in DB using acts_as_ferret ? Message-ID: <0b198d3c4cba3958c6d6442b2f847d77@ruby-forum.com> Hello, I would like the Ferret index to be stored in the DB (instead of in multiple files). It would allow me to keep all data in a single place and to back up my index with a simple DB export. Is there a way to do so? Thanks -- Posted via http://www.ruby-forum.com/. From jonathan.soma at gmail.com Fri Jan 26 12:58:53 2007 From: jonathan.soma at gmail.com (Jonathan Soma) Date: Fri, 26 Jan 2007 18:58:53 +0100 Subject: [Ferret-talk] ferret installation woes Message-ID: <8f5fef938babb06e2dbfa3791909a037@ruby-forum.com> Ferret installation fails for me (ferret: 0.10.14, ruby: 1.8.4, OS: Fedora Core 4) in the following way: >ruby setup.rb setup ... In file included from r_qparser.c:2: search.h:714: field `comparables' has incomplete type make: *** [r_qparser.o] Error 1 The C extensions were not installed. But don't worry. Everything should work fine .... >ruby setup.rb install ... ---> ext no ruby extention exists: 'ruby setup.rb setup' first ... Any ideas?
Installing via 'gem install ferret' just gives the 'comparables' error, I figured this would be more informative. -- Posted via http://www.ruby-forum.com/. From jonathan.soma at gmail.com Fri Jan 26 13:33:19 2007 From: jonathan.soma at gmail.com (Jonathan Soma) Date: Fri, 26 Jan 2007 19:33:19 +0100 Subject: [Ferret-talk] ferret installation woes In-Reply-To: <8f5fef938babb06e2dbfa3791909a037@ruby-forum.com> References: <8f5fef938babb06e2dbfa3791909a037@ruby-forum.com> Message-ID: <5194be94bc707c5ac5ad5b5cffb98761@ruby-forum.com> Just in case someone else runs into this: what ended up working was changing comparables[] on that line to comparables[1], and positions[] to positions[1] in some other file. -- Posted via http://www.ruby-forum.com/. From andreas.korth at gmx.net Fri Jan 26 15:22:54 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Fri, 26 Jan 2007 21:22:54 +0100 Subject: [Ferret-talk] How to store Ferret index in DB using acts_as_ferret ? In-Reply-To: <0b198d3c4cba3958c6d6442b2f847d77@ruby-forum.com> References: <0b198d3c4cba3958c6d6442b2f847d77@ruby-forum.com> Message-ID: <35281415-E5E4-4772-8959-FCB18AE39763@gmx.net> On 26.01.2007, at 18:49, Pierre-Yves Landanger wrote: > I would like Ferret index to be stored in DB ( instead of multiple > files ) Ferret is optimized for speed and therefore uses its own file format. A database is not a suitable place to store a full text index, mainly for performance reasons. > It would allow me to centralize all datas in a unique media + > backup my > index using a simple DB export. I assume you use Ferret to index data which is stored in your database anyway. It's of little use to back up your Ferret index since it can be rebuilt from the contents of your database. If you need to restore your database from a backup, just rebuild your index afterwards.
Cheers, Andy From john at johnleach.co.uk Sat Jan 27 08:55:29 2007 From: john at johnleach.co.uk (John Leach) Date: Sat, 27 Jan 2007 13:55:29 +0000 Subject: [Ferret-talk] concurrency errors adding to a keyed index Message-ID: <1169906129.30887.33.camel@localhost.localdomain> Hi, I'm adding some news articles to a keyed Ferret 0.10.14 index and encountering quite serious instability when concurrently reading and writing to the index, even though with just 1 writer and 1 reader process. If I recreate the index without a key, concurrent reading and writing seem to work fine (and indexing is about 10 times quicker :) I'm testing by running my indexing script (which retrieves up to 1000 database records using ActiveRecord, adds to the index and exits) and concurrently manually re-running a search on the index using my Rails web interface. This is in a dev environment with only 1 user (me) and about 58000 docs. The error I get is along the lines of the following, with a different filename each time: IO Error occured at :79 in xraise Error occured in fs_store.c:324 - fs_open_input couldn'ferret_index/development/news_article_versions/_2ih.tix: /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:682:in `initialize' /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:682:in `ensure_reader_open' /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:385:in `[]' /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' /usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:384:in `[]' #{RAILS_ROOT}/app/models/news_article_version.rb:35:in `ferret_search' #{RAILS_ROOT}/app/models/news_article_version.rb:35:in `ferret_search' #{RAILS_ROOT}/app/controllers/news_articles_controller.rb:56:in `search' It seems to occur roughly once per batch, and usually towards the end of the batch. I'm not using aaf. 
I create my keyed index like this: @@ferret_index = Index::Index.new(:path => "#{RAILS_ROOT}/ferret_index/#{RAILS_ENV}/news_article_versions", :field_infos => field_infos, :id_field => :id, :key => :id, :default_input_field => :text) Unkeyed, I just drop the :key option (duh). :id is just the ActiveRecord id, from an auto_increment field in MySQL. As a note, when concurrently searching on the keyed index, the number of hits returned increases throughout the indexing process. With a non-keyed index, the number of hits doesn't increase until the end. It looks to me as though, when using a keyed index, Ferret commits after each record is added. When non-keyed, it commits when the Index is closed. That I don't get the error with a non-keyed index might just be because there are fewer commits, and so fewer opportunities for the "bug" to trigger. Is this a bug I've come across? Is concurrent reading/writing like this expected to work? I'm using Ferret 0.10.14 on Ubuntu Edgy, with "ruby 1.8.4 (2005-12-24) [i486-linux]" and "gcc version 4.1.2 20060928" Thanks in advance! John -- http://johnleach.co.uk From kraemer at webit.de Sat Jan 27 09:45:58 2007 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 27 Jan 2007 15:45:58 +0100 Subject: [Ferret-talk] ferret installation woes In-Reply-To: <5194be94bc707c5ac5ad5b5cffb98761@ruby-forum.com> References: <8f5fef938babb06e2dbfa3791909a037@ruby-forum.com> <5194be94bc707c5ac5ad5b5cffb98761@ruby-forum.com> Message-ID: <20070127144558.GA7775@cordoba.webit.de> On Fri, Jan 26, 2007 at 07:33:19PM +0100, Jonathan Soma wrote: > Just in case someone else runs into this: what ended up working was > changing comparables[] on that line to comparables[1], and positions[] > to positions[1] in some other file. Strange, what kind of compiler do you have there? Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From john at johnleach.co.uk Sat Jan 27 20:56:42 2007 From: john at johnleach.co.uk (John Leach) Date: Sun, 28 Jan 2007 01:56:42 +0000 Subject: [Ferret-talk] test/functional/thread_safety_index_test.rb failures - segfaults too Message-ID: <1169949402.30887.50.camel@localhost.localdomain> Hi, I just checked out ferret 0.10.14 from subversion and am getting intermittent failures from the thread_safety_index_test.rb script. Sometimes it runs through to the end with no errors, but other times I get errors, including segfaults. I reported the bug here: http://ferret.davebalmain.com/trac/ticket/153 Can anyone else reproduce this? svn co svn://davebalmain.com/ferret/tags/REL-0.10.14/ ferret-0.10.14 cd ferret-0.10.14 rake ext ruby test/functional/thread_safety_index_test.rb You'll need to run it a few times though (just the last line). Sometimes it returns successfully, sometimes it returns weird and random test errors, and every so often it actually segfaults. Additionally, I encountered problems running "rake ext" on a box that already has Ferret 0.10.14 installed as a gem. You may need to do this on a box without Ferret already installed (or figure out why it fails to build and fix it :) I'm on Ubuntu Edgy. Thanks, John. -- http://johnleach.co.uk From doug.pfeffer+ror at gmail.com Sun Jan 28 13:08:03 2007 From: doug.pfeffer+ror at gmail.com (Doug Pfeffer) Date: Sun, 28 Jan 2007 19:08:03 +0100 Subject: [Ferret-talk] Is this the best approach? Message-ID: <7da17c43cfd62f9fdef12555c23943a2@ruby-forum.com> Hello, I'm working on a Rails app with a fairly complex set of model relationships. I'm abandoning my MySQL based search mechanism for Ferret/acts_as_ferret in the hopes of being able to more easily and effectively enable searching.
Before I get coding I was wondering if you folks might be able to tell me if my plan makes sense, or what could be a better way. The situation is as follows: Users of the app all "own" a variety of data. Most of the time only the creator of an object has access to it, but there are also scenarios when multiple users have access to a single object. No object can be manipulated without knowledge of what user is making the request. This makes searching difficult because I've got to filter the results based not just on the search params but on user permissions as well. At this point I plan on using a single index, with :store_class_name enabled for all models. The tricky part is limiting the search results to only the items the searching user has access to. I think the easiest way to do this would be just to store an "owner_id" along with every entry in the index, which would be the User's ID in my database. So for every search I'd include the user's id as a requirement. The downside is that I'll need to have redundant data for multi-owner situations, with one entry for each user that has access to it. This will make managing deletes and edits complicated (and slow?) because the system will need to be removing/creating multiple instances of the same entry in the index. The end result of this approach is a sort of flattened clone of the database, which seems a little awkward, but I'm not sure how else to go about it. I could just do a regular search by content and filter the results after the fact, but that seems kind of crappy too. Hopefully that made sense. Is there a better way to do this? Did I leave out any details? Thanks! Doug -- Posted via http://www.ruby-forum.com/.
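[Editor's sketch] One way to avoid the one-entry-per-owner duplication described above is to put every permitted user id into a single field of the document and require the searching user's id in the query. The plain-Ruby helpers below only build the document hash and the query string; the helper names and the field names (:owners, :content) are hypothetical, and the sketch assumes that the index analyzer keeps numeric tokens and that Ferret's query parser accepts Lucene-style "+" and "field:term" clauses.

```ruby
# Hypothetical helper: build the hash to index for one record. All owner ids
# go into one space-separated field, so a record shared by several users is
# still a single index entry.
def index_document(id, content, owner_ids)
  { :id => id, :content => content, :owners => owner_ids.join(' ') }
end

# Hypothetical helper: restrict a user's search phrase to documents that list
# that user as an owner. Integer() rejects anything that is not a number.
def scoped_query(user_id, phrase)
  "+owners:#{Integer(user_id)} +(#{phrase})"
end

doc   = index_document(7, "quarterly report", [3, 12])
query = scoped_query(12, "report")

# With a real Ferret index these would be used roughly as follows
# (not executed here, since it needs the ferret gem and an index on disk):
#   index << doc
#   index.search_each(query) { |doc_id, score| ... }
puts doc.inspect
puts query
```

With this layout, granting or revoking access means re-indexing one document with a changed :owners value rather than adding or deleting per-user copies. The phrase is interpolated unescaped, so real code would need to sanitise user input first.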
From john at johnleach.co.uk Sun Jan 28 11:19:05 2007 From: john at johnleach.co.uk (John Leach) Date: Sun, 28 Jan 2007 16:19:05 +0000 Subject: [Ferret-talk] test/functional/thread_safety_index_test.rb failures - segfaults too In-Reply-To: <1169949402.30887.50.camel@localhost.localdomain> References: <1169949402.30887.50.camel@localhost.localdomain> Message-ID: <1170001145.30887.62.camel@localhost.localdomain> Hi, I've tweaked the test script to almost guarantee a segfault now. I've also simplified it to try help rule out a few things. It no longer deletes or optimizes the index: just one thread for adding docs and one thread for searching docs. http://johnleach.co.uk/downloads/ruby/ferret/thread_safety_read_write_test.rb You'll need to drop it in the directory "test/functional/" in an unpacked Ferret source package. John. -- http://johnleach.co.uk On Sun, 2007-01-28 at 01:56 +0000, John Leach wrote: > Hi, > > I just checked out ferret 0.10.14 from subversion and am getting > intermittent failures from the thread_safety_index_test.rb script. > > Sometimes it runs through to the end with no errors, but other times I > get errors, including segfaults. > > I reported the bug here: http://ferret.davebalmain.com/trac/ticket/153 > > Can anyone else reproduce this? > > svn co svn://davebalmain.com/ferret/tags/REL-0.10.14/ ferret-0.10.14 > cd ferret-0.10.14 > rake ext > ruby test/functional/thread_safety_index_test.rb > > You'll need to run it a few times though (just the last line). > Sometimes it returns successful, sometimes it returns weird and random > test errors, and every so often it actually segfaults. > > Additionally, I encountered problems running "rake ext" on a box that > already has Ferret 0.10.14 installed as a gem. You may need to do this > on a box without Ferret already installed (or figure out why it fails to > build and fix it :) > > I'm on Ubuntu Edgy. > > Thanks, > > John. 
> -- > http://johnleach.co.uk From andreas.korth at gmx.net Sun Jan 28 15:31:41 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Sun, 28 Jan 2007 21:31:41 +0100 Subject: [Ferret-talk] Is this the best approach? In-Reply-To: <7da17c43cfd62f9fdef12555c23943a2@ruby-forum.com> References: <7da17c43cfd62f9fdef12555c23943a2@ruby-forum.com> Message-ID: On 28.01.2007, at 19:08, Doug Pfeffer wrote: > I've got to filter the results based > not just on the search params but by user permissions as well. You can store multiple user ids in a single field (:owners) and limit the search to only those documents that contain the currents user's id in that field. Cheers, Andy From bk at benjaminkrause.com Sun Jan 28 17:32:08 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Sun, 28 Jan 2007 23:32:08 +0100 Subject: [Ferret-talk] LazyDocs vs. ActiveRecord Message-ID: Hi .. i've been using lazydocs for quite some time now and it's a real pleasure working with them.. you should definately use them.. http://ferret.davebalmain.com/api/classes/Ferret/Index/LazyDoc.html my implementation: http://blog.omdb-beta.org/2006/12/12/lazydocs-and-database-objects as far as i know, Jens wanted to implement them in acts_as_ferret as well :-) Ben From sjoonk at gmail.com Mon Jan 29 00:08:22 2007 From: sjoonk at gmail.com (sjoonk) Date: Mon, 29 Jan 2007 06:08:22 +0100 Subject: [Ferret-talk] Segmentation fault in Search::Searcher#highlight Message-ID: <8a17d04d0bf869334b35bd0665e72a10@ruby-forum.com> I'm using ferret 0.10.14 in Linux Fedora 3. When I do highlight with Index::Index#highlight, it works well. But, doing the same test with Searcher#highlight, [BUG] Segmentation fault occurred. Here's my test code. require 'rubygems' require 'ferret' include Ferret::Search #searcher = Ferret::Index::Index.new(:path => './index') # works searcher = Searcher.new("./index") # not works! segmentation fault!! 
query = TermQuery.new(:content, ARGV[0]) searcher.search_each(query) do |doc_id, score| puts "Document #{doc_id} found with a score of #{score}" puts searcher.highlight(query, doc_id, :field => :content) end Do I have some wrong implementation? Help me... -- Posted via http://www.ruby-forum.com/. From maz at rift.fr Mon Jan 29 12:05:04 2007 From: maz at rift.fr (maz) Date: Mon, 29 Jan 2007 18:05:04 +0100 Subject: [Ferret-talk] Segmentation fault in Index::Index#add_document Message-ID: Hello, Here's the code that segfaults: http://pastie.caboo.se/36467 I could have submitted a patch, but I'm not sure whether this segfault is caused by Ferret or Ruby. This seems to be triggered only when combining a split and a gsub on an empty string of the returned array, and trying to insert it directly into the index. However, there's no problem when you duplicate or transform the string. -- maz Rift Technologies - http://rift.fr/ -- Posted via http://www.ruby-forum.com/. From john at johnleach.co.uk Mon Jan 29 12:55:16 2007 From: john at johnleach.co.uk (John Leach) Date: Mon, 29 Jan 2007 17:55:16 +0000 Subject: [Ferret-talk] Segmentation fault in Index::Index#add_document In-Reply-To: References: Message-ID: <45BE3504.4020602@johnleach.co.uk> Hi, reproduced here on Ubuntu Edgy Intel. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1211037504 (LWP 6193)] 0xb79b878e in mb_std_advance_to_start (ts=0x816ae20) at analysis.c:156 156 { (gdb) bt #0 0xb79b878e in mb_std_advance_to_start (ts=0x816ae20) at analysis.c:156 Have you reported this on the bug tracker? http://ferret.davebalmain.com/trac/report/1 John. -- http://johnleach.co.uk maz wrote: > Hello, > > Here's the code that segfaults: > > http://pastie.caboo.se/36467 > > I could have submitted a patch, but I'm not sure > whether this segfault is caused by Ferret or Ruby. 
> > This seems to be triggered only when combining > a split and a gsub on an empty string of the returned > array, and trying to insert it directly into the > index. > > However, there's no problem when you duplicate or > transform the string. > > -- > maz > Rift Technologies - http://rift.fr/ > From rohatgi83 at google.com Mon Jan 29 15:28:04 2007 From: rohatgi83 at google.com (amit) Date: Mon, 29 Jan 2007 21:28:04 +0100 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: <2973F4EC-FCB3-41D6-BB18-7B75E89B67CB@patientslikeme.com> References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> <20060922091703.GA11602@cordoba.webit.de> <426978784cdd2e7bb1d6e0b31e296a8e@ruby-forum.com> <8dd78e760bd44e18faa8b9d21bb6aa2b@ruby-forum.com> <64AFD52B-3981-4390-920C-93710CAA4070@patientslikeme.com> <71d964d68f25c0e449bb423ebfaccbc3@ruby-forum.com> <2973F4EC-FCB3-41D6-BB18-7B75E89B67CB@patientslikeme.com> Message-ID: Steven Hammond wrote: > Thanks, I'll report back success or failure with this. > > Steve All, Is the hack proposed in the thread above is the only solution ? Or do we have had any public fixes available? Please revert if there are Thanks Amit -- Posted via http://www.ruby-forum.com/. From maz at rift.fr Mon Jan 29 17:25:59 2007 From: maz at rift.fr (maz) Date: Mon, 29 Jan 2007 23:25:59 +0100 Subject: [Ferret-talk] Segmentation fault in Index::Index#add_document In-Reply-To: <45BE3504.4020602@johnleach.co.uk> References: <45BE3504.4020602@johnleach.co.uk> Message-ID: <14f73032fd81a06719253b949f998bdf@ruby-forum.com> John Leach wrote: [...] > Have you reported this on the bug tracker? > > http://ferret.davebalmain.com/trac/report/1 Done, thanks. I still think it's probably a Ruby bug. -- maz Rift Technologies - http://rift.fr/ -- Posted via http://www.ruby-forum.com/. 
From py at landanger.fr Tue Jan 30 10:48:14 2007 From: py at landanger.fr (Guest) Date: Tue, 30 Jan 2007 16:48:14 +0100 Subject: [Ferret-talk] How to store Ferret index in DB using acts_as_ferret ? In-Reply-To: <35281415-E5E4-4772-8959-FCB18AE39763@gmx.net> References: <0b198d3c4cba3958c6d6442b2f847d77@ruby-forum.com> <35281415-E5E4-4772-8959-FCB18AE39763@gmx.net> Message-ID: <81f2caeff5858ac26cecfee10509b74e@ruby-forum.com> All right, Thank you Andy ^^ py -- Posted via http://www.ruby-forum.com/. From chitam at gmail.com Tue Jan 30 14:24:45 2007 From: chitam at gmail.com (donut donut) Date: Tue, 30 Jan 2007 20:24:45 +0100 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 In-Reply-To: <20070126132601.GB26810@cordoba.webit.de> References: <1169681985.11386.32.camel@localhost.localdomain> <20070126132601.GB26810@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > On Wed, Jan 24, 2007 at 11:39:45PM +0000, John Leach wrote: >> http://weblog.rubyonrails.org/2006/8/11/reloading-revamped > I have a live application here running aaf trunk, Rails 1.2.1 and Ferret > 0.10.14 without problems. I'm not running Mongrel, though. > > But it really looks like some dependencies issue. can you make sure > there is no other version of ferret lying around? It'd be also > interesting to know where ferret is installed (systemwide or frozen to > your app). > > Jens > Well I finally got around to testing this again after finishing a work item today. I upgraded my ferret gem from 0.10.13 to 0.10.14. I also installed acts_as_ferret 0.3.1 as a gem today(it wasn't available as a gem before). I added require 'acts_as_ferret' in my environment.rb and my project ran in rails 1.1.6 fine. Next I switched to rails 1.2.1(changed RAILS_GEM_VERSION = '1.2.1' in environment.rb) and did a rake rails:update (which didn't do anything related to my problem, I think). Ferret now seems to be working. 
However, I'm getting another error: # ["/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:423:in `remove_const'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:423:in `remove_constant'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:274:in `remove_unloadable_constants!'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:274:in `remove_unloadable_constants!'", "/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.0/lib/active_support/dependencies.rb:73:in `clear'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:60:in `reset_application!'", "/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.1/lib/dispatcher.rb:116:in `reset_after_dispatch'", blah blah blah HWBColorspace is part of the Rmagick gem. I think rails 1.2.1 breaks a lot of backward compatibility. Now I need to decide if I want to upgrade my Rmagick gem and hope that I won't run into another compatibility problem. May be I should stay with 1.1.6 as 1.2.1 is at least 50% slower than 1.1.6. Thanks. -- Posted via http://www.ruby-forum.com/. From julioody at gmail.com Tue Jan 30 17:26:06 2007 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 31 Jan 2007 09:26:06 +1100 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 In-Reply-To: References: <1169681985.11386.32.camel@localhost.localdomain> <20070126132601.GB26810@cordoba.webit.de> Message-ID: I bit OT I know, but let me ask On 1/31/07, donut donut wrote: > ... May be I should stay with 1.1.6 as 1.2.1 is at > least 50% slower than 1.1.6. Are you sure? The exact figure isn't what concerns me, but what did you notice in terms of performance, and how? Thanks -- Julio C. 
Ody http://rootshell.be/~julioody From chitam at gmail.com Tue Jan 30 19:33:22 2007 From: chitam at gmail.com (donut donut) Date: Wed, 31 Jan 2007 01:33:22 +0100 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 In-Reply-To: References: <1169681985.11386.32.camel@localhost.localdomain> <20070126132601.GB26810@cordoba.webit.de> Message-ID: > Are you sure? The exact figure isn't what concerns me, but what did > you notice in terms of performance, and how? > > Thanks "Slower" is vague as there are many ways such as memory, rps, response time etc to measure perf. Anyway, I came across this: http://www.ruby-forum.com/topic/95947#new -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed Jan 31 03:53:02 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 31 Jan 2007 09:53:02 +0100 Subject: [Ferret-talk] Ferret problems with Rails 1.2.1 In-Reply-To: References: <1169681985.11386.32.camel@localhost.localdomain> <20070126132601.GB26810@cordoba.webit.de> Message-ID: <20070131085302.GE21355@cordoba.webit.de> On Tue, Jan 30, 2007 at 08:24:45PM +0100, donut donut wrote: [..] > Well I finally got around to testing this again after finishing a work > item today. I upgraded my ferret gem from 0.10.13 to 0.10.14. I also > installed acts_as_ferret 0.3.1 as a gem today(it wasn't available as a > gem before). I added > > require 'acts_as_ferret' > > in my environment.rb and my project ran in rails 1.1.6 fine. Next I > switched to rails 1.2.1(changed RAILS_GEM_VERSION = '1.2.1' in > environment.rb) and did a rake rails:update (which didn't do anything > related to my problem, I think). Ferret now seems to be working. > However, I'm getting another error: > > # [..] > > HWBColorspace is part of the Rmagick gem. I think rails 1.2.1 breaks a > lot of backward compatibility. Now I need to decide if I want to > upgrade my Rmagick gem and hope that I won't run into another > compatibility problem. I doubt that updating the gem will solve your problem. 
Rails 1.2 just has a different way of how it deals with dependencies - afair the main difference is that it does not try as hard as older versions to resolve missing classes - so maybe you are missing a require somewhere? Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From bob at sporkmonger.com Wed Jan 31 09:21:31 2007 From: bob at sporkmonger.com (Bob Aman) Date: Wed, 31 Jan 2007 15:21:31 +0100 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved Message-ID: So, I'm working on a search engine of sorts that restricts results to your local area. I can successfully return all entries within 15 miles of a particular point, and I can successfully return all entries that match a search query, but I'm having trouble combining the two together and doing pagination on them. Basically, for the range query, you do a SQL query that returns all results within +/- 1 latitude/longitude from the point in question, and then you do some spherical trig on each of those results to get only the entries within X number of miles of the point. And for the search query, so far, I've been using acts_as_ferret's find_by_contents method. But now I need to figure out how to take an array of results from the range query, and only do the find_by_contents magic on just the entries in that Array. So far, everything method I've thought of looks like it's going to have performance problems. Any suggestions? -- Posted via http://www.ruby-forum.com/.
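[Editorial sketch] The two-step range query described in the message above (a cheap +/- 1 degree box, then spherical trig on the survivors) might look roughly like the following in plain Ruby. This is only an illustration under stated assumptions: the method names, the hash layout of the points, and the Earth radius constant are all made up for the example, and the poster's SQL and acts_as_ferret code are not shown in the thread.

```ruby
# Mean Earth radius in miles (an assumed constant for this sketch).
EARTH_RADIUS_MILES = 3958.8

def to_rad(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in miles between two lat/lon points (haversine formula).
def haversine_miles(lat1, lon1, lat2, lon2)
  dlat = to_rad(lat2 - lat1)
  dlon = to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlon / 2)**2
  # Clamp against floating-point drift before taking the arcsine.
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt([a, 1.0].min))
end

# Step 1: keep only points inside a +/- box_degrees bounding box (the cheap test,
#         standing in for the SQL query).
# Step 2: run the exact distance check on the candidates that survive.
def within_radius(points, lat, lon, miles, box_degrees = 1.0)
  candidates = points.select do |p|
    (p[:lat] - lat).abs <= box_degrees && (p[:lon] - lon).abs <= box_degrees
  end
  candidates.select { |p| haversine_miles(lat, lon, p[:lat], p[:lon]) <= miles }
end

# Hypothetical sample data: one close point, one inside the box but too far,
# and one outside the box altogether.
points = [
  { :name => "near",    :lat => 51.10, :lon => 13.80 },
  { :name => "in_box",  :lat => 51.90, :lon => 14.60 },
  { :name => "far_off", :lat => 40.00, :lon => -74.00 }
]

nearby = within_radius(points, 51.05, 13.74, 15)
puts nearby.map { |p| p[:name] }.inspect
```

The box test is deliberately loose so that the expensive trigonometry only runs on a small candidate set; the same ordering applies whether the first step is done in SQL or in the search index.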
From kraemer at webit.de Wed Jan 31 12:03:23 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 31 Jan 2007 18:03:23 +0100 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved In-Reply-To: References: Message-ID: <20070131170323.GG21355@cordoba.webit.de> On Wed, Jan 31, 2007 at 03:21:31PM +0100, Bob Aman wrote: > So, I'm working on a search engine of sorts that restricts results to > your local area. I can successfully return all entries within 15 miles > of a particular point, and I can successfully return all entries that > match a search query, but I'm having trouble combining the two together > and doing pagination on them. > > Basically, for the range query, you do a SQL query that returns all > results within +/- 1 latitude/longitude from the point in question, and > then you do some spherical trig on each of those results to get only the > entries within X number of miles of the point. wouldn't it be possible to query the ferret index for all points withing the +/- 1 long/lat range? then you could combine the user's search terms with that query, and afterwards do the calculations to filter out those points outside the x miles radius. > And for the search query, so far, I've been using acts_as_ferret's > find_by_contents method. But now I need to figure out how to take an > array of results from the range query, and only do the find_by_contents > magic on just the entries in that Array. So far, everything method I've > thought of looks like it's going to have performance problems. In any case I'd first try what works, and then look at the performance of different approaches with a realistical amount of data. That aside, my first guess is that narrowing down the result set with Ferret before doing any spatial calculations would be a good idea. Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From mi-ferret at moensolutions.com Wed Jan 31 12:48:15 2007 From: mi-ferret at moensolutions.com (Michael Moen) Date: Wed, 31 Jan 2007 09:48:15 -0800 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved In-Reply-To: References: Message-ID: <27301813-3FDE-4DD5-9496-F33646275277@moensolutions.com> On Jan 31, 2007, at 6:21 AM, Bob Aman wrote: > And for the search query, so far, I've been using acts_as_ferret's > find_by_contents method. But now I need to figure out how to take an > array of results from the range query, and only do the > find_by_contents > magic on just the entries in that Array. So far, everything method > I've > thought of looks like it's going to have performance problems. > > Any suggestions? Bob- I don't know what would be involved in using this method with aaf, but I am using this method for a bounding box geo query. If you need a more strict radial search you can use a custom filter with 0.10.x. During index population I'm doing: doc << Field.new("latitude", latitude.to_f + 1000, Field::Store::NO, Field::Index::UNTOKENIZED) doc << Field.new("longitude", longitude.to_f + 1000, Field::Store::NO, Field::Index::UNTOKENIZED) and during the query I'm doing: query << "latitude:[#{box[:lat_min] + 1000} #{box[:lat_max] + 1000}] AND " query << "longitude:[#{box[:lon_min] + 1000} #{box[:lon_max] + 1000}] AND " I have helper methods outside of this scope to handle the min/max that I'm searching in. I also have a complete (yet untested) GeoFilter, but we aren't using the .10.x Ferret yet so I have no idea if it actually works.
Michael- From matt at mattschnitz.com Wed Jan 31 13:15:54 2007 From: matt at mattschnitz.com (Matt Schnitz) Date: Wed, 31 Jan 2007 10:15:54 -0800 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved In-Reply-To: <27301813-3FDE-4DD5-9496-F33646275277@moensolutions.com> References: <27301813-3FDE-4DD5-9496-F33646275277@moensolutions.com> Message-ID: <497cc4a0701311015y498a9469rf3b6f670edb57f4c@mail.gmail.com> I could go on for hours about this, but lemme see if I can summarize what I know. Search engines aren't really optimized to geographic queries (unless they have a built-in geographic index feature). You can make them work, but it's gonna be a hack. Several things to try: - What you described already, the lat / lon box idea. The issue with that is that it selects the lat first, the lon second. By doing so, it's like taking a vertical or horizontal stripe of the country and sticking it in a temporary result set. That temporary result set can be HUGE, especially on the US East Coast. Which is why this is slow. - You could, instead, select all the zip codes or postal codes in the area, first, then do your precise calculations . That will be fast if and only if Ferret can handle that many query terms at once. Most search engines can't, really, but still, this is usually a bit faster than the first solution. The hard part here is computing the set of zip codes you want in the first place. - You could limit individual queries to greater metro areas, first, then do your precise calculations. Two issues here: one, getting that data; two, those areas have borders, so getting coverage is difficult at best. Faster, still, is doing the zip code coverage area solution in a database. The reason this'd be faster is that you can take your set of zip codes covering your search area, and join them to the table in question. Joins are much faster than a list of search query terms. Like I said, the true solution is geographic indexes. Unfortunately, Ferret doesn't have them. 
Maybe Lucene does? It's possible to fake geographic indexes in a non-geographic engine, but it's really nasty math; I'd only recommend that if you need to bleed every last ounce of speed out of it. Schnitz On 1/31/07, Michael Moen wrote: > > > On Jan 31, 2007, at 6:21 AM, Bob Aman wrote: > > > And for the search query, so far, I've been using acts_as_ferret's > > find_by_contents method. But now I need to figure out how to take an > > array of results from the range query, and only do the > > find_by_contents > > magic on just the entries in that Array. So far, everything method > > I've > > thought of looks like it's going to have performance problems. > > > > Any suggestions? > > Bob- I don't know what would be involved in using this method with > aaf, but I am using this method for a bounding box geo query. If you > need a more strict radial search you can use a custom filter with > 0.10.x. > > During index population I'm doing: > > doc << Field.new("latitude", latitude.to_f + 1000, > Field::Store::NO, Field::Index::UNTOKENIZED) > doc << Field.new("longitude", longitude.to_f + 1000, > Field::Store::NO, Field::Index::UNTOKENIZED) > > and during the query I'm doing: > > query << "latitude:[#{box[:lat_min] + 1000} #{box[:lat_max] + 1000}] > AND " > query << "longitude:[#{box[:lon_min] + 1000} #{box[:lon_max] + 1000}] > AND " > > I have helper methods outside of this scope to handle the min/max > that I'm searching in. I also have a complete (yet untested) > GeoFilter, but we aren't using the .10.x Ferret yet so I have no idea > if it actually works. > > Michael- > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk >
From matt at mattschnitz.com Wed Jan 31 13:25:28 2007 From: matt at mattschnitz.com (Matt Schnitz) Date: Wed, 31 Jan 2007 10:25:28 -0800 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved In-Reply-To: <497cc4a0701311015y498a9469rf3b6f670edb57f4c@mail.gmail.com> References: <27301813-3FDE-4DD5-9496-F33646275277@moensolutions.com> <497cc4a0701311015y498a9469rf3b6f670edb57f4c@mail.gmail.com> Message-ID: <497cc4a0701311025t5f937da8r1061e565767ece24@mail.gmail.com> Oh, and one bit of advice on lat/lon boxes - The order the engine performs the lat & lon query - either lat then lon, or lon then lat - matters. Try swapping the terms in the queries. You want the engine to select the vertical stripe (lon between x and y) first in most of the US, because that has the least chance of picking up other major metro areas. If you're really slick, though, you'll switch it up depending on the area of the country, because that vertical stripe is murder on certain parts of the East Coast. (This all assumes you're in the US, but the principle is the same wherever you are - check your map!) Schnitz On 1/31/07, Matt Schnitz wrote: > > I could go on for hours about this, but lemme see if I can summarize what > I know. > > Search engines aren't really optimized to geographic queries (unless they > have a built-in geographic index feature). You can make them work, but it's > gonna be a hack. > > Several things to try: > - What you described already, the lat / lon box idea. The issue with > that is that it selects the lat first, the lon second. By doing so, it's > like taking a vertical or horizontal stripe of the country and sticking it > in a temporary result set. That temporary result set can be HUGE, > especially on the US East Coast. Which is why this is slow.
> - You could, instead, select all the zip codes or postal codes in the > area, first, then do your precise calculations . That will be fast if and > only if Ferret can handle that many query terms at once. Most search > engines can't, really, but still, this is usually a bit faster than the > first solution. The hard part here is computing the set of zip codes you > want in the first place. > - You could limit individual queries to greater metro areas, first, then > do your precise calculations. Two issues here: one, getting that data; two, > those areas have borders, so getting coverage is difficult at best. > > Faster, still, is doing the zip code coverage area solution in a > database. The reason this'd be faster is that you can take your set of zip > codes covering your search area, and join them to the table in question. > Joins are much faster than a list of search query terms. > > Like I said, the true solution is geographic indexes. Unfortunately, > Ferret doesn't have them. Maybe Lucene does? It's possible to fake > geographic indexes in a non-geographic engine, but it's really nasty math; > I'd only recommend that if you need to bleed every last ounce of speed out > of it. > > > Schnitz > > On 1/31/07, Michael Moen wrote: > > > > > > On Jan 31, 2007, at 6:21 AM, Bob Aman wrote: > > > > > And for the search query, so far, I've been using acts_as_ferret's > > > find_by_contents method. But now I need to figure out how to take an > > > array of results from the range query, and only do the > > > find_by_contents > > > magic on just the entries in that Array. So far, everything method > > > I've > > > thought of looks like it's going to have performance problems. > > > > > > Any suggestions? > > > > Bob- I don't know what would be involved in using this method with > > aaf, but I am using this method for a bounding box geo query. If you > > need a more strict radial search you can use a custom filter with > > 0.10.x. 
> > > > During index population I'm doing: > > > > doc << Field.new("latitude", latitude.to_f + 1000, > > Field::Store::NO, Field::Index::UNTOKENIZED) > > doc << Field.new("longitude", longitude.to_f + 1000, > > Field::Store::NO, Field::Index::UNTOKENIZED) > > > > and during the query I'm doing: > > > > query << "latitude:[#{box[:lat_min] + 1000} #{box[:lat_max] + 1000}] > > AND " > > query << "longitude:[#{box[:lon_min] + 1000} #{box[:lon_max] + 1000}] > > AND " > > > > I have helper methods outside of this scope to handle the min/max > > that I'm searching in. I also have a complete (yet untested) > > GeoFilter, but we aren't using the .10.x Ferret yet so I have no idea > > if it actually works. > > > > Michael- > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > > From bob at sporkmonger.com Wed Jan 31 14:45:06 2007 From: bob at sporkmonger.com (Bob Aman) Date: Wed, 31 Jan 2007 20:45:06 +0100 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved In-Reply-To: <20070131170323.GG21355@cordoba.webit.de> References: <20070131170323.GG21355@cordoba.webit.de> Message-ID: > wouldn't it be possible to query the ferret index for all points withing > the > +/- 1 long/lat range? then you could combine the user's search terms > with that > query, and afterwards do the calculations to filter out those points > outside > the x miles radius. Hadn't actually thought of having ferret index the lat/long points. However, I'm pretty sure I don't want to go down that route. However, while I was waiting for an answer, I think I figured out how to do this query in the fastest way possible.
In this case, it would be optimized specifically for my app, and for this one query, but that's fine by me because the app only has one thing you ever search against. The way things are set up, you have a single location in the DB for each subscriber. Each location can have multiple entries. The geoquery is actually done against the locations, not the entries. So there's a LOT fewer of them. You might have 10-15 locations in a city, max, but each location could have hundreds of entries. So you do the geoquery only against the locations, and you get an array of ids for locations within range of the point back. Now you just change the indexed entries to also include the ids of the locations they're associated with. When you do the search, you include the list of ids that it can match against as part of the query. That should be possible, correct? I haven't sat down and worked out exactly what the ferret query syntax would look like for that though. -- Posted via http://www.ruby-forum.com/. From matt at mattschnitz.com Wed Jan 31 14:57:04 2007 From: matt at mattschnitz.com (Matt Schnitz) Date: Wed, 31 Jan 2007 11:57:04 -0800 Subject: [Ferret-talk] GeoQuery with acts_as_ferret involved In-Reply-To: References: <20070131170323.GG21355@cordoba.webit.de> Message-ID: <497cc4a0701311157j2df5f55egc75734392bd47e95@mail.gmail.com> That's a good plan. Beware the number of search terms, though. Typically search engines slow down pretty quick the more you add search terms onto the query. It creates a lot of merging work on the back-end of the query execution. Schnitz On 1/31/07, Bob Aman wrote: > > > wouldn't it be possible to query the ferret index for all points withing > > the > > +/- 1 long/lat range? then you could combine the user's search terms > > with that > > query, and afterwards do the calculations to filter out those points > > outside > > the x miles radius. > > Hadn't actually thought of having ferret index the lat/long points. 
> However, I'm pretty sure I don't want to go down that route. > > However, while I was waiting for an answer, I think I figured out how to > do this query in the fastest way possible. In this case, it would be > optimized specifically for my app, and for this one query, but that's > fine by me because the app only has one thing you ever search against. > > The way things are set up, you have a single location in the DB for each > subscriber. Each location can have multiple entries. The geoquery is > actually done against the locations, not the entries. So there's a LOT > fewer of them. You might have 10-15 locations in a city, max, but each > location could have hundreds of entries. So you do the geoquery only > against the locations, and you get an array of ids for locations within > range of the point back. Now you just change the indexed entries to > also include the ids of the locations they're associated with. When you > do the search, you include the list of ids that it can match against as > part of the query. > > That should be possible, correct? > > I haven't sat down and worked out exactly what the ferret query syntax > would look like for that though. > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk >
From wmorgan-ferret at masanjin.net Wed Jan 31 16:21:19 2007 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Wed, 31 Jan 2007 13:21:19 -0800 Subject: [Ferret-talk] Exact phrase score In-Reply-To: <40c330811615f85d3848fc907531fa19@ruby-forum.com> References: <40c330811615f85d3848fc907531fa19@ruby-forum.com> Message-ID: <1170276973-sup-6670@south> Excerpts from EJ Finneran's message of Mon Jan 22 20:15:08 -0800 2007: > In the app I'm writing, we convert the score to a percentage and > display it with the search results. The problem is when you search > for an exact phrase (for example) and it matches the title of a > document exactly, you only get a 17% match. > > Has anyone seen a way to either curve the scores or make an exact > phrase match get a higher score? How are you converting scores to percentages? I.e. what are you using as the maximum value? Are you looking for percentages that are consistent across queries, or just across results in a single query? -- William From andreas.korth at gmx.net Wed Jan 31 17:15:19 2007 From: andreas.korth at gmx.net (Andreas Korth) Date: Wed, 31 Jan 2007 23:15:19 +0100 Subject: [Ferret-talk] Exact phrase score In-Reply-To: <1170276973-sup-6670@south> References: <40c330811615f85d3848fc907531fa19@ruby-forum.com> <1170276973-sup-6670@south> Message-ID: On 31.01.2007, at 22:21, William Morgan wrote: >> In the app I'm writing, we convert the score to a percentage and >> display it with the search results. > > How are you converting scores to percentages? I.e. what are you > using as > the maximum value?
Ferret::Search::TopDocs#max_score Example: index = Ferret::I.new index << "tic" index << "tic tac" index << "tic tac toe" docs = index.search("tic OR tac") docs.hits.each { |hit| puts hit.score / docs.max_score * 100 } Result: 100.0 79.999993785928 26.9283741206894 Cheers, Andy From blah at blah.com Wed Jan 31 20:42:38 2007 From: blah at blah.com (Mark) Date: Thu, 1 Feb 2007 02:42:38 +0100 Subject: [Ferret-talk] Automatically Indexing Associated Models Message-ID: PROBLEM I have two models, Blog and BlogComment. When a blog is initially created, it has no comments. Upon creation, the title and body are automatically added to the ferret index and directly searchable. However, when a comment is added to a blog, that comment does not get added to the index and is therefore not ferretable. The desired behavior is that when a comment is added to a blog, that the comment be ferretable. CURRENT SETUP Blog (id, title, body, user_id) BlogComment (id, blog_id, comment) class Blog < ActiveRecord::Base has_many :blog_comments, :dependent => :destroy acts_as_ferret :additional_fields => [:blog_comments] def blog_comments self.blog_comments.collect {|comment| comment.body } end end class BlogComment < ActiveRecord::Base belongs_to :blog end CONTROLLER [..] if @blog.blog_comments << comment do_something else do_something end Can someone recommend a good approach to automatically updating the index when a comment is added? -- Posted via http://www.ruby-forum.com/.
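[Editorial sketch] Two things stand out in the Blog model posted above: the `blog_comments` method has the same name as the `has_many :blog_comments` association and calls itself, so it would recurse without end, and it reads `comment.body` although the listed column is `comment`. The plain-Ruby sketch below only illustrates the general shape of one possible fix: give the text-collecting method its own name, and have the comment re-index its parent blog after it is saved. `TinyIndex`, `comment_text`, `reindex`, `add_comment` and `after_save` here are hypothetical stand-ins written for this example. They are not the acts_as_ferret API, and a real application would call whatever re-index hook the installed acts_as_ferret version provides.

```ruby
# Stand-in for a search index: maps a record id to its indexed text.
# (Hypothetical; a real app would hand this text to Ferret instead.)
class TinyIndex
  def initialize
    @docs = {}
  end

  # Add or overwrite the document stored for this id.
  def update(id, text)
    @docs[id] = text.downcase
  end

  # Return the ids of all documents whose text contains the term.
  def search(term)
    needle = term.downcase
    @docs.select { |_id, text| text.include?(needle) }.keys
  end
end

class BlogComment
  attr_reader :blog, :comment

  def initialize(blog, comment)
    @blog = blog
    @comment = comment
  end

  # Plays the role of an after_save callback: refresh the parent's document.
  def after_save
    blog.reindex
  end
end

class Blog
  attr_reader :id, :title, :body, :comments

  def initialize(id, title, body, index)
    @id = id
    @title = title
    @body = body
    @index = index
    @comments = []
    reindex
  end

  # Named differently from the association, so it cannot call itself.
  def comment_text
    comments.map(&:comment).join(" ")
  end

  # Rebuild this blog's indexed text from title, body and all comments.
  def reindex
    @index.update(id, [title, body, comment_text].join(" "))
  end

  def add_comment(text)
    comment = BlogComment.new(self, text)
    comments << comment
    comment.after_save
    comment
  end
end

index = TinyIndex.new
blog = Blog.new(1, "Hello", "First post", index)
blog.add_comment("Ferret is great")
puts index.search("ferret").inspect
```

The design point is that the comment, not the controller, is responsible for telling the parent to refresh its indexed text, so every code path that saves a comment keeps the index current.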